Basically I wanted to cover real-time machine learning and how you can apply it to traces. This use case is applicable to traces: if you want to secure an application that produces log messages, you probably want to find out what kind of security solution you can use. The use case here is traces, but obviously you can apply it to any other domain. Most solutions focus on one specific input and one specific output, and it becomes really hard when you want to see how you get to the results. So first of all, does anyone recognize these guys on the screen here? Yep. Who are they? The Beatles, obviously. The Beatles, yes. I come from Liverpool in the UK, so that is where I am based.
And today what I wanted is to give you an idea of the real-time story, so you can take this and apply it to your own domain. Most people assume you have some kind of problem that you're trying to solve and you want to apply a solution, right? It looks straightforward, the process is not hard, but as they always say, the devil is in the details. You can probably sense where things can go wrong here: packaging your solution, deploying it, scaling up and scaling down. How can you apply security to all these steps and also make sure you handle load balancing? You can secure a standalone or small application, but when you want to take it to the cloud, which is exactly where you want to deploy these solutions, it becomes really hard, even just to monitor these kinds of solutions. So this is what we'll be talking about today. Can anyone tell me how long it takes for an eye to blink? Any guesses?
A millisecond? It actually takes about one third of a second. So imagine that you have a problem you're trying to secure, and on top of it you want a low-latency solution, because that is the real problem. Beyond the software architecture, you also have hardware costs: network hops, input, output. All of this eats into a limited SLA of roughly less than a hundred milliseconds. You want to apply these kinds of solutions and you see them everywhere: in finance, in IoT devices (same as the previous presentation), in medical devices, in sports, and in machine learning, which is what we are talking about today.
We want to combine two different concepts for these kinds of solutions. First, real-time data: data you want to process even before you store it. Imagine you have an IoT device, and this device is obviously broadcasting some type of data; you want to process this data before you store it. Let me repeat that to make sure you grasp the concept: you want to process the data even before you store it. Most solutions, maybe 99% of them, focus on storing data: you have an IoT device, you write some code, you deploy the application, you save the data into a database, and then you process it. That doesn't work for real-time applications. All the applications you see, smart cars for example, smart houses, IoT devices, require real-time data.
So this is where real-time data comes into play, and we want to process it before we store it. But that only gives you the overview; you still need context. The historical part, which is where we all live today, is where you store your data somewhere and then try to process it. What we're trying to do today is combine these two different types. Most applications you see online, Uber, LinkedIn, use this same type of analytics. Now, how many is too many? Obviously if you have one single device, it's not a problem. If you have a hundred thousand, more or less, you can still write an application, secure it, and you're good to go. But what we're talking about here is a billion transactions per second. There is no way for a simple application to handle this amount of data, and on top of it you want to apply security.
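The process-before-store idea can be sketched in a few lines of plain Python. This is only an illustration, not the platform's API; the class name, the error-rate score, and the alert threshold are all invented for the example:

```python
class StreamProcessor:
    """Toy sketch: score each event as it arrives, *before* anything is
    persisted. Storage happens after processing, off the hot path."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.store = []  # stand-in for the historical data store

    def on_event(self, event):
        # Process in flight: compute a score while the event is still in memory.
        score = event["errors"] / max(event["requests"], 1)
        alert = score > self.threshold
        # Persist only after processing, so the decision never waits on storage.
        self.store.append({**event, "score": score})
        return alert


proc = StreamProcessor(threshold=0.5)
print(proc.on_event({"device": "sensor-1", "requests": 10, "errors": 9}))  # True
print(proc.on_event({"device": "sensor-2", "requests": 10, "errors": 1}))  # False
```

The point of the sketch is the ordering: the alert decision is made on the in-flight event, and the write to the store is a side effect rather than a prerequisite.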
Security will also add delay to your application, and there are many ways to handle it, but we are interested in two concepts. For this talk, we want to make sure we have some kind of alert: given a use case, tell me how it's going to behave in the future. I want to know whether my application will fail or not, whether there is a security threat on it or not, and I'm trying to predict what's going to happen next. Obviously you need to define some kind of trend, and you need to scale it: there is no way for a standalone application running on a local machine to handle this amount of data.
So you need clusters and nodes, and you want to make sure you avoid bottlenecks. There are many tools for this; you can split the landscape into twelve sections depending on what kind of real-time machine learning solution you want to build. But it's hard to cover everything, because sometimes you cross the border between two solutions, and if you work in industry you know you want to save time and cost. You want a solution where you can combine various concepts and various tools while keeping cost to a minimum. To make it easier, we split it into different categories: stream data, the data arriving at this very moment that we want to act on, and the fast data store, where we want to access data stored somewhere.
Obviously we want to minimize latency, and you still need multiple tools, right? The solution we're proposing today comes from Hazelcast, the company I work for; it's open source, so you can play around with it. You can download it, and which distribution depends on the programming language you use; if you are a machine learning engineer or data scientist and don't want to learn a new programming language, you can use SQL, so you're good to go. Most presentations focus on only one input, which is fine if you want to write a small application, but it doesn't work if you want to scale. I want to handle data from Kafka, for example, while at the same time I have IoT devices, maybe sensors, maybe a weather feed, all coming into one platform.
I want to handle multiple inputs, do stream processing, run machine learning on top of it, and produce results. The use case is quite straightforward, right? What you want is to be able to scale it. And when I mention scaling, some of you assume scaling is about data, and some of you mean the process when you say you want to scale your application. The reality is that you need to scale both: you need to scale compute as well as data. And even if you can do that, if you want to minimize latency for this kind of solution, the problem becomes making sure your compute is partition-aware. What we mean by this is that you want to run your processing as close as possible to your data.
So if you have an application running on multiple data centers, for example one in Germany, one in the UK, one in the States, with different sources, you want the compute related to data stored in Germany to run in Germany, and so on across the data centers. This is an example of what we're trying to do today: you take your data from multiple sources and you write your machine learning on top of it. You have multiple instances, and you use Python to write your model. Because you want to scale it, you write it in a microservice architecture, which allows you to split your code into smaller parts, and from there you can have multiple Kafka topics running at the same time.
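Partition awareness ultimately rests on deterministic routing: the same key must always map to the same partition, so compute can be placed next to the data it owns. A minimal sketch of that idea, with the data-center names purely illustrative:

```python
import hashlib

# Hypothetical data centers; the names are illustrative only.
DATA_CENTERS = ["germany", "uk", "us"]


def owning_partition(key, partitions=len(DATA_CENTERS)):
    # Stable hash: the same key always yields the same partition number,
    # regardless of which node computes it.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def route(key):
    return DATA_CENTERS[owning_partition(key)]


# The same device always lands in the same data center:
print(route("device-42") == route("device-42"))  # True
```

Real platforms add rebalancing and replica handling on top, but the core contract is this stable key-to-partition mapping.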
This also allows you to access the fast data store, which is where we keep the data. In this architecture you have a multiple-ingestion process: data coming from multiple sources, could be Kafka, could be an IoT device. Then you have a job submission, which is essentially your machine learning model. You ingest your data, and you want to have multiple machine learning scoring algorithms running at the same time. Scoring is important because in some cases one machine learning model is fine if you have a small data set, but if you have a large data set and multiple sources, you want to use something called a composite score, which allows you to run multiple algorithms, or multiple machine learning models, at the same time.
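A composite score is, at its simplest, a weighted combination of several model outputs on the same event. Here is a toy sketch with two invented "models" (a severity rule and a message-length heuristic); the functions and weights are assumptions for illustration, not anything from the platform:

```python
def rule_score(msg):
    # Toy model 1: errors are more suspicious than warnings.
    return {"ERROR": 0.9, "WARN": 0.5}.get(msg["level"], 0.1)


def length_score(msg):
    # Toy model 2: unusually long messages score higher, capped at 1.0.
    return min(len(msg["text"]) / 100.0, 1.0)


def composite_score(msg, models, weights):
    # Weighted average of several model outputs on the same event.
    total = sum(w * m(msg) for m, w in zip(models, weights))
    return total / sum(weights)


msg = {"level": "ERROR", "text": "disk failure on node 7"}
score = composite_score(msg, [rule_score, length_score], [0.7, 0.3])
print(score)
```

In a real deployment the individual models would run as parallel scoring jobs; the combination step is the same weighted reduction shown here.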
With that being said, let's look at this benchmark. This is 1 billion transactions per second, so 1 billion events or messages, on just 45 nodes, and the cool thing about it is the linear scaling. For the demo, I assume you are here because you want to learn how to deploy this securely. You have multiple options. If you have just one single model, you can deploy it on your cluster. If you have multiple models, which is what we assume for real-time applications, you run all these models in one cluster and then replicate that cluster into multiple instances in order to scale up or down depending on how much data you have. For today's demonstration you can use an on-host deployment.
That is a small setup: you download the jar file, for example (the platform is open source), and run it on your machine; it works just fine. Or you can deploy it with something called a sidecar, where you store your machine learning model as close as possible to your Hazelcast instance. You can also create a separate deployment where your models run in a separate data center and from there can access all your nodes. To explain the demo in simple terms: you write an application, and this application has a set of log messages. Usually you have some kind of tracking mechanism for this, but you don't know whether there is a security threat on this application or not, or how to predict what's going to happen in the next step.
This is what we're trying to do today. In order to do this, if you are familiar with programming: you store your log messages, secured, in a map structure. A map is just an in-memory structure of key-value pairs. The key records where the log message is coming from, so the IP address and a timestamp, and the value is essentially the severity level: warning, error, or just information, for example. I don't want to spend too much time on the code; you connect it to the cloud. It's very important, if you want a secure machine learning application, to connect it to the cloud. And you can also query it, so you can actually see your log messages coming in. For each log message you give it a score; the exact score is not important.
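A minimal sketch of that map, using a plain Python dict in place of the platform's distributed map. The key is (IP address, timestamp), the value carries the message plus its severity, and the severity-to-score table is an assumption made up for the example:

```python
import time

# Severity levels mapped to numeric scores; the numbers are illustrative.
SEVERITY_SCORE = {"INFO": 1, "WARNING": 5, "ERROR": 10}

# In-memory key-value map of log messages, keyed by (source IP, timestamp).
log_map = {}


def put_log(ip, message, level, ts=None):
    key = (ip, ts if ts is not None else time.time())
    log_map[key] = {
        "message": message,
        "level": level,
        "score": SEVERITY_SCORE[level],  # numeric score used later for the trend
    }
    return key


k = put_log("10.0.0.7", "login failed", "ERROR", ts=1700000000.0)
print(log_map[k]["score"])  # 10
```

In the real architecture this dict would be a distributed, replicated map; the key/value shape and the per-message score are what carry over.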
It's the same as with your bank account: banks usually give you a credit score. We apply the same concept here. You see the score, and we use this score to create a machine learning model, in simple terms a linear regression, but it could be any model. We have our data coming into this architecture, we define a trend based on linear regression, and from there you can predict what's going to happen next in your application. So this is the architecture: you read your logs from multiple sources. You need to make sure you secure these log messages, not in JSON or plain-string format, so you need to apply encoding; use your favorite encoding or the methods you work with. And here your data is coming from some Kafka topics, for example, or some IoT devices.
You want to ingest it into one platform, so you have shared memory, and in this shared memory you calculate the trend. If you know machine learning and linear regression: it takes data points from your data, we give each one a score, we apply a window to it (the exact window doesn't matter), and we try to fit a trend. This trend helps us define what's going to happen in the application next. Every time I run the application, I want to predict whether there is a security threat coming or not, whether a problem is about to happen or not. From there, for each new message, we get this prediction. Obviously we also want to make sure we send the output in a secure format.
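The windowed linear-regression trend can be written out directly: fit y = a·x + b by least squares over the last few scores and extrapolate one step ahead. This is a self-contained sketch of the idea; a real deployment would run it inside the streaming platform rather than as a standalone class:

```python
from collections import deque


class TrendPredictor:
    """Fit a least-squares line over the last `window` scores and
    extrapolate it one step to predict the next score."""

    def __init__(self, window=5):
        self.scores = deque(maxlen=window)  # sliding window of recent scores

    def add(self, score):
        self.scores.append(score)

    def predict_next(self):
        n = len(self.scores)
        if n < 2:
            return self.scores[-1] if self.scores else 0.0
        xs = range(n)
        mean_x = sum(xs) / n
        mean_y = sum(self.scores) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.scores))
        var = sum((x - mean_x) ** 2 for x in xs)
        slope = cov / var
        intercept = mean_y - slope * mean_x
        return slope * n + intercept  # extrapolate to the next index


t = TrendPredictor(window=3)
for s in [1.0, 2.0, 3.0]:
    t.add(s)
print(t.predict_next())  # rising trend -> 4.0
```

If the predicted score crosses an alert threshold, that is the "something is about to happen" signal described above.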
You can send it back to Kafka, send it to a file, store it, or send it to a client, for example. The difference here is how you do it in real time. There are three solutions if you are interested; because we're running out of time and I want to leave some time for questions, let me skip ahead and give you the three ways to combine multiple data sources at the same time. The first and second are essentially batch mode: your data, or your log messages, are stored somewhere, you process them, and then you provide some kind of output. But what we're talking about is this part here, which is looking at the data in real time.
Essentially you combine the data arriving at this very moment with the data coming from traces, to provide context, and based on these two different types you look at the results in real time. To summarize (you can take a screenshot or photo of this): first, you store your data, your traces, in the cloud; it doesn't matter which cloud provider you use, and you want to make sure you have multiple machines. We want these messages to be secured, but if you're interested in performance you need to make sure that after you encode them you can decode them efficiently. If you're interested in a readable structure, use JSON; if you're interested in speed, use a binary format. You need an IMap or map structure in memory to store your secured messages, where you do some kind of rebalancing.
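The JSON-versus-binary trade-off is easy to make concrete: the same record serialized both ways, using only the standard library. The field layout (`!4sQH`: 4-byte IP, 8-byte timestamp, 2-byte score) is an assumption invented for this sketch, not the platform's wire format:

```python
import json
import struct

record = {"ip": "10.0.0.7", "ts": 1700000000, "score": 7}

# Human-readable JSON: easy to debug, but larger and slower to parse.
as_json = json.dumps(record).encode()

# Fixed-layout binary: IP packed as 4 raw bytes, timestamp as an unsigned
# 64-bit int, score as an unsigned 16-bit int (network byte order).
ip_bytes = bytes(int(part) for part in record["ip"].split("."))
as_binary = struct.pack("!4sQH", ip_bytes, record["ts"], record["score"])

print(len(as_json), len(as_binary))  # the binary form is 14 bytes
```

At a billion messages per second, the difference between a few dozen bytes of JSON and 14 bytes of fixed-layout binary per message dominates both bandwidth and decode time.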
And of course you need to define your eviction policy, because you are limited on space, and you need to consider security. Security comes into this platform as a feature, so you can use it, but you don't have to if you don't want to; you can bring your own security. With that being said, because it's open source there is a community, so if you are interested in this type of real-time application, feel free to join or scan this QR code. I will stay around if you still have any questions. And with that, thanks very much. Thank you.
Wow, thank you very much. I have to confess, some parts of that went completely over my head, too technical, but the key message is that you can actually do 1 billion transactions per second in real
time. Can I get my mic? So large companies, for example Uber or Netflix, they run at ten times this number, so you can imagine how many transactions they have per second.
So there are real use cases. Yeah,
Yes. Every time you use LinkedIn, for example, or Twitter, Uber, Netflix, these kinds of platforms run similar real-time machine learning on their systems. Obviously they have built-in solutions for it.
Okay, well, awesome. If there are no further questions from the audience, let's jump directly to the next presentation. Yeah, thank you.