Big Data meets Security: Analyzing system logs to understand behavior has become one of the main applications of big data technology. Open source initiatives as well as commercial tools and applications for big data integration, collection and analytics are becoming increasingly important building blocks of cyber attack resilience through better collection and analysis of very large sets of log and transaction data, real-time analysis of current events and potentially also prediction of future behavior.
Good evening. Thank you for the introduction, and thank you for being here late this evening. We formed this partnership with KuppingerCole to join two competencies. We at B have been looking at big data, data management and data analytics technologies for almost 20 years now. I founded the company at the end of the nineties, and we obviously had a great market in front of us. Still today we see a lot of demand, and you read a lot about it: just about every application area of digital transformation is linked to data and the analytics of data. KuppingerCole is the specialist on information security, identity management and so on. The intersection is obviously big data security analytics, and that's what we work on together, making use of these complementary areas of expertise. There are basically two ways to look at big data security analytics. Coming first from the big data world: when we work with companies doing big data projects and look at what problems and challenges they have, they report that data privacy and security is one of the biggest concerns and challenges they have.
So that's a big topic, next to the skills they require and typically don't really have. Going further: if we look at the data that's analyzed in big data projects, this is a survey we ran last year with more than 200 companies. We actually do that every year. We ask them: what do you do? Here the question was: what type of data do you actually analyze in your big data projects? Number one was transaction data, which is very logical. Whether you look at customer-oriented use cases in big data, or you want to optimize logistics or the supply chain with sensor data and so on, you typically always need transaction data as well. But then come system logs, with 59% of companies saying that they are already using them. That is for different use cases, but security analytics is for sure one of them. So already today, when companies think about big data projects, they are already looking at big data security analytics.
What we did then is we started a survey together with KuppingerCole to take a look at what's going on in companies right now. We started with quite a simple question: is this actually an important topic for you? You see here that today a little more than half of the respondents (we had more than 300 companies answering) said it's important or very important. But the more interesting figure is the one for the future: here 87% said this will be an important topic for us in the future. So I think that gives us some indication that this is an important topic to look at. But saying that something is important and actually doing something is obviously a big difference, especially in IT. So we asked the next question.
Okay, but what do you actually do? Did you already implement big data security analytics measures? Here you see about a quarter of companies said yes, and another 17% said it is planned within the next months. So as usual we see this big gap: high importance, but when it comes to actually doing something and spending money, we don't see that many already doing something. There was a third question we wanted to ask: is it worthwhile spending time, resources and money on this topic, big data security analytics? What we see is that of the companies that are planning to implement big data security analytics, about 26% say they expect high benefits from it, and more than half say they would expect moderate ones. But comparing that to the companies that have already implemented big data security analytics gives, I think, an interesting hint here.
It hints at what is to be expected and what might happen, because of those that already use it, more than 53% said they have got high benefits from using big data security analytics. So I think we have a pretty good indication here that it's worthwhile exploring this topic in more detail. We will present all the details and all the results of this study tomorrow at 11 o'clock in a workshop. So if you're interested in learning more about what we found, how companies are using it and much more detail, I invite you to come to that session and we'll take a closer look. So now, big data security analytics. What I would like to share with you is our experience from the analytics world. How do we categorize that? How do we make sense of all the different goals and disciplines we are seeing in the market?
Because typically there's quite a lot of confusion. There's talk that we need to report, and suddenly we need real-time data; then there's big data and small data and fast data, and so on. So we see some need to categorize analytics, and I just want to show you quickly how we do that, because it often gets a lot of things started and helps companies to approach these topics. First of all, it's important to see that there are basically two streams. One is looking at data at rest: we have data that we have produced, and we would like to know what happened or why it happened. Those are the, let's say, more traditional forms of data analysis. We want to have reports and analysis, and we call that tactical intelligence. When we look at what might happen, the predictive piece, that is where it gets really interesting: we start to model, we would like to understand things and use machine learning, for example, to have pattern detection algorithms running on our data, and so on. Then we actually have a different type of analytics.
We call that explorative analytics, and I'll show you the difference in a minute, but it's important to understand that it's not the same thing you're trying to achieve. Then there is data in motion: data that's still being produced, that's still actively coming out of a system, where you look at events and activities as they happen. We have about the same types of activities here, but now we are looking at activities as they happen, so that would rather be a monitoring task. Here it's very important to understand what is really interesting, because typically you have too many events to really look at all of them. So we think about rules and alerts. If you think about threat management, then with rules you obviously look for something that you already know. But again, here we would also like to understand what we should do.
So how can we use the predictive piece, the models we might have built, on this real-time data to be able to respond more quickly? The idea is not to wait until something has happened, then do the forensics to understand what happened, and maybe build a rule to prevent exactly this incident from happening again. It might rather be interesting to have some pattern detection on what's going on, then find some action and basically prevent the threat or the bad things from happening. And obviously we have some interchange between the two lines here: analysis can also be done on events. We might sort out the events that we find interesting and want to store, and later on maybe run a longer-term pattern detection on them to also find dependencies over time, not only in the small time window we are looking at at the moment.
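As an illustration of the "rules and alerts" idea on data in motion, here is a minimal sketch of a sliding-window rule. It is not taken from the talk: the class name `FailedLoginRule`, the event layout (timestamp in seconds, source address, outcome) and the thresholds are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class FailedLoginRule:
    """Hypothetical rule: alert when one source exceeds `max_failures`
    failed logins within the last `window_seconds` seconds."""

    def __init__(self, max_failures=3, window_seconds=60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # source -> timestamps of recent failures

    def observe(self, timestamp, source, outcome):
        """Process one event as it arrives; return True if the rule fires."""
        if outcome != "fail":
            return False
        window = self._recent[source]
        window.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_failures


# Illustrative event stream (made-up data): (seconds, source, outcome)
events = [
    (0, "10.0.0.5", "fail"),
    (10, "10.0.0.5", "fail"),
    (20, "10.0.0.7", "fail"),
    (30, "10.0.0.5", "fail"),
    (40, "10.0.0.5", "fail"),
    (200, "10.0.0.5", "fail"),
]

rule = FailedLoginRule(max_failures=3, window_seconds=60)
alerts = [(t, src) for t, src, outcome in events if rule.observe(t, src, outcome)]
print(alerts)
```

A rule like this only looks at a small time window and only catches what is already known, which is the limitation the longer-term pattern detection on stored events is meant to address.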
And if we want to automate action based on models, we typically build the models on historical data. That's the typical way to employ machine learning. Then we deploy the model, for example into an application that's running in real time and checking whether the model fits or not. So that often helps a little bit to see the differences. The application of analytics on data in motion we would typically call operational intelligence, because that's very close to an operational process: while the process runs, we would like to understand what's going on. Organizations typically progress along this. They start with, let's say, the forensics, the reporting, the understanding of what has actually happened. And in the last years they have really progressed into what might happen. So we see a lot of uptake of predictive modeling, of data mining, machine learning algorithms, and so on.
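The pattern of building a model on historical data and then deploying it against live data can be sketched roughly as follows. This is a deliberately simple stand-in, assuming a mean and standard deviation baseline instead of a real machine learning model; the function names, the metric (failed logins per hour) and the numbers are all made up for illustration.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """'Training' on data at rest: learn the normal level of a metric."""
    return {"mean": mean(history), "std": stdev(history)}


def is_anomalous(model, value, threshold=3.0):
    """'Scoring' on data in motion: flag values far from the learned baseline."""
    if model["std"] == 0:
        return value != model["mean"]
    z_score = abs(value - model["mean"]) / model["std"]
    return z_score > threshold


# Hypothetical historical metric: failed logins per hour on one system
historical_failed_logins = [2, 3, 1, 4, 2, 3, 2, 3]
model = fit_baseline(historical_failed_logins)

# Hypothetical live values arriving from the running system
live_stream = [3, 2, 40, 1]
alerts = [value for value in live_stream if is_anomalous(model, value)]
print(alerts)
```

The split matters more than the model: the fitting step can run slowly and exploratively on stored data, while the scoring step has to be cheap enough to run on every incoming event.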
And the same happened in the real-time world, basically. We typically see companies starting with a monitoring piece and then getting more advanced in being able to analyze the data and maybe even automate some actions. That's typically the ultimate goal, because things are happening so quickly that you have to take the human out of the equation. You want a machine to respond to what's going on, and let the humans do the analysis and build the models; but once things are happening, you try to automate the response. So now, how do we do that? As so often in technology, the organizational issues are much bigger than the technology issues. For doing everything I just mentioned, we actually have quite a mature stack of technology available. That's nothing new or very fancy in a way; the big question is how to set that up.
And the main thing to see is that the biggest challenge for companies typically is to get the tactical and operational intelligence working together with the explorative piece. That's where you build the advanced models and so on. We typically work with a picture that we call the intelligence factory, because if you want to do reporting and monitoring on data, you typically want to have a stable system. You want to have trusted data that you can rely on, where you can really trust that the decisions you make on this data are sound. You want to automate those processes to be cost-optimized, and basically have a standardized reporting and analysis capability that produces high-quality results. But if you have an explorative approach to data, if you want these so-called data scientists to build models, to look at the data, to find something new, you have to take a totally different approach.
And here it's really about agility. It's about new insights. These people work in trial-and-error processes, which is unheard of in, let's say, traditional IT processes, where you basically have to define a budget and an outcome, do a requirements analysis and write everything down, and then you build the system to specification. That's definitely not possible here. And that's where a lot of companies struggle right now to get this in line, because you have a totally different approach. Organizationally, you need a different type of process. You cannot calculate an ROI upfront; you simply don't know. You have to take this totally different view of how you actually build explorative models, and that's really more of a lab approach. Companies that run labs in their development process, for example, are typically much better at this, because they have already understood that this is a different type of process. We see that transferring this to the data field is very, very beneficial.
But you have to make sure that you are enabling this different approach, because otherwise you will possibly not be very successful here. If you look at the technology landscape, as I said, it's quite mature in many ways, but there are also some new and interesting things happening. What we see on the data management side as increasingly popular are so-called data lakes that store a lot of detailed data and are typically now built with Hadoop. That's a technology that has actually not been around so long. It's an Apache open source project, but it's really gaining a lot of attention, and it's already almost part of the standard repertoire in an IT architecture. What you can do here, at pretty low cost, let's put it that way, is store a lot of data. So that's really a big data technology in that regard.
So a lot of detailed data, for example the logs: you can store them for a longer time. That's quite interesting if you want to look at pattern detection, or if you think that more advanced threats to your systems might also play out over a long time: maybe someone implants something and waits for quite some time, and you are losing that data over that time, so you cannot really go back and take a look at what happened before. You now have technology available that makes that possible. You can store a lot of detailed data for quite some time at costs that are still okay for many companies. So that's what we see. A little bit of warning here: the technology is quite immature. It's one of the new ones. So you'll definitely have some issues if you try to really run it.
So our advice today is often to maybe rather let someone run it for you, because if you're not really a very development-oriented organization, you will definitely end up with a lot of things you have to do yourself that you might not want to do. But anyway, it's very interesting technology. Streaming systems have been maturing quite a lot over the last years. Complex event processing has been discussed, and the technology implemented, for quite some time. And now, if we move towards more real-time data, this is one of the technologies that's really working here. And then we have the so-called NoSQL databases. It's like an umbrella term for all databases that are non-relational, and they are typically used in quite specific use cases. So if you have specific types of data, you might look, for example, at key-value stores or column stores, wide-column stores, or whatever.
If you have more XML data, you might look at document databases. What's also quite interesting, if you look at fraud detection, is to take a look at graph databases. We have seen very, very interesting examples of totally new insights, new detections of fraud on the same data, which you couldn't achieve before with the traditional methods, because you simply couldn't see the graph, the dependencies of all the objects you wanted to look at. And on the analytics side, we definitely have a very broad, mature offering of BI tools, visualization tools and advanced analytics tools. That's quite an old category in IT. It's basically relying on methods that are 30, 40, 50 years old, or even older if you think about statistics, and the tooling has also been around for a long time. So that's there. The big question is how to make use of it: first to get the data, and then to make use of it.
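To make the point about "seeing the graph" concrete, here is a small sketch of the underlying idea: accounts that look unrelated in a table turn out to be linked through shared attributes once the data is treated as a graph. It uses a plain in-memory breadth-first search, not a graph database, and the account, device and phone identifiers are invented for the example.

```python
from collections import defaultdict


def linked_groups(edges):
    """Group nodes into connected components with a breadth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, component = [start], set()
        while queue:
            node = queue.pop(0)
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(component)
    return groups


# Hypothetical records: which account used which device or phone number
edges = [
    ("acct:A", "device:1"),
    ("acct:B", "device:1"),
    ("acct:B", "phone:9"),
    ("acct:C", "phone:9"),
    ("acct:D", "device:2"),
]

groups = linked_groups(edges)
rings = [sorted(n for n in group if n.startswith("acct:")) for group in groups]
suspicious = [ring for ring in rings if len(ring) >= 3]
print(suspicious)
```

Accounts A and C never share an attribute directly; the link only appears through B, which is the kind of indirect dependency that is hard to spot with row-by-row queries.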
Then there is an increasing number of tools and applications for this specific use case of security analytics, log file analytics for example. In the exhibition you'll find some vendors here, like Securonix, Babi; maybe the big vendors have some offerings too. So that's something to look at. And then we have applications for quite specific use cases: if you think of a specific use case, there are often also specific applications or vendors around. So, a quick summary from the big data analytics expert's view. First of all, we can say from experience, and also from the survey we did, that companies report good benefits if they're employing it. So I think it's worthwhile to take a look at it. You can categorize how to approach this, I think, with the framework I've shown to you: what type of data you want to look at, and what's basically your time dimension.
Is it the past? Is it the present, or is it the future? And then you can take the different approaches. I mentioned that there is powerful technology available, and there have also been some advances over the last years that really open up new application areas and new possibilities. And then the organizational and process point of view is often a bigger challenge than the technology, which is often there. We see it as a key success factor to understand that there is an explorative approach that's totally different from a more factory-like approach, and that the key is the ability to bring the two together and make a working model out of that. That was that. Thanks a lot.
Thank you very much. Excellent presentation, giving a very good overview of the topic. Can we see the question slide? There are two questions which I think are relevant for this audience, and maybe also representative to some extent, since you are coming from a slightly different area. There you go.
The first one is a very interesting one. One of the challenges for incident response teams (that's one of the options that you described) is that they have too much data coming into their systems, too many false positives, and not enough skills and not enough time to deal with alerts. What is your view?
Well, it fits exactly into the model I've shown to you. The first real challenge is to narrow down the data and look at the data that's really interesting. For that you need some rules or other possibilities. And if you look at the application of machine learning algorithms to define models, or maybe even rules, then that's the way to do it. That helps you in learning from the past what has been interesting and what might be interesting, in finding the patterns that indicate interestingness, and then you can basically narrow it down.
Okay. The second question is a different one: it's looking not at big data for security, but at the security of big data. Yep.
As I said, a big challenge: almost half of the respondents we had in the survey said that's one of their biggest challenges. And we have several levels of challenges here. One is what I can actually do to secure the data, more from the technical side. But then it's also about what I can actually do with the data, what I'm allowed to do, because with these data lakes, one of the ideas is to bring everything together. If you really want to make use of it and tackle the interesting use cases that, for example, combine data from a lot of data silos, then you have to grant access rights to all of that data. And we've seen a lot of companies where users now actually get access to way more data than they usually had in their typical transactional and BI systems, just by being part of the digital transformation initiative of the company. They suddenly got access to the data, wherever the data was. And that's quite a challenge we see happening right now.
There's one that has been voted up: what about the data scientists? Is this something you would propose or recommend to install as a permanent organization?
Yeah, that's the question right now. As I said, organization is the topic. For many organizations, be it IT or business, it's a real challenge to get this explorative approach going. So they often start with this lab approach, because it's a distinct organization outside of the existing structures, and there you can get the speed. They don't have to comply with all the standard processes and so on. But long term, we see that the struggle is to operationalize the results again: to come back out of a prototype, a model you built, and get it into production. So I think the long-term view is rather to find a way to integrate that more closely. But I'm pretty sure you need the skill set. We see more and more applications that have more advanced analytic capabilities, and the organizational question is what's the best way to integrate those people.
Okay. Not so easy. And I know that there's not so much focus in university education on data science so far. So the last question, since we still have one minute, I think: do your clients trust the integrity of the system event logs that are foundational to security analytics tools? Or, to put it the other way around, what are the means for validating the data, for improving the trust that you have in the data?
The trust in the data, the...
In the source data.
Yeah, yeah. As usual, the data quality problem starts at the source. Again, the answer is more organizational than technical. We do have means to cleanse data. We also have pretty advanced means, by the way, using advanced analytics and machine learning, to profile data. So that can be done, and you can understand quality better. But to have better quality, you obviously have to change it at the source. And there it's typically more of an organizational measure, because it's people who are either entering the data or defining the processes or handling the systems and so on.
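Data profiling, in its simplest form, means measuring how complete and how well-formed the fields of a data set are before trusting it. The following is a minimal sketch under stated assumptions: the record layout (`ts`, `user`, `host`), the expected timestamp format and the sample records are invented for illustration, and real profiling tools go much further.

```python
from datetime import datetime


def profile(records, fields):
    """Per field: share of missing values and number of distinct present values."""
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "missing_ratio": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


def invalid_timestamps(records, field="ts", fmt="%Y-%m-%dT%H:%M:%S"):
    """Count records whose timestamp is missing or not in the expected format."""
    bad = 0
    for record in records:
        try:
            datetime.strptime(record.get(field) or "", fmt)
        except ValueError:
            bad += 1
    return bad


# Hypothetical log records with typical quality problems
logs = [
    {"ts": "2020-01-15T20:15:00", "user": "alice", "host": "srv1"},
    {"ts": "2020-01-15T20:16:00", "user": "", "host": "srv1"},
    {"ts": "15/01/2020 20:17", "user": "bob", "host": "srv2"},
    {"ts": None, "user": "alice", "host": "srv2"},
]

report = profile(logs, ["ts", "user", "host"])
bad_ts = invalid_timestamps(logs)
print(report, bad_ts)
```

A report like this shows where quality is poor, but as noted above it does not fix anything: the empty user name or the inconsistent timestamp format has to be corrected in the system or process that produces the log.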
Okay. Thank you very much. You're welcome. For those who are interested in learning more about this topic, there will be a session tomorrow morning, at 11 if I'm right, where you can go and see whether it's going to be interesting for you. Thank you very much again. Yeah. Thank you.