So the topic I want to talk about is one that is near to my heart in a way, because I think we spend a lot of time talking about technology in cyber, and we spend too little time talking about outcomes. I'm a firm believer in KPIs. I'm a firm believer that you have to measure in order to optimize and minimize negative impact. And I want to give you a way to talk about the positive outcome, to discuss it, and to look at decision making in the light of those outcomes.
And I'll do that by talking about incidents and responding to incidents, and one of the ways that we deal with them. So to recap what an incident is: incidents in computer security are any adverse event that has a negative impact on any of the security goals, that is, confidentiality, integrity, or availability of compute assets or information. And in many cases there's actually a fourth goal, and that's safety. Maybe that ties into the talk you just heard, but definitely that's an outcome of information assets today.
A few weeks ago, there was a case at Heinrich Heine University in Germany, where they were suffering a ransomware attack and had to basically divert incoming ambulances to a different maximum care facility. And that eventually, through a chain of events, caused the death of a patient who was being taken to a hospital that was 40 kilometers away. So even when what you're producing may not seem like it is connected to safety, we're all part of larger systems. So incidents can have negative outcomes in very real ways. And I think that's one of the reasons we are in cyber.
And it's one of the ways that we can think about the significance of the work that we do. Incidents are the practical manifestation of risk, and risk is a term that you are all familiar with. You have four ways of dealing with risk, and risk can be expressed as an expected loss. Basically, expected annual loss is the cost of an asset as it is impacted in an incident, times how many assets you have in that class, times how often the negative event affecting the asset occurs.
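The expected-loss description above can be written down as a small calculation. This is only a sketch: the function name, parameter names, and all numbers are made up for illustration, and it assumes the three factors are simply multiplied.

```python
def expected_annual_loss(cost_per_asset: float, asset_count: int, events_per_year: float) -> float:
    """Expected annual loss = cost per affected asset x assets in the class x yearly event frequency."""
    if cost_per_asset < 0 or asset_count < 0 or events_per_year < 0:
        raise ValueError("inputs must be non-negative")
    return cost_per_asset * asset_count * events_per_year


# Illustrative numbers only: 500 laptops, 2,000 per affected laptop,
# and the negative event hitting a given laptop once every four years.
laptop_loss = expected_annual_loss(cost_per_asset=2_000.0, asset_count=500, events_per_year=0.25)
print(f"Expected annual loss: {laptop_loss:,.0f}")
```

Mitigation then shows up as a reduction in one of the three inputs: a lower cost per incident (impact), fewer exposed assets (attack surface), or a lower frequency (probability).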
So that risk manifests, and the way that you can deal with the risk is you can either avoid it, like stop that business function or sell off that business unit. You can transfer it: you get insurance, or you outsource. You can accept it, so you accept a residual risk at the end of every risk management process. Or you can mitigate it. And mitigation of risks is really what we do in cyber. When it comes to security controls, we can minimize the impact through the proper application of best practices and cybersecurity controls.
We can minimize attack surface, again by following best practices, by doing things like proper patching, by making sure that we configure firewalls the right way. And we can minimize the probability with all of those mitigation measures together. But when we minimize the impact, that's how we touch all of the incidents that remain even when you have accepted the residual risk. So if you need to do business, you have made some level of risk acceptance decision. And that's what we then deal with in incident response.
And that is the practice of handling computer security incidents in a prepared, systematic fashion. And that's been around for 50 years. So that's not really a new thing; it has been an established practice. We have CSIRTs, we have CDCs, we have SOCs. We have all sorts of preparedness in organizations that helps us perform incident response duties.
Now, these are generally organized in three phases. There's the first phase, before an incident occurs. That's when you set up the teams, you minimize the risk by applying proper controls and procedures, and you prepare to manage incidents as they come in. Then you handle them during their life cycle. So during the incident, per incident, you start with something negative occurring, and then you follow through until you eventually contain it and have regained full control.
And then there's the post-incident phase, in which you learn from the experience you just had and improve based on the lessons learned. And the time that I want to focus on is the time spent in the incident from start to finish. That is the end-to-end time to contain: the time that you spend until you regain full control and can have confidence in the trustworthiness of your information assets again, so that you can go about your business without accepting additional risk in your environment.
So it's this per-incident timeline that I'll focus on, and we can break that down into individual phases as well. So I'm following the GC model of incident response here, which starts with the event that negatively affects any of the confidentiality, integrity, or availability goals. That could be an attacker, could be malware, could be whatever incident. And then the next thing that happens is that it is being detected. And that phase can take an awfully long time.
There's 56 days given as a number by Mandiant for last year; there's 197 or so days that the Ponemon Institute determined in 2018 for data breaches in the US. So this phase may not sound like it's long, but it can be the longest. When you look at the Heinrich Heine University case, the compromise of the environment predated the ransomware incident by about six months. So for a period of about six months, apparently they were unaware that the environment had been compromised. So cutting down every individual metric will eventually result in having a much shorter time to contain.
Time to detection is at the very beginning of that journey. Then you have an automatic response that you enact, something like an antivirus system doing something. You have an alert that is being sent to your team, to your SOC, that you then need to deal with and identify: is this real? Is it a false positive? Is there any follow-up required? Is there an investigation that's needed or not? And then you have manual response steps that may need to be taken, things like cleanup work, making configuration changes, anything you need to do to regain control.
And then there's the, hopefully, last step. And that is verifying that you were successful in containing the incident. It's all good that you had a plan and you executed on it, but without verification, you don't know if it was successful or not. If you think about the incident I mentioned, how do you make sure that there are no further back doors in the environment? How do you know that they don't have any persistence points anymore that they can use to adversely affect outcomes for you? And then eventually you're at the end of the journey.
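The per-incident timeline just described can be measured as a sum of phase durations. The sketch below is one way to do that in wall clock time; the milestone names are my own labels for the stages in the talk, and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical milestone names, in the order the talk walks through them.
MILESTONES = [
    "occurred", "detected", "auto_response", "alerted",
    "triaged", "manual_response_done", "verified",
]


def phase_durations(milestones: dict) -> dict:
    """Wall clock time between each pair of consecutive milestones."""
    ordered = [(name, milestones[name]) for name in MILESTONES]
    durations = {}
    for (prev_name, prev_ts), (name, ts) in zip(ordered, ordered[1:]):
        if ts < prev_ts:
            raise ValueError(f"{name} is earlier than {prev_name}")
        durations[f"{prev_name} -> {name}"] = ts - prev_ts
    return durations


def time_to_contain(milestones: dict) -> timedelta:
    """End-to-end time from the negative event to verified containment."""
    return milestones[MILESTONES[-1]] - milestones[MILESTONES[0]]


# Invented example: detection takes two days, everything after it takes one.
example = {
    "occurred": datetime(2024, 1, 1, 0, 0, 0),
    "detected": datetime(2024, 1, 3, 0, 0, 0),
    "auto_response": datetime(2024, 1, 3, 0, 0, 5),
    "alerted": datetime(2024, 1, 3, 0, 1, 0),
    "triaged": datetime(2024, 1, 3, 4, 0, 0),
    "manual_response_done": datetime(2024, 1, 3, 20, 0, 0),
    "verified": datetime(2024, 1, 4, 0, 0, 0),
}
durations = phase_durations(example)
total = time_to_contain(example)
longest = max(durations, key=durations.get)
print(f"time to contain: {total}, longest phase: {longest}")
```

Because the phases add up to the total, shortening any one of them shortens the time to contain, and the longest phase is the natural place to start.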
And what I want to look at is breaking this down into individual aspects. What most influences each individual phase and the amount of time spent in it?
So this is about wall clock hours. It's not about the time spent by your team, but obviously wall clock hours and time spent by humans have a relationship. So, time to detection: one of the biggest negative influencers is when you require prior knowledge in order to detect an event. If you look at what antivirus has done over the past 30 years, they have enumerated lists of badness and then distributed them. And then you could find known badness in your environment.
And whether you call that indicator of compromise or signature or pattern file or something else doesn't really matter. It's the distribution of prior knowledge of what the negative event, the incident, looks like, so that you can discover it. Autonomy is on the other side of that scale. Autonomy means you can look at the actual behavior when an attacker acts on their objective and understand that it's an attack without needing to know how it occurred in order to be able to detect it. In other words, you can be the first organization that's being hit and you can still detect it.
That's also a difference to technologies that do static analysis before execution. This is looking at the actual behavior in your environment. So that's autonomous detection, as opposed to prior-knowledge-based detection. Then, once you've detected something, there could be an automatic response. And in a lot of organizations I talk to, in a lot of security operations centers, automatic responses are just not a thing outside the simplest incidents and antivirus. One of the biggest reasons is organizational, not technical.
And that is that there's no acceptance of business disruption, any risk to availability as a result of automatic responses. And I think you cannot have the benefit without accepting a very low level of risk of interference with a legitimate course of the business.
So when there's a false positive and you have an automatic response set up, that can be disruptive to a part of the business. But you can mitigate that with organizational procedures, swift follow-up, reducing the scope of those responses, et cetera, in order to reap the benefit of having an automatic response within milliseconds or seconds of an incident starting, rather than requiring round trips to humans. But one of the biggest influencers there is that you need to have a high signal-to-noise ratio.
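To make the signal-to-noise point concrete, one option is to gate automatic responses on how often a detection has been a true positive in the past. This is a minimal sketch: the function names, the 0.9 threshold, and the counts are assumptions for illustration, not a recommendation from the talk.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of detections that were real incidents (the 'signal')."""
    total = true_positives + false_positives
    if total == 0:
        return 0.0  # no history yet, so nothing to base automation on
    return true_positives / total


def can_automate(true_positives: int, false_positives: int, min_precision: float = 0.9) -> bool:
    """Allow an automatic response only when the detection is mostly signal.

    min_precision is a hypothetical policy knob; each organization would
    choose it based on how much business disruption it is willing to accept.
    """
    return precision(true_positives, false_positives) >= min_precision


# A noisy detection: 10 real incidents out of 100 alerts.
noisy = can_automate(true_positives=10, false_positives=90)
# A clean detection: 95 real incidents out of 100 alerts.
clean = can_automate(true_positives=95, false_positives=5)
print(noisy, clean)
```

The threshold is where the organizational decision lives: accepting a small, known rate of false positives is the price for responding in seconds instead of waiting for a human.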
If 90% of what you detect is irrelevant, then you can't tie automatic responses to it, or your business will go bankrupt. So you need to have the opposite ratio. You need to have mostly true positive detections. Then you need to know about it. You need to receive the alert. As we work from home at the moment (welcome to my house), and a lot of us do, we don't always have a VPN connection to our head office. Very often, that is not the case.
And if you rely on needing that in order to have the telemetry and to know about the alert, it can take many hours or even days until you know of a security incident. So having that always-on connection to your security controls is vital in order to cut down on this time. Then you need to identify what to do with the alert. And if there are just too many alerts to handle, or if they're atomic and not tied together, say one attacker does one thing, but it causes 50 alerts,
then it's going to be really difficult to put those puzzle pieces together and make a decision based on it. So correlating it, turning it into actionable, decision-supporting evidence, that's really important here, as well as the prioritization and grouping of incoming elements. And once you decide that an investigation is needed, make sure that you have all of the context.
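The grouping idea can be illustrated with a deliberately simple heuristic: alerts on the same host that arrive close together in time are treated as one incident. This is a sketch under that assumption only; the `Alert` fields, the five-minute window, and the host names are hypothetical, and real correlation would typically use richer context than host and time.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    timestamp: float  # seconds since some reference point
    rule: str


def correlate(alerts: list, window_seconds: float = 300.0) -> list:
    """Group alerts into incidents: same host, gaps no longer than the window."""
    incidents = []
    open_by_host = {}  # most recent incident per host
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_by_host.get(alert.host)
        if group is not None and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)  # continues the existing incident
        else:
            group = [alert]  # starts a new incident for this host
            incidents.append(group)
            open_by_host[alert.host] = group
    return incidents


# One attacker action producing 50 alerts on one host, plus an unrelated alert.
alerts = [Alert("web-01", float(2 * i), f"rule-{i % 5}") for i in range(50)]
alerts.append(Alert("db-01", 10.0, "rule-x"))
incidents = correlate(alerts)
print([len(group) for group in incidents])
```

An analyst then looks at two incidents instead of 51 separate alerts, which is the kind of preassembly the talk asks tools to provide.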
If you need to assemble all the puzzle pieces yourself, and search for the puzzle pieces as well, rather than getting a preassembled, 80% complete version of the puzzle, then your staff is spending time on work that really should be done by computers, and that can save them a lot of time in the process. And if they have to go to different systems to understand what's happening in their cloud, what's happening in Kubernetes, and what's happening on the fleet, then they are wasting time doing all sorts of context switches between these different systems.
So putting it all in front of them immediately is really helpful in order to make them more effective responders. And then eventually you may decide that you need to manually respond: that you may need to clean up in order to recover, that you may need to remove persistence points, that you may need to shut down accounts, et cetera. There's a lot of work that needs to be done in order to deal with the attack. There are lots of approaches that rely on interactivity in that stage, that require humans to write scripts, to develop code.
And then, like with any development-type activity, there's a cycle: it's something you write, it's something you test, you run it, and then it doesn't work the first time. And then you end up redoing it and refining it. And there go a few hours of work, hours in which the attacker can continue on their path because you're unable to contain them. But if you have a recording already of the steps that the attacker took, why not just have an undo button that allows you to undo what they did, but have a human decide when to use that and when to apply that knowledge?
So humans are still very much in the driver's seat for understanding the big picture, understanding the attacker's motivations, what assets they were after, et cetera. But they shouldn't be wasting their time, especially the highly qualified people in the SOC, the analysts there, on basically writing code in order to do things that can easily be automated. And the same goes for verification.
When you verify that you have been successful in your response, if that requires a lot of manual work, if you are relying on scripting to do it or on manual configuration work, then it becomes really difficult to find out if you were successful or not. It's also difficult to find other persistence points and to find all of the ways that an attacker has been in the environment when you don't have long retention spans.
If you think about the incident I mentioned earlier, which is just a current example from Germany, then if they don't have data to go back eight or nine months into the past to find out what other things the attackers may have done, it's very difficult to confidently state that you have truly thwarted the attack and have truly contained the incident and their access. And if new detections come in, they need to be correlated to those existing incidents you already dealt with, so that you even understand if you were successful or not.
So automating that verification is an important aspect of cutting down the time to containment as well. And then eventually you reach the end of your journey and you have contained the incident.
Now, one of the things that I haven't really done is speak about technology at all. I haven't mentioned any of the wonderful words you see on this slide or that we use in a lot of the conversations at a technical level. And I kind of think that they're irrelevant. I probably shouldn't say that, working for a vendor, but whether it's magic or some incredibly interesting AI model doesn't really matter as long as it gets the job done.
But I think there are about five decision-guiding criteria that allow you to understand what the factors are that go into reducing the time to contain, and none of them are technological. And the first one is automation: it's to prioritize machine speed over human speed whenever you have a chance and it's realistically achievable, because humans don't scale. Humans need to sleep. Humans don't want to be called out of their homes at 6:00 PM on the 24th of December.
So automate what you can automate in terms of responses to incidents. Anything you can do in milliseconds on an endpoint doesn't require going into a queue and being looked at by a human a lot later, so it also reduces their load. Prioritize autonomy: prioritize technologies and approaches that don't have to rely on prior knowledge of your adversary and of the techniques that they will be using in order to detect and respond to them.
That's really what I mean by autonomy. Prioritize correlation as one of the things you demand from security solutions. Make sure that you get the full picture and that you're not being handed just puzzle pieces. You can work with those puzzle pieces and have a qualified individual or set of individuals assemble them into a full puzzle, but make sure that the tools that you choose assist with that capability and provide you with a preassembled puzzle. And aim for an end-to-end integrated process.
That's really a concept where you start from the simplest incidents and go to the most complex, but you handle them from beginning to end, ideally in one process and not with many different teams that need to be involved and that introduce further delays. And ideally you can achieve that by doing it in one platform that covers the whole estate for a particular security function, because the benefits that you will gain from that synergy are extreme in terms of the time to contain that it rewards you with. And I hope that was thought-provoking and useful to you.
I'm looking forward to your questions and hope you'll have a great event today.