Webinar Recording

Unstructured Data – A Blind Spot for GDPR Compliance

Name: Unstructured Data – A Blind Spot for GDPR Compliance
Uploaded: 2017-09-22T12:00:00+02:00
Duration: 1 h 18 s

Posted on Sep 22, 2017

GDPR will apply to all types of systems where personal data resides. That goes beyond traditional database, CRM or Identity Management systems: Emails, spreadsheets and text documents, PDFs and images, web pages and data collected from social media are only a few examples, and they are everywhere in the organization. All of this might and will contain PII (personally identifiable information), including systems like Microsoft Exchange, Office365, SharePoint, Skype, OneDrive, local folders or IMAP-accounts. Achieving compliance requires an adequate approach for data governance, but many organizations do not have a data governance program in place. Applying data governance to unstructured data is an even bigger challenge, as technologies are not prepared to handle the data-centric approach to the upcoming EU regulation.


Webinar presentation, STEALTHbits

Webinar presentation, KuppingerCole

Speakers

Matthias Reinwarth

Head of Advisory
KuppingerCole

Jonathan Sander

CTO
STEALTHbits Technologies



Good afternoon or good morning, ladies and gentlemen, and welcome to this KuppingerCole webinar, "Unstructured Data – A Blind Spot for GDPR Compliance". This webinar is supported by STEALTHbits. The speakers today are me, my name is Matthias Reinwarth, I'm a lead advisor and senior analyst at KuppingerCole, and later Jonathan Sander, CTO of STEALTHbits Technologies, will join us for the second part. Before we start, some short information about KuppingerCole, the obligatory housekeeping notes, and a look at today's agenda. A few words about KuppingerCole.

We were founded in 2004 and are headquartered in Germany, with a team of international analysts spread across the world, including the US, UK, APAC and, of course, central Europe. We offer neutral advice and expertise in various areas to companies, to corporate users, to integrators and software manufacturers. We started out with IAM being the original starting point, but we are now working in the areas of information security, GRC and governance. And generally speaking, we cover all the important topics in the areas concerning the digital transformation.

A short look at our business areas; I promise it's very short. There are three areas. The first is research, where we provide a range of strategic documents and reports, including our Leadership Compass, which compares vendors and market segments. We do events, and we will get back to that on the next slide. And the third part is advisory, where we provide support for our customer organizations, and we are happy to call ourselves a best-in-class and trusted advisory partner there. I promised we would get back to the events, and these are a few of the upcoming KuppingerCole conferences.

Actually, these are not three but four. We have already embarked on the Consumer Identity World Tour with an event in Seattle, which took place just a few days ago. We will continue that in Europe, in Paris, and we will finalize it in Singapore at the end of the year. We will have the Next Generation Marketing Executive Summit 2018 in Frankfurt, which will take care of the upcoming trends in marketing, and we will do the same for the Digital Finance World, where we cover all areas regarding FinTech and PSD2 and all the topics around that.

So that will be an interesting event as well. So these are the upcoming events that we will be doing. Some guidelines for the webinar: all the participants are muted centrally, so you don't have to take care of that. We control the mute and unmute features. We will record the webinar. The podcast recording will be available tomorrow, along with the slides as PDF versions. And we will have a question and answer session at the end.

So that means that you, as the participants, are expected, of course, to enter some questions through the questions feature that you have in the GoToWebinar control panel on your screen. So at any time that you feel there should be more clarification about one topic regarding Jonathan's or my part, please just type in your question and we will take care of that in the final part of this webinar. If you look at the agenda, it has three parts. The first part is the part that you're just watching. And it's this introduction.

And my first short talk about key requirements of the General Data Protection Regulation and their relevance for unstructured data. And then Jonathan Sander of STEALTHbits will take over for another 20 minutes or so, talking about data ownership, least privilege, GDPR and unstructured data: the real deep dive into the challenges that come with GDPR and the unstructured data issue that every organization has.

And the third part, again something like 20 minutes, will be questions and answers, where we will look into the questions you provided through the GoToWebinar questions panel. And I will get back to that again and remind you not to forget to add your questions there. And that's it for the introduction already. So let's start with my first part, the key requirements, and a very, very short wrap-up of what the GDPR actually is, because I think that many of the participants of this webinar already know the GDPR. This is just to get a common baseline to talk about.

So if we look at the GDPR, we have a three-step approach here. We have a box "before the GDPR", we have two years of implementation, and we have "with the GDPR". So just to give an overview: before the GDPR, the European Union General Data Protection Regulation, was designed, there was a data protection law within each EU member state, and every EU directive that was issued had to be transposed into each national legal system. So it had to be made a national law, and that is what was to be changed.

And that is to the right, where we say "with the GDPR". It was aimed in the first place at a harmonization at EU level. Of course, as all the EU member states' data protection laws were a bit dated, it was also meant to deal with technological developments and new factual situations. The main thing in focus was the strengthening of existing data protection standards. And, quite interesting especially to our non-European listeners, it is aimed at binding businesses established outside the EU to European standards when they operate in the EU. And we will have a look at that shortly.

And in the meantime, there were two years of implementation. That means the GDPR actually went into force on 25th of May, 2016, and it will become applicable on the 25th of May, 2018. And it will be accompanied by national laws. And this two-year period was especially also meant to give organizations a chance to achieve compliance.

And if we look at our calendar, at the 25th of May in 2018, and we use our favorite calculation tool, then we get 246 days to go starting today, which is not really much time left to start work, or to complete work, or to achieve and provide evidence for compliance. If we look at the short overview of the data protection principles that are behind the GDPR, then they are quite straightforward.
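The countdown mentioned above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `days_until_gdpr` is made up for this example, and the recording date is not stated in the talk, so 21 September 2017 is assumed here purely because it yields the figure quoted by the speaker.

```python
from datetime import date

# Date on which the GDPR becomes applicable, as stated in the talk.
GDPR_APPLICABLE = date(2018, 5, 25)


def days_until_gdpr(today: date) -> int:
    """Return the number of days from `today` until 25 May 2018.

    The result is negative once the date has passed.
    """
    return (GDPR_APPLICABLE - today).days


# Assumption: the recording date is not given in the transcript;
# 21 September 2017 is used here only as an example input.
print(days_until_gdpr(date(2017, 9, 21)))
```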

And from a user's perspective, they seem perfectly okay and fabulous for protecting the data privacy and the security of data subjects, that is, citizens. So from lawfulness, fairness and transparency, to purpose limitation, to storage limitation, meaning the time and the actual purpose for which data is allowed to be stored and processed, to data minimization, so the reduction to the core of what really is needed. Of course, accuracy: if you store something, it should be correct.

That is a good approach. Integrity and confidentiality, to make sure that the data is safe and protected. And in the end also accountability: there needs to be somebody who is accountable for storing this information and for processing this information. We are talking about personally identifiable information, about data subject information, so everybody who takes care of this information needs to be made accountable. And that might be the data owner or the data processor, which might be different parties in that game.

And this results in a set of key provisions that I'm just listing here and mentioning shortly, to show what is included in the GDPR. That is something that organizations have to comply with and that they need to implement. So many organizations will need to appoint a data protection officer. This is not true for every organization, but every organization should try to find out if they need to, and if they need to, they should do so to comply with the GDPR. As I mentioned before, it's globally applicable.

That means that every organization that stores and processes information about citizens of EU for all of them, the GDPR applies directly. So it doesn't matter where the organization is actually located. If you do business with European citizens, then the GDPR will apply to you.

You need to make sure that there are adequate measures in place so that data breaches are detected and the various stakeholders are notified. For sure there must be a notification towards the data protection authority that is associated with an organization, and that needs to be done within 72 hours, which is not too much time. And if the data that has been leaked is very sensitive, it might also be necessary to notify the data subjects themselves.
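To make the 72-hour window concrete, here is a minimal sketch of how an incident-handling script might track the notification deadline. The function names and the example timestamps are assumptions for illustration only; the talk does not describe any particular tooling.

```python
from datetime import datetime, timedelta, timezone

# Notification window towards the data protection authority, as named in the talk.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(detected_at: datetime) -> datetime:
    """Return the latest point in time at which the authority must be notified."""
    return detected_at + NOTIFICATION_WINDOW


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Return the hours left until the deadline (negative if it has passed)."""
    return (notification_deadline(detected_at) - now).total_seconds() / 3600


# Hypothetical example: a breach detected on a Monday morning (UTC).
detected = datetime(2018, 5, 28, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(detected))
print(hours_remaining(detected, detected + timedelta(hours=24)))
```

Using timezone-aware timestamps avoids ambiguity when the breach is detected in one country and reported in another.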

So to get to every citizen, to every data subject, to every person that is stored within the system and whose information has been leaked in the process. What you need to make sure is that there is consent or a comparable legitimate ground for processing the data. That might be a contract, but very important is the term consent.

So really the statement by the data subject, to make sure that this type of processing and storage of data is actually agreed to. And it is very clearly defined how that needs to be documented. I'm not going into detail here, but it's quite an effort to document consent lifecycle management at that point. In general, the rights of the data subjects have been extended dramatically. And that leads to, for example, this right to be forgotten.

So the right to have data deleted which is no longer needed, and many other data subject rights, including export of data in a structured format. An interesting concept is the concept of the one-stop shop. That means that an organization that is acting internationally should identify its main location, its main country or main residence, to understand which is the main data protection authority, which will then take care of all subsidiaries of the organization.

And finally, something that is most probably well known: the administrative fines, which are quite substantial, and which go up to 4% of the group's annual turnover, which might result in quite a large amount of money once a breach has been detected. So all this is just to give a short basis for what we are talking about when we're talking about GDPR, and that is quite some work to do for many organizations to achieve compliance and to provide evidence for that compliance.

So we as KuppingerCole tried to understand what we can do to help our customers, our readers, our clients in understanding what to do for the GDPR. And so we produced a short document, what we call a Leadership Brief, which aims at helping people understand what needs to be done when it comes to the GDPR. And we identified six key steps. And I think that will be something that Jonathan will look into when it comes to unstructured data as well. So we have just had a first approach at dealing with this topic.

So the six key actions begin with: discover the PII data. So personally identifiable information needs to be discovered. That means you need to identify all systems within your organization that actually store and process this personally identifiable information. And that already can be a challenge, but unless you have done that first step, you cannot continue in any direction, because first of all you need to know where the data is and how to deal with it. And where the data is when it comes to unstructured data will be a strong challenge, for sure.
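As a rough illustration of what a first discovery pass over unstructured files could look like, the following sketch walks a folder and reports files that contain e-mail-like strings. It is deliberately simplistic: the regular expression, the file extensions and the function names are assumptions made for this example, real PII discovery needs far more patterns (names, IDs, phone numbers) and file formats, and nothing here reflects any vendor's product.

```python
import os
import re

# Very simple pattern for e-mail-like strings; real discovery needs many more patterns.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii_in_text(text: str) -> list[str]:
    """Return the distinct e-mail-like strings found in `text`, sorted."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


def scan_folder(root: str, extensions=(".txt", ".csv")) -> dict[str, list[str]]:
    """Walk `root` and map each matching file path to the e-mail-like strings it contains.

    Only plain-text files with the given extensions are read; unreadable files are skipped.
    """
    findings: dict[str, list[str]] = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    hits = find_pii_in_text(handle.read())
            except OSError:
                continue  # e.g. permission denied; skip and carry on
            if hits:
                findings[path] = hits
    return findings
```

A call such as `scan_folder("/path/to/share")` (a placeholder path) would return a dictionary of file paths and the matches found in each, which can serve as a starting inventory.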

Once you have identified the data, you need to have a means in place that controls access to all these places where this PII is stored. And this control of access also needs to be applied according to given consent, to contracts and to every potential legal ground that applies when it comes to the actual right to process and store this data at all. Manage consent: I've mentioned that before. Make sure that you have consent at hand and well documented when it comes to storing personally identifiable information and to processing it.

So that needs to be well documented and needs to be shown as proof when it comes to processing data. Of course, you need to do that not only within your on-premises systems, which are closer to many organizations and better understood; it also applies, of course, to cloud services, everywhere you store information that might be outside the premises of your organization: in cloud services, in managed services, in outsourcing services. Prepare for a data breach: make sure that you are really ready when it comes to a data breach.

Of course, you will do the utmost to prevent it, but you need to be prepared for a data breach, because the 72 hours might be over very quickly, and you need to understand what to do when it comes to a data breach. And the final recommendation that we made in that short document was to implement privacy engineering, to make sure that security by default and by design and privacy by default and by design are implemented within your architecture from the ground up, which is not an easy thing to do.

We know that, but this should be an approach that helps you in understanding and creating an infrastructure that is GDPR compliant from the ground up. That does not mean that infrastructure alone can be created GDPR compliant. It's always processes plus infrastructure, but you need to implement adequate technologies and processes that achieve GDPR compliance. We're talking about structured and unstructured data today. So we need to understand what is different when we are talking about unstructured data.

So, first of all, of course, structured data is something that we all know. We know databases, we know LDAP directory services, Active Directory. We know structured file formats like CSV files. And they are, as it says above, organized and discrete, and this is something that can be more or less easily brought under a regime that applies access control. So you can really say that this column is only visible to administrators, and that is usually very simple to do, and you can assign access to structured data more easily than you can to other data.

And that is something that Jonathan will look into later on. We are talking about unstructured data, which is difficult to organize, which is in unmanaged content repositories. And if you think, in the back of your mind, of your own organization and all the systems that are around you, you will surely identify unmanaged content repositories very quickly. That might be shared systems, shared file servers, that might be SharePoint, nested structures.

So a mail within a mail, or a document within a mail file, general mail data, binary data, videos, unstructured file formats, everything that comes with file storage in general. So all of this is much more difficult to understand and to control. And when I said before, understand where your PII is: when it comes to mail data, to nested structures, to mail folders which are spread across different servers, and maybe even something like cloud services, Dropbox or something like that, it might be very difficult to achieve GDPR compliance here.

And I hope that we get some insight here from Jonathan as well. So let's do a simple example to illustrate that a bit, which might show the complexity that really is behind that. We start out with a typical marketing organization, a marketing team within an organization, that does something which might be well covered by consent or a contract or the right tick box within the consent form.

And based on that given consent and the information that is available, they create a target customer list. And that is handed over to a team member who should work with that target customer list. Well covered through consent, everything fine. And because he's asked, he will use this file and put it into a mail and send it over to a colleague, because the colleague is working in the sales department and thinks that this is important and useful information: "I might want to use that as well." And now this colleague comes up with a modified customer list.

So this colleague, he or she, will add some information. She will change it. She will delete something and add something. So we have a modified customer list. So we have a different purpose, a different file, and she or he has worked with that file. And once the work is done, the result, which was not covered by the original purpose, is maybe stored on a sales SharePoint site as a file within the folder that is related to the sales team. So if we look at that process, we understand that we have lots of new usages of that personally identifiable information.

So first of all, we have PII in the Excel file. We have PII in Excel, in mail, in Exchange. Then we have a local copy. And if we think of how we do work today, that might not be a local copy. That might be a copy that is actually located within Dropbox or OneDrive or Google Drive or somewhere else, and which, during the process of just storing it there, left the European Union and is now in the US because the service is located there. We have PII in Excel with changed purpose, and maybe changed content. We have PII in Excel in SharePoint, and that is, of course, called SharePoint for a reason.

It's shared by link, or it's shared by default, so the team can have a look at it. And this colleague might send around a link to that personally identifiable information. So we see that this information, which was originally well structured and well covered by consent in that Excel file, was over time used in various other compositions. And it's now stored in what we call unstructured systems.

So a file within SharePoint, or on a file server or anything else like that, should be considered as unstructured because it is not well covered by access control. And if we look at data breach, for example, the information that was stored in the Excel file is now subject to the threat of a data breach on the SharePoint server. So if you look at the requirements that we initially listed, we have some very severe issues which come with that very usual way of processing and dealing with data.

So we have the issue of consent, which might be given for the target customer list, but surely not for the sales list. Transparency is very difficult. So if a data subject says, "Hey, I want to have an insight into where you store my data across all your systems", it might not be possible to give a complete and consistent answer to that. So the rights of the data subjects, "delete this information", might be difficult to actually execute. Data minimization is surely not something that we're looking at here right now.

We are not looking at storage limitation, because most probably nobody will take care of deleting that file on the sales SharePoint site. Purpose limitation is something that we talked about already. Who will be held accountable for that? It's difficult to decide. And what happens in the case of a data breach? So these are some strong issues that come up with that way of dealing with data within an unstructured environment. So if we look at my final slide: we have a lot of tasks and challenges, and they get much more complex and challenging when it comes to unstructured data.
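The customer-list example above can be modelled as a small inventory of copies, which makes the consent and location problems easy to spot. The sketch below is purely illustrative: the class `PiiCopy`, the purpose and region labels, and the helper functions are assumptions made for this example and not something described in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiCopy:
    """One copy of the customer list somewhere in the organization (hypothetical model)."""
    location: str
    purpose: str   # purpose the copy is actually used for
    region: str    # where the storage service is located


def copies_without_consent(copies, consented_purposes):
    """Return copies whose purpose is not covered by the consent that was given."""
    return [c for c in copies if c.purpose not in consented_purposes]


def copies_outside(copies, allowed_regions):
    """Return copies stored outside the allowed regions."""
    return [c for c in copies if c.region not in allowed_regions]


# The copies from the example in the talk, with assumed labels.
copies = [
    PiiCopy("Excel file (marketing team)", "marketing", "EU"),
    PiiCopy("mail attachment in Exchange", "marketing", "EU"),
    PiiCopy("copy in a cloud drive", "marketing", "US"),
    PiiCopy("modified list on sales SharePoint", "sales", "EU"),
]

print([c.location for c in copies_without_consent(copies, {"marketing"})])
print([c.location for c in copies_outside(copies, {"EU"})])
```

Even this toy inventory shows the point of the example: one consented file turns into several copies, some with a changed purpose and some in a different jurisdiction.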

So, first of all, as I said, assess your organization, understand where the information is, do everything that is required when you look at the processes and the systems, and build GDPR compliance into your systems. That is something that is really important, because you have to have it within your business processes and in your IT systems. So organizational plus technical. You have to implement consent management. I mentioned that before, and it's really essential, and it should cover every usage of data within the organization.

No matter whether structured or unstructured storage is behind that. Prevent the breach: make sure that breaches don't happen, for example through encryption and through strong access control. But prepare for the breach as well, as I mentioned before. So if something happens, it should be detected in a timely manner, notification should be available for all required stakeholders, and pre-canned communication should be available just in case this happens. And demonstrate compliance, which might be the most challenging part of that.

So not only do the right thing, but prove that you're doing the right thing, and that is true for everything that you're storing regarding personally identifiable information. So there should be a framework, controls and measures well implemented. And that is something which will be really challenging if we look at unstructured data. And as this is my last slide, I would like now to hand over to Jonathan Sander of STEALTHbits Technologies to continue with a deeper look into what actually happens and how technology and processes can help when it comes to unstructured data.

But I want to first remind you to add your questions, if you have not done so already, to the questions panel within the GoToWebinar software, just to make sure that all questions that you have regarding me or regarding Jonathan are well prepared for the third part, while we now hand over to the second part. All right.

Matthias, can you hear me? Yes, will. Thank you. Excellent. All right. Thank you for that very useful summary. I actually love the way you talk about unstructured data, cuz that is almost identical to the way I identify how unstructured data ends up filled with this sensitive information.

You know, the typical workflow of people taking controlled data applications and trying to do their jobs well, but creating risk with files stored all over the place. So it's, it's excellent. So for my part, I'm gonna attempt to answer three sets of questions.

First, there are a lot of questions that we had here at STEALTHbits when GDPR came down the line, and in order to help us get answers, obviously we went and spoke to a lot of experts, some of them here at KuppingerCole, but we also went and spoke to a lot of you folks, and I'm gonna share some of the results of a survey we performed for that purpose. I'm then gonna answer the most common questions we get asked about GDPR in relation to unstructured data. And then I'll show you how some of our customers have answered the questions they've been posed by meeting GDPR needs.

And then I'll hand back to our friends at KuppingerCole to take us through the final part of the webinar. So this slide here covers the top question, right? How important are people considering GDPR when they approach it? And as you can see, most people consider it a priority, but only a little over 25% put it in their top three, which I find interesting. This survey was done this year, so it was actually very much in the now, and it was interesting to see that result. As you see, by the way, on the one side of the slide, we talk a little bit about the demographics involved here.

It's 520 organizations represented from all over the world, at many different levels, in many different industries. So this was a very wide sample. If you wanna see the whole survey, I'm gonna go through a couple more pieces here, but you can go to stealthbits.com and grab a copy. We're giving it away for free.

Now, another thing that was interesting is that when we broke this down by the industries that replied, we definitely saw some breaks here, where there were some that were much more focused on it than others. Interestingly, technology organizations, and that includes people like telecoms, seemed to have the highest priority there. And that's not that surprising considering how much the technology industry processes information about all of us, right? So that makes a good deal of sense.

The surprise on this slide was actually, or in this set of data was how small, a number of higher education organizations seemed to think this was a big deal, especially by the way. And I don't have this in the slide, but when you broke it down to higher education outside of the EU where there's tons of EU students data, clearly in their systems, they didn't have a very high priority on this.

And that was an interesting result that surprised us. Unsurprisingly, if you ask people who owns this within an organization, a lot of people say that it's an information security or an information technology function, since it does in fact fall so squarely into the technology area. But a healthy number saw it as legal's responsibility, which says they're viewing this like other compliance. But of course this also masks some, you know, small organizations where legal also contains the office of the CFO, who is the boss of the CIO.

So some of this might be a side effect of that, but not too surprised to see who would be responsible for this. And when it comes to why people were stating that they were gonna have a big problem meeting GDPR, of course, it's no surprise that number one was lack of budget because there's never enough money for everything. But what I found very interesting was the second place here, limited understanding of the regulations. And by the way, it's not second place by much, right. 32 versus 29%.

But this is interesting to me in part, because I actually find the regulations themselves rather clear, but I think it has more to do with people not applying wisdom or knowledge. It's not about not knowing what the standard says, but they're applying wisdom. They're not sure that what it says is gonna be what it means ultimately. And we'll talk a little bit about that in the next section, but it, it is the most interesting part of, of that part of the survey.

So moving on from the survey, what I'd like to talk about is what sort of questions people pose to STEALTHbits, as the experts on unstructured data, with regards to GDPR. And I'll point out first that on our website, if you were to go to stealthbits.com, you would find an interactive version of the diagram you see here on the screen.

And what we've done is we've broken down all of the chapters, articles and individual pieces of the GDPR standard and made sure to show where those apply to the technology that we supply, but more importantly, to the areas of information technology where we specialize. So it looks at unstructured data, it looks at your infrastructure, it looks at all the places where GDPR can touch things, and it makes it clear what the connections are between these.

So if you want a really detailed understanding of the topic of unstructured data and GDPR and how it relates to very specific parts of this standard itself and what it says in detail, I would recommend going to our website, finding this under our GDPR section and clicking away. And again, this is a free resource. You don't even need to register by the way, to grab all the information that's on this.

Well, actually, I think if you go three or four clicks deep you register, but the first couple of layers are there for anyone. So that's a great resource for you if you're trying to learn more about GDPR.

Now, the, the thing that I'll start out with is that the relationship between unstructured data and GDPR is unknown, right? And I know that's probably not what you might expect me to say, right? It would be much more advantageous for me to say, well, here's exactly what it is. And it ties to these things and we've tried to tie it as best as we can, but to the point I made on the survey slide about, you know, people saying their second biggest challenge is not understanding the regulation.

I think that if you've been around a while and you've seen some of these compliance efforts roll through, what becomes clear is that until there are legal challenges, until there are places where the regulation and its enforcement meet a real company, and it has an impact on them, and they try to fine them and try to define why they are out of compliance, and of course the company gets its lawyers and tries to fight back against that, until you see that happen, it can be very difficult to understand exactly what the impact will be.

So we need some test cases, and I don't mean, you know, things you run through QA. I mean test cases in the sense of legal cases that will show us exactly what some of this language means, because the problem is that GDPR is very clear about information, right? Now, in Matthias's section, he talked in detail about having to have consent and having to have privacy. And this all applies to the information. And it's very clear when you think about it in terms of perhaps, you know, Matthias being an EU citizen and the data about Matthias that might be stored, let's say, in my database, right?

That would be a very clear thing. I have a responsibility as the processor of that information. But what it's less clear about is data, and I mean now the technological form that the information lives in and flows in through systems. And while the responsibility is clear about what we're supposed to do with the information, what that translates to in actual practical terms is not as clear, right? There's no specific guideline that mentions any specific technology, right? Including unstructured data.

So, you know, you know, the idea that you can have a very clear present notion is probably naive, right? We, we do need to get some test cases before we really know. Now that said, one thing that is also extremely clear is that GDPR protected information lives in unstructured data. There is no question of this, right?

So employees and contractors, right, when they're doing the thing that Matthias described, taking information out of controlled systems, structured systems, where you've probably got a pretty good security model, or at least you've been supplied a good security model by the vendor. In most cases that's going to get, and, you know, we can use the attacking term here. It's going to be exfiltrated out of those controlled systems as a normal means of work and get itself into unstructured forms. And of course, that's something you wanna allow, right.

You know, to, again, Matthias's point: the marketing person sharing information with the salesperson and coming up with unique ways to open up new opportunities for the organization, of course that's desirable. And what GDPR is asking is that you have controls in place when this information is going from one type of system to the next, when it's going anywhere. If you know the information is there, you need to have the appropriate controls, which of course raises a huge number of questions. How do you know the information is there?

How do you determine that this information has in fact flowed into some unstructured form and has been stored in some place that you can't predict? You can't predict it because it's not an algorithm making a choice as to where it might be stored. It's a human, and humans are by their nature very unpredictable. Maybe they put it in SharePoint, like in Matthias's example scenario, or maybe they just put it on their hard drive, or maybe they put it on a USB thumb drive that they carry around, because that's how they prefer to work. And we enable people to work the way they prefer to work so that they can get things done.

But at the same time, we have to have some measure of control, even when that's the case. And so this is where GDPR meets the real world. This is where the challenges start rising: how can you have the appropriate oversight of unstructured data to meet your obligations, but at the same time allow people to work in a way that is conducive to the organization growing and thriving? So we of course have some suggestions, right? If you think about the process from conception through to implementation of controls for GDPR, there are a couple of guidelines that will help, right?

And it comes down to two things, although of course these two things imply lots and lots of other sub-agendas. The first one is to establish business owners for the unstructured data that has to be protected under GDPR. Now, that aligns perfectly with what GDPR asks us to do, right? It asks us as a data owner to take responsibility for that. And that's at an organizational level. My organization is the one that has the role of a data owner, right? Or a data processor, for that matter.

But within the organization, I need someone, some human, to be the person I can identify as responsible for any set of information, wherever it might live, including in an unstructured form. And this implies lots of questions, right? It implies knowing where all this unstructured data is, what's in it, who has access to it, who's using it and how they're using it. And these obviously are complex questions that many organizations don't have very good answers to right now.

Now, once you have a business owner, the notion is that they can make authoritative decisions about proper access and use, right? From a technology perspective, you're aiming for least privilege, because that's just the thing to do: over the years, we've proven that it will give you the best results. But specifically in the context of GDPR, you are aiming to make sure that there is a person who is going to play the role of being responsible for information in whatever form it might be, including unstructured.

And that person can, at any given moment, tell whoever carries the role of data privacy officer, or is in fact perhaps dedicated as a data privacy officer, what the circumstances are with some specific set of information that may have been requested by the user or become impactful for some other reason. That is why you need a person who owns this. Somebody needs to be able to take responsibility when those requests and problems arise. The other thing you can do is monitor the activity for unstructured data, because adding an owner is excellent, right?

It helps you from a governance standpoint and it helps you from an operational standpoint, but it's not like this owner is gonna spend every day, all day just looking at this information and the data that contains it and where it goes. You need a system to do that. And it's something you're gonna need to deploy to look at the flow of this data as it goes through these systems.

And even before you're ready to assign owners, this can be useful, because you can be watching for anomalies. Even before you understand who's responsible on a, you know, director level, you know that the organization is already responsible. Whether it's an anomaly in access to this data in the form of an attack, or an anomaly in the form of misuse as defined by GDPR, you need to be able to see that going on in order to be able to control it.

Now, of course, STEALTHbits has a platform, StealthAUDIT, that could help you do this. Across the bottom here, what you see are all the different systems that StealthAUDIT can tap into. As you can see, it's very focused on the unstructured data side, going everywhere from your servers to your filers, things like NetApp, EMC and Hitachi, and also your cloud systems: Office 365, Dropbox, Box, Exchange and more, right? And of course SharePoint, which Matthias mentioned earlier as well, both online and on premises.

Now, for these critical data systems, what we're gonna do is have StealthAUDIT go and scan all of them. That starts answering the questions: where is this data, and who has access to it? Those scans start to do it. And this tells you who has access, what we call effective access. What we mean by that, briefly, is this: it isn't just simply looking at the permissions.

This is taking the permissions and intersecting them with all the other layers of information on the system side and on the network side, everything that can possibly be calculated to determine who actually has access. We add all of that up. We can also then use that to help contribute to data ownership, helping to directly assign these owners from a business standpoint. We can also do sensitive data discovery, to answer the question:

Does it, in fact, contain Matthias's information? Does it contain EU citizen information that makes it GDPR relevant? And we can also do file activity monitoring. Now, file activity monitoring is an optional component overall, but it can also be run on its own. So that discipline of watching what's going on, you could start there and then build up more capabilities later, layering things on as you're ready, right? So the platform lets you start where it makes sense for you. Once you've got it in place, you're gonna have reports.

You're able to do investigations. You'll get alerts. You can even put self-service access requests and entitlement reviews into place, and you can take automated action based on what we see there. So if you see an entire set of access granted to people where it just doesn't make sense, and it's clear it doesn't make sense, you could put actions in there to remove that access, or at least modify it, perhaps from write to read.
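As an illustration of the "effective access" idea described above, here is a minimal Python sketch. It is not how StealthAUDIT works internally; the function names, the two-layer model (share-level and folder-level permissions) and the sample data are all assumptions made for this example. It expands nested group membership, collects what each layer grants, and then intersects the layers. Deny entries and inheritance are deliberately left out.

```python
def expand_groups(user, member_of):
    """Return the user plus every group reached through nested membership."""
    seen = {user}
    stack = [user]
    while stack:
        current = stack.pop()
        for group in member_of.get(current, ()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


def rights_at_layer(principals, acl):
    """Union of the rights one layer grants to any of the given principals."""
    granted = set()
    for principal in principals:
        granted |= set(acl.get(principal, ()))
    return granted


def effective_access(user, member_of, layers):
    """Intersect the rights granted at every layer (hypothetical model:
    a right is effective only if each layer grants it)."""
    principals = expand_groups(user, member_of)
    per_layer = [rights_at_layer(principals, acl) for acl in layers]
    if not per_layer:
        return set()
    return set.intersection(*per_layer)


# Hypothetical sample data: who belongs to what, and what each layer grants.
member_of = {"alice": ["Sales"], "Sales": ["AllStaff"], "bob": ["AllStaff"]}
share_acl = {"AllStaff": {"read"}, "Sales": {"read", "write"}}
folder_acl = {"AllStaff": {"read", "write"}}

for person in ("alice", "bob"):
    print(person, sorted(effective_access(person, member_of, [share_acl, folder_acl])))
```

In this sample, bob is granted write on the folder but only read on the share, so the intersection leaves him with read only, which is the kind of result that looking at a single permission list would miss.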

You can look for conditions like open access and put policy in place that says, I only want open access under these very specific conditions, and at any other time, alert me or maybe just shut it down. So you have the ability to control a lot of what's going on in these unstructured data storage systems, control that you probably do not have today. And more than that, we can actually take these systems and make them a part of your existing platforms, right?

So we can actually take all of this and integrate it into your SIEM systems from an activity standpoint, and into your identity management systems from a governance and access standpoint, and essentially become part of the larger solution that you might have in place for GDPR and beyond. Now, real quick, I'd like to cover two stories of how STEALTHbits customers are approaching GDPR in the real world. The first one is a multinational US-based financial company.

About four months ago now, I'd say, they put out a big RFP as a result of having one of the big five consultants come in and show them their exposure for GDPR. What's interesting is they were a customer before this, right? And they ended up making another investment with us.

When we talked to them, they said the number was intimidating in terms of the risk exposure that they had, because they do so much business with people in Europe. What they started to do with this consultancy during the process of creating the RFP, to understand what they would need to help solve the challenges, is what's called data flow mapping. If you've done GDPR already, or if you've done risk analysis about unstructured data or data flow in general, you've heard of this.

It's essentially just looking at workflows that look not too unlike the slide that Matthias showed earlier about how people use information in an organization, going from applications through to files, maybe back to applications. They were trying to make a very complete picture of how data flows through the organization, and they thought it was gonna be all about applications. But what they saw very quickly was that users were taking a lot of this data and, as I said earlier, exfiltrating it into unstructured data. And that really increased the exposure.

That was actually one of the things that made that number so intimidating: the amount of unstructured data that contained this information. They were not expecting to find this. So they actually came to us, because we had been working with them in another capacity to help them with unstructured data security, and asked if we could help with this.

And of course we were prepared to, and we've been helping them for the last couple of months to get to a point where they're feeling more prepared. Funny enough, they actually still have the RFP out there for some of the bigger concerns on the GRC side. But in terms of actually locking down the unstructured data, they're feeling pretty good about that. They feel like they have a good plan in place, right? And we're trying to help them execute on that. So the other story here is a multinational EU-based telecom firm.

Now, they've been very intensely focused on GDPR for a while, which, by the way, fits very well with the results of the survey, right, where we saw technology companies considering GDPR a bigger priority. So this example fits very well into what the survey results showed us. And they're focused on privacy by design, right?

However, they understand that this is a very long-term goal, and they're working with consultants to rebuild applications to suit that. In the meantime, what they've done is create several solutions on the Splunk platform. For unstructured data, they're using the STEALTHbits capabilities to watch activity and feeding that into this Splunk solution, which is then helping them keep an eye on all of their information that's GDPR sensitive.

So they can keep essentially a very quick, focused tiger team on this while they transition on a larger scale to a privacy-by-design stance that will hopefully control a lot of this. Now, interestingly, one of the things that we've talked about with them is that they see the unstructured data part of this is still gonna require other solutions. Where they're gonna build strict, GDPR-focused controls into the applications, they can't do that to the unstructured data systems, because those are essentially off the shelf, right?

They can't alter the code of their file servers. They can't alter the code of SharePoint itself. So they are gonna be relying on us on an ongoing basis, and we're helping them design what that system looks like too. And it will interact, hopefully, with the other controls that they build. Okay. So at that point I've reached the end of my materials. So I wanna transition back to Matthias at the KuppingerCole side, and let's get some Q&A answered.

So thank you, Jonathan. Thank you for that great insight into what STEALTHbits is doing at that point.

That was really interesting, and we've got a lot of good questions already there, but please make sure that you add all your questions that are still open to the questions panel. Okay, to start out, I'm trying to start with a very quick question. I hope, Jonathan, you can hear me.

I can hear you.

Okay. It was so quiet, so sorry. First of all, a very, very short question around the product. Do you plan to have a connection with Google G Suite as well? Or do you do that already?

Yes, we will be able to do Google G Suite. It's on the roadmap right now. But yeah, very perceptive. You didn't see it on the slide there.

Yes, that's correct. Okay.

So that was an immediate question. I thought, let me get that off the list very quickly. Another question is how to identify whether data is actually PII or not. There's one question, I'll just read it out: for internal audit, we are exporting user data, the details from AD, and that contains username, mail ID and some additional information, including ID status. Would you consider that as PII? First of all, maybe a question to you, and I will add upon that as well. We are both not lawyers, I assume, so this can only be an opinion.

Thank you, Matthias, for protecting me there. Yes, you are right. I am not a lawyer.

I can't interpret the law, so don't take this as legal advice, but my answer is yes. I would certainly consider that personal information. I've read the GDPR standard myself, a couple of times at this point, actually. So I would say yes. And I would say there's a couple of things you can do to help control that, right? Clearly, wherever the export ends up, that's gonna be PII, perhaps in an unstructured form, Excel or whatever.

That would be something you would want to keep track of, of course, but you can also keep track of that at the Active Directory layer. If you monitor your Active Directory well and monitor the activity against it, you'll be able to see when and how that happens and what data is leaving that system, and perhaps put some controls around it to make sure that certain information never leaves, or only leaves in certain forms.

Okay. Yeah. Great.

And I've just checked with Article 4 of the GDPR, and it says that all information that relates to an identified or identifiable natural person is considered to be PII. So, as far as I can interpret it, it's clearly PII that we're talking about there, and if it is used for a different purpose, auditing, that should be covered, in my opinion. Another very interesting question that I have here is from a very concrete use case.

I'll read it out again. Let's assume that you managed to extract data, the surname and email address of a notary who happened to have signed an agreement, that this is extracted from a PDF, and that it leaks. Would you consider that to be something that the notary should be notified of? How would you deal with that? What defines criticality?

Wow. Okay. So first of all, right, if we understand what we want to accomplish, we wanna make sure we know where that data lives and know that it has the appropriate controls in place.

Certainly that data would fall under something we could find and help control. Frankly, I'm struggling to understand exactly what the question is asking.

If the question is more, like the last question, whether that constitutes something GDPR would make you responsible for, then again, don't take it as legal advice, but my opinion would be yes. I believe that would certainly fall under a similar situation. Much like we said in the last question, and as Matthias said, Article 4 is pretty clear: from a layman's perspective, that does seem to be something you would want to control. Right?

And that is something that comes up in another question as well: what is sensitive data? Maybe that is something that could be misunderstood in what I said. Of course PII is sensitive in any case, but there is more or less sensitive PII when it comes to health data, to financial data, to resume data or professional experience data, which should be considered more sensitive. So there is, in my opinion, no such thing as non-sensitive PII, but of course there are different degrees.

That is one question that came up that I just wanted to make sure that...

Sure. Just from our perspective on the technology side, when we talk about sensitive data, purely from our product's standpoint, sensitive data is defined as anything that you've told the system is sensitive. The way we do that is a very typical approach, right? You use regular expressions and keywords and certain algorithms, like the Luhn algorithm for credit card numbers, for example, to identify and basically zero in on what that data is and where it is.

And for our purposes, you can actually combine a bunch of these into higher-level ideas, like the ones you would use to identify GDPR PII, right? We have plenty of that out of the box, hundreds of these defined already, and you would use those to zero in on what is sensitive data.
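The general approach described here, regular expressions plus keywords plus a checksum such as the Luhn algorithm, can be sketched in a few lines of Python. This is only an illustration under assumptions: the patterns, the keyword list and the function names are made up for the example and are far simpler than what a real product would ship.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits
KEYWORDS = ("date of birth", "passport", "iban")


def luhn_valid(number):
    """Luhn checksum: double every second digit from the right, subtract 9
    from results above 9, and require the total to be divisible by 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text):
    """Return (kind, value) findings for emails, Luhn-valid card numbers
    and keyword hits in a piece of text."""
    findings = []
    for match in EMAIL.finditer(text):
        findings.append(("email", match.group()))
    for match in CARD_CANDIDATE.finditer(text):
        if luhn_valid(match.group()):  # the checksum filters random digit runs
            findings.append(("card_number", match.group()))
    lowered = text.lower()
    for keyword in KEYWORDS:
        if keyword in lowered:
            findings.append(("keyword", keyword))
    return findings


sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111, date of birth on file."
for finding in scan_text(sample):
    print(finding)
```

The checksum step is what keeps a pattern like "any 16 digits" from flagging order numbers or phone lists: a digit run only counts as a card number if it also passes the Luhn check.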

Now, if you in your organization have something that you consider GDPR sensitive that is not covered by what we have out of the box, it's easy enough for you to add those as well, using those same mechanisms I described. And then we can add that to what we would look for in your unstructured data.

Okay. One question very closely related to what you mentioned earlier within your slides is a very simple question, but quite a complex topic. How do you identify owners for unstructured data?

What is a typical process within customer organizations that you work with?

So there are four basic ways that we do this. Three of them are out of the box, and one of them is in cooperation with the customer, from a consulting standpoint. From the out-of-the-box standpoint, it's pretty simple. There's a heuristic that looks at who in fact created the data, and that is pretty clear when we collect the information. We also have who the manager is that is most associated with a block of data.

And I say a block of data because you can do this at a folder level or at a share level. There are different ways you can slice and dice where you wanna identify an owner.

Also, if you're using the activity portion of things, we will then take into account who is actively using that data and how, right? We have heuristics for each of those categories to sort these and come up with what we call the probable owner based on what we see. And in fact, we have a heuristic that then takes all of that and looks at who is the most probable owner based on all of the information that we see. Now, that's what we can do out of the box. As you can imagine, that's relatively generic, right?

That's looking at, you know, what we know every single organization is doing with data. There's no question that if you have data, you're doing those things.
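To make the "probable owner" idea concrete, here is a small Python sketch that combines the three signals mentioned above: who created the files, which manager is most associated with the people involved, and who is actively using the data. The scoring scheme (each signal normalized to shares and combined with weights), the function name and the sample data are assumptions for illustration, not the product's actual heuristic.

```python
from collections import Counter


def probable_owner(creators, activity_users, manager_of, weights=(1.0, 1.0, 1.0)):
    """Pick the most probable owner for a block of data (folder, share).

    creators:       one entry per file, naming who created it
    activity_users: one entry per access event, naming who performed it
    manager_of:     mapping person -> manager (hypothetical HR/directory data)
    weights:        relative weight of (creator, manager, activity) signals
    """
    creator_w, manager_w, activity_w = weights
    creator_counts = Counter(creators)
    activity_counts = Counter(activity_users)
    # A manager is "associated" each time one of their reports shows up.
    involved = list(creators) + list(activity_users)
    manager_counts = Counter(manager_of[p] for p in involved if p in manager_of)

    scores = Counter()
    signals = (
        (creator_counts, creator_w),
        (manager_counts, manager_w),
        (activity_counts, activity_w),
    )
    for counts, weight in signals:
        total = sum(counts.values())
        if total == 0:
            continue  # signal not available, e.g. no activity monitoring
        for person, count in counts.items():
            scores[person] += weight * count / total

    if not scores:
        return None
    # Sorting first makes ties resolve alphabetically, so results are stable.
    return max(sorted(scores), key=scores.get)


creators = ["ann", "ann", "bob"]
activity = ["bob", "bob", "bob", "ann"]
managers = {"ann": "carol", "bob": "carol"}
print(probable_owner(creators, activity, managers))
print(probable_owner(creators, activity, managers, weights=(1.0, 2.0, 1.0)))
```

With equal weights the most active user (bob) comes out on top; weighting the manager signal more heavily shifts the result to the shared manager (carol). A custom calculation of the kind described for individual customers would amount to changing the signals or weights.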

If you, though, have another set of metrics that uses information that we know or that we collect, and this happens quite often, whether it's the contents of the data, or the structure of the systems it's stored on, or things about the ownership of the data that are not visible at the data layer but may be visible from an identity system, like Active Directory or your identity and access management platform, and you wanna pull all that in and do a different kind of calculation, we are perfectly capable of doing that.

And we've often done that for customers, so they can actually get ownership calculated exactly the way that they want. And by the way, we can also take our out-of-the-box stuff and relate it to those as well. So we actually work with a bank in Canada, for example, where they use our activity data and information from their HR system to do a custom calculation about who owns the information, who's the probable owner in that case. Hopefully that's clear.

Okay.

Great. Thank you very much for that. We are getting close to the hour, but there's one question that I really like because it brings a good additional aspect.

What are the few things that I can tackle today with little to no budget, if I don't want to spend any money? What would you recommend to somebody who just wants to start out understanding what to do and to provide adequate measures? What would you recommend to people saying, no, I don't want to spend money, I just have people and time?

Well, okay. That, that's an interesting one.

From our perspective, one of the things you could do, and obviously it's not the way we prefer to position our software, is use our free trial. We do have customers that certainly have gotten a lot of value out of the free trial to go and find, for example, open access. Right? That's one of the things that's quite easy to do with very low privileges and very low impact on the environment, but with a very big bang, in the sense that open access is definitely trouble when it comes to GDPR, because open access is essentially no control over that data.

So, you know, identifying and eliminating open access just with the free trial is something you can do. There are also other less elegant ways to do the same, right?

You can use PowerShell and things like that to go scan your data and find information, find open access. It is a lot of man-hours and time in that case. But it is something that would then be no spend, or at least no spend to a software company, I suppose, as opposed to spending on your staff's time. Not sure what else I would advise that's completely free, though.
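As a rough idea of what such a do-it-yourself scan looks like, here is a minimal Python sketch. It is an analogy rather than the PowerShell approach mentioned above: it assumes a POSIX file system and treats "open access" as the read or write bit for "other" being set. On Windows file servers the equivalent check would inspect ACL entries for broad groups such as Everyone, which this sketch does not do. The function name is hypothetical.

```python
import os
import stat


def find_open_access(root):
    """Walk a directory tree and list entries that any local user can read
    or write, judged only by the POSIX 'other' permission bits."""
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISLNK(mode):
                continue  # link permissions say nothing about the target
            world = []
            if mode & stat.S_IROTH:
                world.append("read")
            if mode & stat.S_IWOTH:
                world.append("write")
            if world:
                findings.append((path, world))
    return sorted(findings)


if __name__ == "__main__":
    for path, rights in find_open_access("."):
        print(path, ",".join(rights))
```

Even a simple listing like this gives a first inventory to work through, although, as noted above, reviewing and fixing the results by hand is where the staff time goes.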

Matthias, maybe you have some ideas?

Actually, no. I think, if you look at the calendar and we look at only 200-something days left, I would actually try to choose a professional solution rather than trying to do something that really results from trial and error, which I don't think is something that we should do. So we have a lot of questions left, and we will follow up with them after this webinar.

And if you do have more questions from the audience, please make sure that you're getting in touch with Jonathan and/or me. The mail addresses are on the first slide of my presentation, so that you can get in touch with us. So I really thank you, Jonathan, for providing your insight and your expertise in the area of unstructured data and GDPR. Are there some famous final words that you want to add before we close down this webinar?

I'll thank you as well, Matthias and KuppingerCole, for hosting us today.

And I'll encourage people again just to go to stealthbits.com and look for the GDPR section. You'll find there the survey that I referenced earlier, as well as the tool to map GDPR chapters and articles directly to their technical implications.

So, you know, there are some free resources for you right there.

Okay. Then that's it for today. We would be happy to welcome you in another webinar soon. Thanks for your time and for your participation, for your questions, and to you, Jonathan, for your answers and your insights. Have a great rest of the day, and goodbye.
