Webinar Recording

The No. 1 Rule of Secure Cloud Migration: Know Your Unstructured and Dark Data and Where It Is Located

Uploaded: 2019-05-22T12:00:00+02:00
Duration: 50 min 45 s

Posted on May 22, 2019

With a huge amount of data around, cloud migration is the ideal solution today. A necessary stage in migrating data to the cloud is putting it in order. This is particularly important when it comes to unstructured, so-called dark data: files and documents that are undermanaged (Excel files of budget estimates, PDFs containing important patents, Word documents containing personal employee or customer information), in general data that is not managed in an orderly fashion, unlike a structured database, which is easily governed. This is the kind of data that tends to be misplaced, misused, abandoned, hacked, or leaked in cloud and local server repositories, putting enterprises at risk.


Webinar presentation, KuppingerCole

Webinar presentation, MinerEye

Speakers

Yaniv Avidan

CEO & Co-Founder
MinerEye

Martin Kuppinger

Principal Analyst
KuppingerCole

Lead Sponsor

Transcript

Good afternoon, ladies and gentlemen, welcome to our KuppingerCole webinar, "The No. 1 Rule of Secure Cloud Migration: Know Your Unstructured and Dark Data and Where It Is Located." This webinar is supported by MinerEye.

The speakers today are Yaniv Avidan, who is CEO and co-founder of MinerEye, and me, Martin Kuppinger. I'm one of the founders and Principal Analyst at KuppingerCole. Before we dive into the topic, a very quick introduction of KuppingerCole and a little bit of housekeeping. And then we talk about how to protect your unstructured dark data when moving to the cloud, or even without moving to the cloud. So, KuppingerCole.

We are an independent, neutral analyst company with people around the globe, focusing on information security, cybersecurity, identity and access management, identity and access governance, and other areas concerning the digital transformation. We deliver three groups of services to our customers, which are research, events, and advisory.

So research is, for instance, our leadership documents, which we publish regularly on a variety of topics. Our events are, for instance, our European Identity and Cloud Conference, which we ran just last week. And then there are our advisory services for defining strategies, roadmaps, and other stuff. Here we deliver benchmarking, strategy support, architecture support, technology selection, and project guidance. So supporting you in understanding where you are, where to move, and how best to do it. This is what we do.

And we also support you, aside from our webinars, with a variety of events, including our Digital Finance World, the Blockchain Enterprise Days in September, our cybersecurity events, which we will run in Washington, D.C. and in Berlin, our upcoming first AI-related event in Munich, and our consumer identity events, again in the US and in Europe. Having said this, let's directly move to the webinar content. Housekeeping, just a very little bit: you are muted centrally, so you don't have to mute or unmute yourself. We are managing this.

We are recording the webinar and we will put the podcast recording online relatively fast after the webinar. So you can then access the recording and share it with your peers, and we will also provide you with the slide decks for download so that you have access to these as well. And there will be a Q&A session by the end. You can enter questions at any time using the questions area in the GoToWebinar control panel, which usually is at the right side of your screen. The more questions, the more lively the Q&A. So don't hesitate to enter your questions.

So let's have a look at the agenda. The agenda is, as always in our KuppingerCole webinars, split into three parts. In the first part, I'll talk about regulations and other drivers that require businesses to understand where data resides, and why they need to get a grip on the dark data.

So I'll look not only at technology, but at the entire people, organization, and tools perspective. In the second part, Yaniv Avidan of MinerEye will talk about how to easily categorize and identify sensitive data during, before, or after data migration, and how to be on the safe side and avoid data breaches by using the right tools. So he will really demonstrate what to do to get a grip on the data, how to do it, and factually demonstrate that it is feasible to do that. Part three, as already mentioned, will be the Q&A session. So with that, let's start with a high-level slide.

But it is one that is, I think, always important to keep in mind. When we look at all the things we do around security, compliance, identity, et cetera, we need to do it right. We need to take the right actions. I still see, and I have a lot of discussions about, people just focusing on checklist compliance. Compliance actually means you meet laws, you meet regulations. Audit then means that you can also prove it historically, so you can pass an audit. And so you have checklist compliance, you have passed the audit, fine. But action is what you really do.

That might be less than, or, if you do the right stuff, more than the auditor asks for. So it's really action that you should focus on. Audit and compliance are highly related, but they don't really make you secure. And I'll come to one of the slides later on where this becomes super obvious: what auditors primarily ask for is far from what you need to do from a business perspective to mitigate access risk. That leads us to the second aspect, which is access risk. Access risk today is a business risk. Inappropriate access directly connects to a variety of business risks.

I don't need to bring up the incidents at some large banks, which have cost those banks billions. There are operational and strategic risks. There are reputational risks, specifically when you look at PII, so personally identifiable information, personal data. There's a huge reputational risk if you have breaches, but there are other things as well. Think about your intellectual property. If it leaks, that could put you into a critical competitive position. That's a strategic risk. So access risk clearly is not an IT thing anymore.

It's about having the business policies, following them, having IT execute changes correctly, and bringing in the business, because they have the expertise about when to end, when to terminate a certain access, how to deal with access, et cetera. So we need to make it work for the business, but we also need to understand: if we don't have a grip on access risk, then we have a business risk, a major business risk. And when we look at regulations, I just want to pick one, which is very well known: GDPR.

A while ago we published a Leadership Brief, which is free for download from our website, on the six key actions to take to prepare for GDPR. I think there are still a lot of actions which many businesses need to take, because they're not yet where they should be. When we look at these six key actions, they start with discovering the PII data. We can extend that: when we go beyond the GDPR regulation to the entire access risk perspective, it's not only discovering the PII data, it's discovering all sensitive data.

You need to understand where data resides. If you don't understand that, you're not able to protect data. It all starts with discovery and understanding the data. I'll touch on that in a minute: classification, stuff like that. Only then can you control access. Only then can you do things that would be very GDPR-specific, such as managing consent. Only then do you understand where data is processed, and only then can you manage your cloud services adequately, select the right cloud service, and understand what it means.

One of our advisory methodologies, for instance, is the standardized cloud risk assessment, which helps you understand which risks you take when going to certain cloud services. You need to understand that it depends very much on the data that resides there, and then you need to manage it appropriately. You need to prepare for a breach.

And you need to do everything securely: privacy engineering, so privacy by design, security by design. And it's super simple how data can spread, specifically unstructured data. So you can very quickly lose control of data. Let me give you a very simple example, and something along these lines happens every day in most businesses. Marketing might create a target customer list, which actually is derived from the CRM system. So it's an export into Excel, because it's easier to handle as an Excel file. That Excel file resides somewhere.

That's where the problem starts. Where does it reside? And what happens with it? A team member takes it and sends it via email as an attachment to a colleague. Again, where does it reside? What happens with it?

The colleague modifies that list, so it's different data, and puts it on the sales SharePoint site. So from marketing to sales. The SharePoint site might already be in the cloud. The data might have ended up in the cloud even earlier. It might have ended up somewhere else, because once it's out of the central, managed systems, we are always about to lose control of unstructured data.

So most businesses don't have appropriate control over, in that case for instance, PII in Excel, in mail, in Exchange, in local copies. It might end up in some cloud storage such as Box, et cetera, et cetera. We have a challenge. And it might at first be hard to keep a grip on that, but we need to do it, because it's about consent. It's about transparency: where does it reside? It's about the rights of the data subjects: can I delete it if I don't know where it is?

No. Data minimization, storage limitation: only store it when you need it, only have it when you have a purpose for it, et cetera, et cetera. So even just from GDPR, and I think it's easy to transfer this beyond GDPR, there are a variety of reasons why this is not the best way to do it and why we have a challenge here. And that leads me to a related perspective and discussion, which is around how our access governance, in the broader sense of the word, needs to change.

And when we look at access governance in a broader sense, access is not only static entitlements on premises and maybe in some cloud services. It's more. There are so many types of access. You access your device, you access a network, you access a system by authenticating. Then certain entitlements are checked, whether you have them. That's part of the authorization process, et cetera, et cetera. And that runs everywhere.

The current focus of access governance, what is commonly implemented in IGA tools, the identity governance and administration tools, is pretty much on static entitlements, historically on on-premises systems. We are seeing growing support for cloud services, but frequently not with the same depth of support for cloud services as for on-premises. It's getting better, but there's still a long way to go. So access governance needs to do more. We can discuss devices and networks; that's a little bit of a different kind of beast. But we need to look at all types of data, also unstructured data, which will be discussed more in the next slides.

So application, system, and data access across all deployment models. Device and network access, as I've said, is a little bit of a different beast, because this is frequently covered more by the network security, cybersecurity, and endpoint security side. All of them together obviously are what you need to support a zero trust paradigm: apply a lot of controls, track them continuously, don't trust a single entity. But we definitely need to go beyond the traditional focus.

And if you put all these various dimensions we could add to access governance into one picture, then it gets a pretty full picture. So we see access management as one dimension, which could be non-privileged or privileged; that's on the left-hand side, more toward the upper edge. That is the more dynamic access, but there are also the static entitlements. And then there's also authentication and also usage, and we have different deployment models, and we have everything.

If you look at the columns from right to left, we have access to the device, which goes to a network, which goes to a system, to an application, and then some data is used. And, which is a little bit of a separate thing, we have all the access policies, and we need to get a grip on them. When we look at access governance, standard IGA in fact looks at the green areas in this picture and, better or worse depending on the implementation, at the yellow area, so the cloud services, et cetera. But basically it's a relatively narrow focus.

So, thinking more broadly, we need to take a broader perspective on access governance, including data governance. And then we have the structured area, so data which resides in databases. Big data governance and analytics is a very interesting topic. We have the policies across everything; we'll probably touch on that a little later on as well. But we also have this huge area which we need to look at, extending the focus we have in standard IGA, which is the one in red: the unstructured data. Regardless of the deployment model, where does this data reside? How can we track it?

How can we classify it? How can we ensure that we only have appropriate access to the data? This is one of the things we need to do if we want to do modern, comprehensive access governance which covers all types of our access risks. So we need to think beyond the traditional approach. We need to go broader. And that all means we need to get a grip on unstructured data, which means we need to track data and we need to classify it, so that we can protect it. Data classification then becomes a super important element, which we can do in various ways.

We do it because of the regulations, and there are various regulations, and we need to do it efficiently. So classification helps us. We need it due to the regulations. It helps us to better discover data, to better apply access policies, to work better with the data. And it increases our security, because we can then also use other tools to add security. Once we know where the data is, once we have classified it right, once we understand it, we can apply security. That's the way to go: track, classify, protect, mitigate the access risks. Data classification is a challenge.

It's not super, super simple, but it is feasible. And we need to go for an approach that really helps us do so. We probably need to mix in some things beyond just technology. We need to understand the purpose and the level of detail: what is really the way we want to do it? We then need to define our approach and set up the technology, and not every approach to classification works equally well for every type of organization. If you're a governmental organization or a defense organization, you definitely have a different approach to that than you have as, whatever, a technology company.
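The track, classify, protect sequence described above can be illustrated with a minimal Python sketch. It is not taken from the webinar: the two regular expressions, the label names, and the `PROTECTION` mapping are all assumptions chosen for illustration, and a real classifier would need far broader and organization-specific rules.

```python
import re

# Hypothetical detection patterns; real deployments need many more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")

# Hypothetical mapping from classification label to protective action.
PROTECTION = {
    "confidential": "encrypt",
    "personal": "restrict-access",
    "public": "none",
}


def classify(text):
    """Return a coarse sensitivity label for a piece of text."""
    if IBAN_RE.search(text):
        return "confidential"
    if EMAIL_RE.search(text):
        return "personal"
    return "public"


def protect(text):
    """Classify the text, then look up the action assigned to its label."""
    label = classify(text)
    return label, PROTECTION[label]
```

The point of the sketch is only the ordering: nothing can be protected until it has been found and labeled, which is why discovery and classification come first.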

So make it work for you and set up the technology. You also need to train the users to some extent. Some part of that always involves the users a little bit. There are also automated approaches, it depends, but obviously someone needs to understand it, check whether it works well, and optimize it. And then you can build on that, build on the results, and improve your security and mitigate the access risks.

So the interesting thing is that, regardless of what we do, whether we just focus on classification, just focus on protection, or focus on the entire area of access governance and its enforcement, it starts with the user, and we need to bring in the people. We need to have the right people, but also the right organization, before we come to the tools. So we need to do everything. When we look at the people, it's about awareness. They need to understand why to do that. And a lot of things can be improved when people just have an understanding of why we need to protect certain data. That is easy to explain today, I believe, because most people have the same challenges in their everyday life with their everyday tools. We can empower them, so enable them to protect.

We need to help make it work for them easily. Organization: clear guidelines. What do we do? Access policies, data usage policies, and enforcement of them. A policy which is not enforced is not worth the paper it's printed on. And ensure that you have the people. You always need people to support it, to work with the tools, to improve the stuff, to check the stuff, to enforce policy, et cetera. And then the tools, which from my perspective are very important.

If you look at this big area of access governance, access risk, and data protection, there are so many tools. Ensure that you don't end up with a zoo of tools you can't handle. Focus, define the portfolio, measure tools, compare tools regarding, for instance, their risk mitigation impact and their cost or the effort it takes for them. And think about what is the best mix of tools. There are standard approaches, standard methodologies to do so. Portfolio management is one part of the strategy and roadmap work you have to do.

And then you come to the automation part: try to automate as much as you can, minimize the human intervention, but bring in human intervention wherever it is needed. Doing this right will help you successfully reduce the amount of access risk you're taking in your business. With that, I want to hand over to Yaniv, who's the next speaker. He will talk about how to easily categorize and identify the sensitive data, how to do it before and after moving to the cloud, and how to be on the safe side and avoid data breaches.

So Yaniv, it's your turn.

Thank you very much, Martin, and nice to have you all in this presentation. I will take over from Martin at the point of how cloud adoption, the hottest topic today for most companies, I think 97% of organizations, together with securing and better managing the unstructured data, actually reduces the risk that we discussed before, and show you how we can deal with this in a very structured and very consistent way, leveraging new technologies. You can see from the chart here the rate of adopting cloud.

And most of it is due to, let's say, the major challenges and opportunities introduced by cloud adoption. Most companies are now driving some sort of project that is mostly driven by cost saving, not just, you know, CapEx cost saving in storage and so on, but also operational expenses, where running big data applications and processes in the cloud might cost even more. Therefore they need to have a very good plan on how to migrate data or applications to the cloud, and be very selective and very effective in this space. So this is the business approach, or a kind of cost-saving perspective, on how to drive the data to the cloud. On the other hand, we see two major vectors that affect that directly.

The first and foremost is the regulatory aspect of data privacy, not just GDPR. We also see in the US the CCPA, the California Consumer Privacy Act, and also regulation in Massachusetts and the federal bill that is coming in. So everybody is now dealing with data privacy and data protection and the regulation of how to manage that data.

But the third vector, which is currently pretty much hidden from the eye of most customers or most companies, is the scale of dark data and the enormous, exponential rate of growth of data that is undermanaged, uncontained, and unidentified, in most cases in the form of unstructured data in files. And those three vectors in the market actually create the challenges and opportunities that I'm going to talk about. Dark data specifically is defined as data for which there are no very good tools to capture or unlock the value or the risk behind it.

Hence we don't really know or understand what the level of risk to the organization is, not just in terms of reputation, but also in terms of the actual business results, right? There are no good access patterns to that data, as Martin stated before. The reason is that the data is not well categorized and classified, and it's very hard to classify it.

And of course there is data that is missing and incomplete because it's not managed well. At the end of the day, you know, what is out of sight and out of mind is by nature doomed to be less protected and less maintained, and actually faces an unmeasured risk in this space. Now, when we go to organizations and ask how many files they have, where those files are, and what the categories are, I think for very close to a hundred percent of the organizations the answer would be "we don't know", right? Which actually is a good indicator of the state of managing that data.

So what we can provide in this space is leveraging new technology to handle that by automatically crawling the data and extracting insights or information about it, and aggregating that information into actionable categories, where for the first time the user can interact with those categories and understand what the essence of the data is.

And by understanding what the essence of the data is on large-scale repositories, in the cloud and on premises, the user can then set up policies and classifications on top of that and use the AI capabilities to track the changes of the data, because we all understand the dynamic nature of data across those processes. The value would be adopting cloud faster and basically complying with the requirement of management to reduce cost. And on the other hand, staying protected and more compliant, and also being able to reuse the data and extract the value out of that data.

So when we talk to most customers out there, we see that most of them are planning or, depending on the maturity of the organization, in the process of implementing the digital transformation revolution, right, which is where most businesses are, and realizing that this does not come without cost on their side. In most companies that evolve themselves, the data evolves too. The data has been changing, right?

In most of the financial services industry, there is a culture that runs into contradictions when it comes to multiple regulations, right? There is the data privacy regulation, which calls for purposeful processing: you don't store data without any purpose. But on the other hand, they tend to save everything, right, just to make sure that they comply with the policies around financial services. So they're very cautious about deleting stuff or reducing the data.

There's also the point that data is spread across multiple platforms, and hence less well managed, because of the dynamic nature of data and the lack of analytics over it. The overabundance of data sources and tools that generate more data, the wider attack surfaces, and the difficulty of finding and utilizing valuable data result, at the end of the day, in lost business opportunities. So you see, there are many, many challenges when you go and digitize your business.

On the other hand, the way to handle that is basically to start top to bottom and redefine the data management strategy, and data management is tightly coupled with data security, or access control over that data. At the end of the day, when creating a new data management strategy, the most powerful play would be to focus on handling stale and obsolete data as the first stage, just to reduce the analytic surface, right?

We want to get rid of data that has no value and only obscures the value or the risk behind this huge and vast data storage that we keep on saving, and also handle the save-everything strategy and culture of those organizations. How do we help them basically reduce the data that has no meaning and no value to them, and handle that culture of saving all the data? The second thing would be to enforce the policies, of course, of data retention and data protection. That is challenging, but it's less challenging once you get a grip on the data.

At the end of the day, it's all comes down to using the right technology. In most cases, dark data and unstructured data is very hard to analyze.

It's also a consequence of new technology that comes in, like collaboration technology, cloud technology, you know, unstructured messaging systems, and so forth. Therefore only new technology can actually solve the dark data problem. And that's basically what we're going to deal with. Just to give you an example of a flow for business solutions or use cases using the MinerEye approach: it's of course phased, but it provides you some logic on how to handle that.

As I said, the first step in the cloud adoption process over unstructured data would be to minimize the data. Essentially, the tool will help you group information about data that is not being used and contains a lot of duplicates and near-duplicates, which is the nature of files at the end of the day when we create them.

And that will help you drive an immediate data reduction or data retention policy to get rid of that data, including the internal analytics of PI, personal information, not just PII. At the end of the day, you need to understand whether this data belongs to somebody, whether somebody owns that data, and not necessarily whether it contains PII or specific identifiers internally.
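To make the idea of grouping duplicates and near-duplicates concrete, here is a minimal Python sketch. It is not MinerEye's method, which the talk does not specify. It assumes exact duplicates can be found by hashing file content with SHA-256 and approximates near-duplicates with Jaccard similarity over word shingles; the function names and the shingle size `k=3` are illustrative choices.

```python
import hashlib
from collections import defaultdict


def exact_duplicate_groups(files):
    """Group file names whose content bytes share a SHA-256 digest.

    `files` maps a file name to its content as bytes.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    # Only digests seen more than once are duplicate groups.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts, in [0, 1]."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)
```

A pair of files whose `jaccard` score exceeds some chosen threshold could then be flagged as near-duplicates and reviewed before a retention or deletion policy is applied.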

Again, the result will be to reduce the data and basically focus on what needs to be migrated to the cloud because it's being used across business processes. Which brings me to the second step, and this is how we can secure the data: label, encrypt, and prevent access based on those categories that the system will actually sort through and visualize to the end user. There's also the requirement to segregate the data well.

At the end of the day, we want to make sure that the business process data is saved and stored where it should be, not just to comply with regulation, but also to be efficient with the data and reduce the attack surface exposed to internal and external actors or adversaries. Those are the three major steps that will enable you to make the leap in the maturity model of being able to secure your unstructured data. The phased approach basically offers three phases for these kinds of projects.

Most of those companies will start with some sort of impact assessment on a very limited area of data. After all, we don't know how many files we're going to scan. We don't know how many categories are there, and we need to visualize this.

So a one-month assessment over those unstructured repositories will provide very good fuel and justification to actually expand the scanning, sources, and categorization, and will buy you more time and more resources from the business side. After six months, usually those organizations are well positioned to categorize the data and track those categories. Remember, classification is not just about the entities or the content. It's a combination of content, context, and time, accumulated over the data.

And that's where new technologies come in, with the ability to harness available compute power to track categories as they change across those repositories and business processes.

The third phase, or level of maturity, would be to fast-track smart cloud migration: basically pick up those categories of unstructured data that are being used and migrate, minimize, and segregate them, again based on those categories, automatically into the hybrid cloud model, whether it be one public cloud, multiple public clouds, or an on-premises data center. Now I'm going to pass on to a short demo that will show you how this basically materializes into a process with this tool.

You can see here that it is basically very flexible in where you deploy it, right? It could be deployed in your cloud as a customer, or on premises, or in a hybrid mode where it scans multiple sources like SharePoint Online or OneDrive or AWS. It doesn't really matter. At the end of the day, these are files that are sitting on disk somewhere, and you need to provide access for this tool to scan the data and accumulate information about the data continuously as a first stage.

So you see, it will actually scan multiple sources across multiple geolocations around the world, and will basically aggregate information and recommend, as a first step, what kind of duplicates you have and what kind of linguistic topics you have. So you will first be able to analyze the data. Just to give you an example, duplicate data would be analyzed automatically by the product itself, saying, hey, there are some Excel spreadsheets, as we can see here, that contain a lot of information that might be interesting.

Or in this other recommendation, we can see a technical proposal for some sort of, let's say, software tool. And then the user can basically start to analyze the data by using those recommendations. Remember, the AI will bring you as close as possible to actually visualizing the entire risk and value behind the data, or at least understanding where it is and what the meaning behind the data is. The next stage would be basically to create some sort of policy over that information. But let's first start with the multiple capabilities that this platform offers you.

There's also the ability to be very specific with business-critical files. So for instance, I know that this is an agreement or contract that I want to protect. I can take that contract and basically use it as a training set to provide to the AI machine and track it. So I create a rule that basically says, hey, I'm going to provide you an example of a file, and I drag and drop that file into the machine learning capability.

It will then analyze this file automatically for you, you know, instead of you creating rules and running them over big data repositories, and will basically be able to find very quickly where similar information is located, even in zip folders or other formats of the data across your organization. And this is a key capability for an analyst or for security operations to start to get a grip on the data and categorize it selectively, starting from the business-critical data.
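The talk does not say which machine learning technique sits behind this "find similar files from one example" capability. As a stand-in, the following Python sketch uses plain bag-of-words cosine similarity; the function names, the whitespace tokenization, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def find_similar(example, corpus, threshold=0.5):
    """Return (name, score) pairs for texts similar to the example.

    `corpus` maps a file name to its extracted text. Results are sorted
    by descending similarity and filtered by the (hypothetical) threshold.
    """
    ref = vectorize(example)
    scored = [(name, cosine(ref, vectorize(text))) for name, text in corpus.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

In this toy form the example contract plays the role of the dragged-and-dropped file, and the corpus stands for text already extracted from the scanned repositories.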

So there are two approaches. The first is to reduce the data or the duplications that are not necessary and not being used. And the second would be to start from your crown jewels and basically segregate them and handle them first, right? And of course you can add some more complex rules, like "give me all the files that contain first names or emails", et cetera. I can add geolocation rules to comply with GDPR, "give me all the files that are out of Europe", for instance, in this case. And when I execute, I'll have an immediate view of what kind of data is captured by the system.
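A combined content and geolocation rule of the kind just described could look roughly like the following Python sketch. The `FileRecord` fields, the region names in `EU_REGIONS`, and the email pattern are hypothetical and are not taken from the product shown in the demo.

```python
import re
from dataclasses import dataclass

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical region identifiers that count as "in Europe".
EU_REGIONS = {"eu-west", "eu-central"}


@dataclass
class FileRecord:
    path: str    # where the file lives
    region: str  # hypothetical storage region of the repository
    text: str    # text extracted from the file


def contains_email(rec):
    """Content rule: the file text contains something that looks like an email."""
    return bool(EMAIL_RE.search(rec.text))


def outside_europe(rec):
    """Geolocation rule: the file is stored outside the listed EU regions."""
    return rec.region not in EU_REGIONS


def run_query(records, rules):
    """Return the paths of all records that satisfy every rule."""
    return [r.path for r in records if all(rule(r) for rule in rules)]
```

Calling `run_query(records, [contains_email, outside_europe])` returns the files that contain email addresses and are stored outside Europe, which is the kind of immediate result view described in the demo.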

And if I'm content, then I'm basically going to track the data, or track that logic over unstructured data, assigning it a category, a data segregation, and maybe a labeling mechanism for future integration with physical labeling. Once I click save, this will actually turn into a training set for the machine learning capability, without touching the data. And for a period of a few months, you will be tracking the behavior of unstructured data, accumulating information, fixing business processes, maybe even eliminating a redundant archiving process that was left behind and inflates the data.

Once you are ready, this query will be integrated into security systems that will encrypt the data, or will reduce the data by deleting or archiving any incremental addition to the results of those queries. At the end of the day, the approach is to get a grip on the data in terms of information, understand how data behaves across our repositories, and then come up with the policies and the actual integration, flowing that into the policy enforcement tool.

At the end of the day, we want to overcome those major challenges, which are mostly around scale, how many files I have, and the ability to analyze those files, because they are extremely unstructured and there are no good analytics over them. For that, AI and machine learning will help you as an end user, but will not replace end users in taking the decision of what kind of fraud policies we want to trigger and what kind of insights or value we want to extract from the data. But it will redefine the skill set that is required from our analysts.

At the end of the day, they will not need code-writing capabilities, but they will be required to have very good business acumen on what the essence of the data is, how risky it is to store the data where it is found, what needs to be done, and what system needs to be integrated in order to keep it safe. I'm pretty much done with the presentation, so I will hand it back to you, Martin. Thank you very much. Thank you, Yaniv. So I'll make myself presenter again. This was very insightful, and we already have a couple of questions here, so let's jump directly into the Q&A session.

And so there are several questions which are directly related to your demo, Yaniv. So the first one is: how do you add data sources?

And what can these sources be: structured and unstructured? So how easy is it for a customer to add data sources, and how easy is it for a customer to enhance the system? What is needed, and which types of data sources are supported? Yes. So from the technological point of view, the system is designed to be scalable and easy to use, with a microservices architecture behind the scenes. Adding a data source is pretty much defining the username and password to read the files in the data source.

And we are currently very focused on unstructured data sources, hence cloud repositories, SMB sources, file shares, SharePoint, Exchange, and so on and so forth. Relational database sources are a totally different beast, right? It's like your access control issues.

There is not yet a maturity level to manage policies across all structured and unstructured data. And the problem is, of course, identified and will be handled in the future.

Currently, the key thing is to get a grip on what is a black box for us, and this is the unstructured data. So in most cases we will add only unstructured data sources like file shares, SMB, and SharePoint to the tool itself. It takes no more than a few minutes to add and configure that and for the system to digest the information, and you can add sources as you go, even if you are already working on several data stores.

Okay, thank you. The next question is can the tool handle different languages or how does it deal with different languages? Right? So there are two stages.

The technology that groups data and identifies duplicates and near-duplicates is totally language and format agnostic. This is one of the patents that we have over here. We use computer vision to do this analysis. Hence we can handle any language and any format at the end of the day, and point the customer to which data has a lot of duplicates and is obsolete.

For the linguistic analytics, we are currently supporting English, German, and French as the main languages, and going forward we'll add more languages to analyze topics on top of the data, hence enlarging the customer base for us going forward. Okay. Another question we have here is: what metadata is it reading? Which metadata is read, how is it processed, and how is it handled?

Yeah, so when the tool scans the sources, it will extract multiple data elements. It will extract, of course, all the metadata from the file system itself and store that internally in an Elasticsearch model. It will also extract interesting content from the files themselves, analyzed by the NLP engine, and selectively store only personal information and elements that may imply risk.

And also elements that point to the topics. On top of that, the system has a unique capability to transform the actual file itself into a vector, what we call a signal, to be correlated later on with similar signals. And all of those are stored in the local database of the management system.

That's why the analytics of the data is done regardless of the connection to the sources and without any dependency. And also, for every cluster or group of information that is duplicated, you'll get all the relevant information, which is the metadata, like the location, the permissions, the owners of the data, and so forth, and also the content. That's how we basically combine the content and context of groups of information, and can apply logic over those pieces of information and track them over time. Okay. Great answer.
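Combining content and context per cluster, as described in this answer, can be sketched as follows. Note the substitution: the speaker describes a proprietary vector "signal" that also catches near-duplicates, whereas this Python sketch only groups byte-identical files by SHA-256 hash. The `ScannedFile` fields, paths, and owners are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScannedFile:
    path: str       # context: where the file lives
    owner: str      # context: who owns it
    content: bytes  # content: the raw bytes of the file


def cluster_duplicates(files):
    """Group files with identical content and keep each member's metadata."""
    clusters = defaultdict(list)
    for f in files:
        digest = hashlib.sha256(f.content).hexdigest()
        clusters[digest].append({"path": f.path, "owner": f.owner})
    # Only groups with more than one member are actual duplicates.
    return {d: members for d, members in clusters.items() if len(members) > 1}


files = [
    ScannedFile("/share/a/budget.xlsx", "alice", b"Q1 budget"),
    ScannedFile("/archive/budget_copy.xlsx", "bob", b"Q1 budget"),
    ScannedFile("/share/a/patent.pdf", "carol", b"patent draft"),
]
clusters = cluster_duplicates(files)
for digest, members in clusters.items():
    print(digest[:12], members)
```

Because the hash and the metadata are stored together, later analysis of the clusters does not need to reconnect to the original sources.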

Another question here is: what is the value of AI when it comes to unstructured data management and protection in hybrid cloud environments? So why do we need AI, or why do we work better with AI than without? Yeah, I think I can do a simple analogy for all of us. I think if, let's say, 20 or 30 years ago, we had the siloed mainframe, you know, where we managed data, and it was pretty much siloed and pretty much contained in a specific place. And there was a single language with which we could analyze it. I think most of us did not even think about AI.

We needed a very defined skill set on how to write queries and analyze the data. Today, the problem is so different.

I mean, the data is so dispersed across multiple repositories. They are so different. The protocol for approaching the data and extracting information is different.

I mean, we have dozens of those. The rate of growth of data keeps on, you know, growing in such a way that there's no human being in the world, I guess, that can actually correlate and extract information from all those data sources by using all those protocols in one single language. And of course the skill varies.

There's so much skill set required. This is exactly where AI comes into play. AI is capable of doing all these basic and structured operations in order to normalize the information about the data into a single pane of glass, right? When we normalize this information and provide that access layer to the information, we actually overcome the problem of the dynamic nature and the dispersal of the data across those platforms and formats, and help the end user analyze and apply the business requirements and queries he wants. Okay. One more question we have here.

So if you have more questions, please enter them now, but we have one more question here. Oh no, two questions actually. So the one is: how do we overcome the resource problem?

When IT budgets are, I would say, always tight. It's probably not always true, but it remains tight. And I think there's another challenge of the resource problem I'd like to add from my perspective, which is how to find the people who understand AI well enough.

It's a very good question. I think the answer lies in the design of such products. Okay.

When you use, for instance, the MinerEye product, you don't need to be a data scientist in order to train the system. And it all lies in how we package those solutions for our customers. You need to be very business-process savvy, right, within your organization, to come up with, for instance, those files that are being used across this business process.

And, you know, that are critical to your business. Those files can basically be fed into the system, and the system will do the rest behind the scenes. The entire inference model and training model is basically totally transparent to the end user.

Therefore, the end user does not need to be a data scientist. He can even tune the accuracy. Like I did in the demo before, he can tune the accuracy, or let's say the level of similarity he wants to identify, without even knowing that he's tuning a model behind the scenes. So I think AI and machine learning are mature enough to be packaged in such a way that there won't be a specific skill that is required.

In fact, it'll reduce the skill requirement from the end user; it'll require much more physical and soft skills from our operators. Okay, good answer. And I think it's also very important to make AI applicable and trustworthy for the less experienced users as well. So let's move to one more question we have here: is it possible to have a unified interface to define data policies or access policies over structured and unstructured data sources? I think it is possible.

It is very important to have a unified layer at the end of the day, but I think it's also linked to the maturity level of an organization, right? You cannot create this unified layer without having those two feet of structured and unstructured information governance. At the end of the day, when you eat at a table, you don't put the plate on it if the table doesn't have legs.

But the idea is, yes, we need to have a layer that unifies the policy, not just around access, but also around protection and around management of data, and an interpretive layer or interpreter that will actually send those as commands and instruction sets to the various AI components that will go discover, track, classify, and basically protect or manage the data. Okay. Thank you very much for that answer. And I'm a strong believer in saying we must shift or put more attention on the policies.

Unify them, have a common understanding, because in the end, the set of policies is more homogeneous than the data we have. And so if we start with the homogeneous policies, it might be easier to handle a lot of stuff here. We could answer all the questions. So thank you very much, Yaniv, for your insights and the information delivered. Thank you very much to the attendees of this KuppingerCole webinar. I hope to see you soon at one of our upcoming events or one of our next webinars. Thank you very much and have a nice day. Bye.
