Webinar Recording

Big Data – Bigger Risks?

Name: Big Data – Bigger Risks?
Uploaded: 2016-10-14T12:00:00+02:00
Duration: 59 min 7 s

Posted on Oct 14, 2016

Big Data technologies were invented to store and rapidly process the vast amount of data available today into useful “Smart” Information. What is common across these technologies is that their initial aims are focused on data processing capabilities rather than security and compliance. One particular concern is the lack of control over identity and access especially in the area of administration. This was fine when the application of the tools was confined to experimental or small scale usage. Now that they are being widely deployed for commercial purposes this is no longer satisfactory.

Big Data technologies need to be managed and secured in the same way as the other components in the IT infrastructure.


Webinar Presentation, KuppingerCole

Webinar Presentation, Centrify

Speakers

Barry Scott

Chief Technology Officer
Centrify

Mike Small

Information Security Management Advisor, Fellow Analyst
KuppingerCole

Lead Sponsor

Transcript

Well, good afternoon, everyone, from England, and welcome to this KuppingerCole webinar, Big Data, Bigger Risks, which is about securing your smart information infrastructure. I'm Mike Small, and I'm a senior analyst at KuppingerCole. And the other presenter this afternoon will be Barry Scott, who is the chief technology officer for Centrify.

So this webinar is being run by KuppingerCole, and some of you may know us as an industry analyst. We provide IT research. We provide advice to both the users of IT and to vendors of IT, as well as other services. And our particular focus is around information security and the technology that supports it, to enable businesses to securely and compliantly run their IT services. We run a number of conferences, and the next conference is going to be on consumer identity in Paris, which is in France, not in Germany. There will be Digital Finance World in Frankfurt next March.

And then our annual grand conference is held, as always, in Munich, which is the European Identity & Cloud Conference. So for this webinar, the general rules are that you, as an attendee, will be muted automatically by us. You are able to ask questions at any time by using the question widget that should have come up on your screen.

I will answer those questions at the end, or I will pass them on to Barry for answering. The webinar is being recorded, and there will be a recording that will be published to all the people who have been invited, and that will be available from tomorrow. So this webinar will take place in two halves. The first half will be by myself, where I will talk about what the challenges are, what big data is, and how identity and access management is as important, if not more important, with big data than it was with other kinds of data.

And in part two, Barry Scott will basically describe how Centrify's approach to big data security can enable you to mitigate the risks and increase security in your big data environments. So the first question is to understand what we mean by big data.

Well, in fact, first of all, I think this is really a relative term because as you can see from that slide, the amount of data that we were able to run businesses on is much less than the amount of data that we actually have today. And what has been happening is that the technology has changed so that all of this data is being generated by all of these different systems. So you've got organizations that are generating data. You've got devices that are generating data from mobile phones, smart watches, and tablets.

And these are a rich source of data that is being tapped by all kinds of consumer facing organizations. The world is evolving even further in that nearly every device that you can think of either is already enabled or will be enabled to produce data from sensors that are contained in it.

And all of that data can also be exploited. Now, in a sense, one interesting explanation of big data that I heard was given by an American, who basically said the era of big data started when it became cheaper to keep the data that you've already got than it is to actually delete it. Now that's, if you will, a comment on the problem of managing all of this data, but the value that comes from it comes from being able to analyze it.

And in fact, this big data provides many new opportunities, but at the same time it introduces a whole series of risks, which are currently testing the limits of our ability to secure it and to remain compliant. Now, the big data itself is not really very valuable. This is just an enormous stream of nonsense. What makes it valuable is the ability to analyze it. And if you can analyze it, you can then do a number of things which help you to get competitive advantage. And the ideas behind this are not new.

In fact, the first book that I can see that wrote about these was at least 30 years ago, where there was this chap called Michael Porter, who was a professor at a US university. And he talked about the three ways that data can alter the way business works. It allows you to generate different kinds of products, and we are already seeing this in the kinds of things that are being created to help to create and exploit that data.

It changes the nature of competition, in that if you can find out and identify your customers and their needs more effectively than your competitors, then that helps you to improve your competitive advantage. It also helps to improve your competitive advantage by enabling you to manage your business in a more effective way. And we see a wide range of examples of the way in which organizations are doing this. And so we have in the UK a TV channel that is able to analyze audience data in a way that makes it possible to deliver targeted ads during the program.

And it does this based on being able to figure out, while the program is on, the traffic that it sees on Twitter and Facebook, as well as even, would you believe, the games that people are playing. I'm sure we've all heard the story of the US retailer that realized that if they could identify ladies who were pregnant, then they would be able to target them, because if you can get a couple that have just given birth to a child and get their business at that point, they are likely to remain very faithful to your supermarket and brand.

In the UK, and in much of Europe, there is deployment of smart meters, which is being sold to end users as being a way of cutting their bills. But one of the major advantages is the ability it gives the electricity and utility companies to cut the costs that would otherwise be involved in having to bolster the electricity supply infrastructure to deal with the actions they have to take to avoid climate change due to carbon footprint.

And finally, in our own area, we are seeing an increasing use of big data analytics to detect and hopefully to prevent financial fraud, as well as to investigate and to track down criminal actions by cyber criminals trying to infiltrate our IT systems to steal our data and to install ransomware and things like this. Now, having said all of that, there are also a whole number of information security risks, which in some ways come from exactly the same sources as they do for conventional infrastructure, but are just even more of a challenge.

The first thing is the question of securing the infrastructure. We've all learned over time that we have to secure this infrastructure; otherwise the criminals will get into it, or the infrastructure will be used against the organization or even against the country. Unfortunately, many of the big data technologies were in fact invented with a primary focus on improving their efficiency and their ability to process massive amounts of data in a short period of time.

It's the classic IT conundrum that functionality always seems to win in the early stages of a market, and security is always an afterthought that's being added on. Having got the data, there are many ways in which organizations could analyze it improperly. Are you sure of where the data is coming from?

If the criminals realize you are dependent upon your data feeds, then they could substitute alternative data feeds in order to confuse you or to misdirect you. How do you know who is able to access this data in order to analyze it, and who is able to access the results of your analysis, which can be quite important? And then you need to be able to remain compliant. And although we've now got more data, in many geographies and jurisdictions the rules relating to privacy and the way in which data can be used have in fact been tightened up.

So the problem of privacy and security has got worse from a compliance point of view. And yet we now have more data which can be lost, stolen or misused.

So if you want to take a specific example of that, the upcoming GDPR, the General Data Protection Regulation, which is enforced in Europe from May 25th, 2018, introduces a series of important requirements on people who use data, the first of which is consent per purpose. So that's a problem, because much of the data that is currently being used, that is harvested, was originally collected for one purpose, and it was kind of in the 15,000-word terms and conditions that you ticked the box for that said it could be used for all of these other things.

Well, the GDPR says you have to have explicit consent for every purpose that you use it for, and you cannot actually use it for anything other than what that original purpose was. Then on top of that, customers and people who have given their data can subsequently ask for their data to be erased, i.e. they have a right to be forgotten. And that's another challenge when you have this massive amount of data: knowing where it's got to, knowing who might have analyzed it and into what form. And if it is lost in any form, you are going to have a mandatory requirement to notify those breaches.
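As an illustration of the consent-per-purpose and right-to-be-forgotten points above, here is a minimal Python sketch. It is not taken from the webinar or from any product: the `ConsentRegistry` class, its method names and the purpose strings are all hypothetical, and a real system would also have to track where copies of the data have gone.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of what one data subject has agreed to."""
    subject_id: str
    purposes: set = field(default_factory=set)
    erased: bool = False


class ConsentRegistry:
    """Illustrative in-memory registry; not a real compliance tool."""

    def __init__(self):
        self._records = {}

    def grant(self, subject_id, purpose):
        # Consent is recorded per purpose, never as a blanket permission.
        record = self._records.setdefault(subject_id, ConsentRecord(subject_id))
        record.erased = False
        record.purposes.add(purpose)

    def erase(self, subject_id):
        # Right to be forgotten: drop every consent and flag the subject.
        record = self._records.get(subject_id)
        if record is not None:
            record.purposes.clear()
            record.erased = True

    def may_use(self, subject_id, purpose):
        # Data may only be used for a purpose the subject explicitly agreed to.
        record = self._records.get(subject_id)
        return (
            record is not None
            and not record.erased
            and purpose in record.purposes
        )


registry = ConsentRegistry()
registry.grant("alice", "billing")
print(registry.may_use("alice", "billing"))    # consent was given for this
print(registry.may_use("alice", "marketing"))  # no consent for this purpose
```

The point of the sketch is only that the check is made per purpose at the moment of use, which is what makes a blanket tick-box insufficient.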

And all of this is going to be put in place where you have to have a kind of active governance regime, where you need at any point in time to be able to prove that you are complying with things. So that certainly introduces a series of new challenges. And these challenges certainly come from the technology, which, as is a tradition in the IT business, was designed with functionality in mind. And there are basically three kinds of technologies, which are involved in ingestion, storage and analysis.

And the ingestion is important because here you have the potential for enormous rates of generation of data from things like Twitter and so forth. Storage involves being able to store massive amounts of data that is not necessarily suited to the kind of tabular relational storage.

And this has led to object storage systems like Amazon S3, to interesting databases that are designed to have high availability for searching, like MongoDB, and to a kind of hybrid, which is SAP HANA, where in fact you have a fully ACID-compliant relational database, but which can be organized in a way that enables massive searching using in-memory searching.

And then in terms of the analysis, you have a whole series of technologies that were created in order to analyze this data using massive parallel computing, not to mention machine learning, where the actual algorithms for the analysis of the data will be learned by the machine rather than given by the analyst. Now, in most of these cases, the analysis involves multiple replicated servers performing the analysis in parallel, and that introduces the problem of administration, in that your data is going through all of these servers.

And so its presence on them could be compromised by any of them. And yet often the management and the administration and the access control on them is not very well managed. And in particular, this is the case with Hadoop.

Now, Hadoop is a technology that was originally conceived to be based on, shall we say, commodity technology, like PCs that you didn't otherwise want, that are put in a rack and are organized so that the data is spread around these PCs, and these PCs share the job of analyzing it through a process called MapReduce, where the data is first of all mapped into tuples and then reduced by counting up similar tuples. So that leads to a significant problem to do with the administration, the access control, the auditing and the management of the data that's on this system.
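The map-then-reduce idea described above can be sketched in a few lines of plain Python. This is only an illustration of the technique on a single machine, not Hadoop's API: the function names are hypothetical, and each string in the input list stands in for the slice of data held on one node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit one (word, 1) tuple for every word in a chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted tuples by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: count up the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different node.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


counts = word_count(["big data bigger risks", "big data is here to stay"])
print(counts["big"], counts["risks"])
```

Because every node sees a slice of the raw data during the map step, each of those nodes is also a place where the data could be exposed, which is the administration problem the speaker goes on to describe.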

So these systems have root administrator accounts, or they have administrator accounts if they're Windows boxes, but there may be no control over them. There may be no specific access control, or thought being given to how you are going to audit the data. And we all know that simply deleting files on a regular file store does not actually erase the data. And so where these systems are being shared, how can you be sure that the data that you've analyzed, or the results of that analysis, have not in fact been left when you finish?

And all of that is made worse if you go and use one of the systems like Elastic MapReduce, or the other cloud providers that provide Hadoop or Spark type services for analyzing that data. So this leads you on to this problem of the access challenges.

Now, there are actually two specific kinds of challenge that need to be addressed. One of which is that some organizations are starting to realize that they have data that they could sell. So how do you enable APIs in your environment that enable you to sell that data in a way which at the same time protects it, so that you can't just have someone who downloads it and then sells it on to other people, and so that you can get your money's worth out of it?

And this is particularly the case, for example, where weather companies are selling it, and other kinds of organizations have lots of data that is in fact potentially valuable. And the second thing is the need to be able to include the new technologies, for example by being able to integrate the identity and access management tools and techniques that you have already tried and trusted in your environment, so that you can use the same tools and technologies to manage access to your big data.

So when it comes down to managing these risks, you find that actually you want to do three basic things, going back to where we started from. You need to be able to secure the infrastructure, and the big data analysis infrastructure needs to be integrated from an access control, privilege management and API access control point of view, preferably with the same tools, same technologies and same processes as you are using for the rest of your infrastructure. You need to be able to secure the way in which the data is analyzed.

And that means governing who has access to the data and how that data is used, and having an audit trail of what happened to it, where it came from, why you had a right to use it in that way, and that it has not been used or leaked in any other way. And on top of that, you need to be able to assure compliance by proving you really do have control of how it is being used, and you can show this through proper auditing. So what we have is that big data can become bad data if you don't take the right actions to secure the infrastructure and look after it properly.

So in summary, big data is here to stay. It tests the limits of information security and our ability to comply with laws and regulations. And what we need to do is to control access to this, to secure the infrastructure, and to audit and provide proof of compliance. So with that, I'm now going to hand over to Barry Scott, the chief technology officer of Centrify, who will describe what Centrify are doing to help with this. So over to you, Barry.

Great, thanks very much, Mike, and good afternoon to everybody. Hopefully you can see my slides all right. So as Mike mentioned, my name is Barry Scott, and I'm the EMEA CTO for Centrify. And for those of you not familiar with Centrify, we're a software vendor in the identity and access management space. And specifically what we do is address the areas of identity as a service and privileged identity management, both of which actually relate to the big data discussion we're having today.

And what we generally do is protect against the leading point of attack used in data breaches today, which is compromised credentials. So what I'm gonna talk about this afternoon, fairly briefly, is the challenges that we are hearing some of our customers face in their big data deployments, the solutions that Centrify provides to address the challenges, and how the big data vendors and Centrify are actually better together. And I'll go through a brief summary and then hand back to Mike for some Q&A at the end.

So big data obviously brings us a huge amount of business value and opportunity as Mike said initially, but by pulling much of our data together to get a coherent and valuable big data solution, we are making this very high value data, an incredibly high risk target, and it's obviously got to be treated as such. So the background to big data is that it was created to give us different ways of looking at existing data sets to give us business advantage, as Mike said, and increase our business agility, but what wasn't initially built into it was security.

So another aspect to big data is that we'll have many different classes of users accessing the data. We'll have developers creating scripts and queries. We'll have IT staff managing the infrastructure on which it runs. We'll have data scientists making use of the whole environment, and there'll be applications themselves running on it as well.

Now, very often in today's world, the users themselves don't have a consistent identity across environments such as this, and it leads to a very much heightened risk profile. So as an area of high value data, the big data environment will also come under scrutiny in terms of audit and compliance. So there'll need to be proof of the access controls provided, as Mike mentioned in his summing up; proof of least privilege, in terms of showing that only the right people have appropriate access; and there'll finally need to be an audit trail of users' access and activity.

So big data deployments can introduce identity silos, and that in turn can lead to increased risk and increased identity management costs. So the solution we're talking about today is Centrify identity and access management for big data. And the first area that we're gonna talk about is providing Active Directory based identity and access management to that big data environment. On top of that, we also provide role-based privilege management and access control, which we'll talk about.

We also enable session auditing to address the regulatory and compliance needs that both Mike and I have mentioned. And finally, we can easily address a very common requirement nowadays, whereby we need to give outsourcers or external partners secure remote access to systems and data. So let's go down into each of these in a bit more detail. And first off is the Active Directory based IAM for big data. So most of the time through the presentation I'm gonna be talking about Hadoop here, but we will touch on NoSQL towards the end as well.

Now, as mentioned previously, there can be a danger that big data deployments introduce extra identity silos, and that's absolutely the last thing we want nowadays from an identity management perspective. Also, when taking a Hadoop deployment into production, it needs to be put into what's called secure mode, which effectively means enabling it for Kerberos. So both of these things together make it a big win to be able to simply integrate our big data clusters into Active Directory. AD is generally the source of truth and the main authentication mechanism in most companies.

So using our existing AD identity, or rather our AD credentials, to access more resources can only be a good thing. And AD also provides a robust Kerberos environment as well. So Centrify has a technology called zones, which are simply AD containers or OUs, and they make it really easy to initially enable our AD users for access to new Linux big data environments, or indeed Windows, and also to centralize and consolidate our existing Linux and Unix users so that they can log into the machines they need to and should have access to using their Active Directory credentials.

So these zones also allow us to simplify AD integration when you have multiple big data clusters that need to be put into secure mode. The fact that integrating with AD also enables Kerberos for the Linux machines gives another benefit, in that the users can then have single sign-on using Kerberos to the Linux platforms using PuTTY and other SSH clients. And of course we can apply various forms of multifactor authentication when accessing those Linux machines in the big data clusters, or indeed when we're actually logging on to Windows in the first place.

So the great thing here is that there's no parallel identity infrastructure needed. We use the existing AD environment you have, which will already have redundancy and high availability built in, and this lowers risk and also lowers the total cost of ownership. Now, kind of as an aside to this, we can also use a product we have called Centrify Privilege Service, which is part of our platform, for break-glass access to Linux machines. So generally our approach is to want users to log on as themselves and then execute any commands as allowed by their role.

But there's gonna be some special cases where we might need to allow logon as root, for instance, without wanting that user to know the root password, and we'll then regularly have to change the password for security reasons as well. So on to role-based privilege management. The next thing really that we need to achieve is enforcement of this RBAC on our big data clusters. So we're gonna have different people, as I mentioned, needing to access the cluster nodes, and they'll need different levels of access.

So for instance, the IT admins need to manage the infrastructure itself, but don't really have a business need to be able to access the data. The data scientists, on the other hand, need to be able to log on to run jobs, but they shouldn't be able to edit the system configuration files that maybe the Hadoop admin, or the big data admin, should have access to.

So again, with zones and another feature we have called computer roles, we centralize this role-based privilege management, or RBAC, into Active Directory and eliminate the use of root privileges for all but the rare break-glass scenarios I mentioned a moment ago. So we can allow per-command privilege elevation, and also, when needed, provide a whitelisted restricted shell where that's required. And as mentioned earlier, the Privilege Service can also be used for those rare situations when someone does actually need to log on as a shared privileged account, such as root.
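To make the idea of per-command, role-based privilege concrete, here is a small Python sketch. It is purely illustrative and is not how Centrify implements zones or computer roles: the role names, user names, command lists and the `may_run` helper are all assumptions made up for the example.

```python
import shlex

# Hypothetical mapping of roles to the commands each role may run.
ROLE_COMMANDS = {
    "it_admin": {"systemctl", "yum", "useradd"},
    "hadoop_admin": {"systemctl", "vi", "hdfs"},
    "data_scientist": {"hadoop", "spark-submit"},
}

# Hypothetical role assignments; in the talk these would live in the directory.
USER_ROLES = {
    "barry": {"data_scientist"},
    "mike": {"it_admin"},
}


def allowed_commands(user):
    """Union of the commands granted by every role the user holds."""
    roles = USER_ROLES.get(user, set())
    return set().union(*(ROLE_COMMANDS.get(role, set()) for role in roles))


def may_run(user, command_line):
    """Allow a command only if its executable is whitelisted for the user."""
    parts = shlex.split(command_line)
    return bool(parts) and parts[0] in allowed_commands(user)


print(may_run("barry", "spark-submit job.py"))
print(may_run("barry", "vi /etc/hadoop/conf/core-site.xml"))
```

In this sketch the data scientist can submit a job but cannot open a configuration file in an editor, which mirrors the separation of duties described above; an unknown user is allowed nothing.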

So in the story so far, we've got Active Directory providing our central identity store and single sign-on to the big data platform, providing Kerberos. We're also controlling what people are actually allowed to do once they've logged into the cluster nodes, assuming they've got the rights to be able to do so. So to cover the audit and compliance requirement, we provide full session auditing, or session recording if you'd rather call it that.

When we talk about auditing here, what we mean is the ability to have full session capture, recording all user activity. So imagine that the user has a camera strapped to their head, or a camera sitting on their shoulder, recording the sessions. All the sessions are centralized into an audit store, where they can then be searched and reviewed by auditors or managers, or whatever the requirements might be. So we'll have full accountability that it was Barry that executed a particular command on a particular node, and when that happened.

And we can also look across the entire cluster, indeed the entire environment, to see what Barry's been up to over the last days, weeks and months. We'll also have captured the sessions where someone needed to log on as root and use the Privilege Service that I mentioned earlier to broker that connection. Now on to the VPN-less access side of things. I read in a Gartner survey recently that a hundred percent of us outsource some of our IT, and that to me implies that all of us are giving people from outside our firewalls access to our internal systems.

Now hackers have exploited this in a number of major data breaches over the last few years, where VPNs have been the initial point of breach into networks. So once inside, the hackers move laterally, often using valid credentials that they picked up from that external source, to hack privileged users. They can then exfiltrate and monetize the data those users have access to, or get up to whatever nefarious things they want to. So what Centrify provides here is this VPN-less, secure remote access.

So it can be used not only by outsourced developers and admins, but also by your own staff who need access from outside your network to the internal systems. Now, what you can hopefully just about see in the picture on the slide is an outsourced developer and an admin, and they've been given access to log on to their customer's Centrify Identity Service portal that you can see in the middle, and we can apply multifactor authentication to that logon as you'd expect. And they're then presented with a list of applications or servers that they're allowed to access.

So when they click on one of these tiles, they're connected via an HTTPS connection directly to the customer's environment and on to the destination application or server via that cloud connector. The Centrify connector that you can see there actually lives inside the customer's firewalls, and it only ever makes outgoing connections, which external users connect through. So there's no need to punch extra holes in your firewall or anything like that. So this gives very simple and secure browser-based access for remote users. You'll notice there's no VPN there.

And as I mentioned, it can be used for access to internal applications, systems and indeed network devices as well. So in terms of better together: why do we say that the Hadoop vendors and Centrify are better together? It might not immediately be clear from everything I've said so far, and hopefully the next couple of slides will pull things together. So in terms of authentication, audit and access management, the Hadoop vendors all offer something. They offer different functionality, but at the Hadoop level. Now, they provide integration with LDAP.

And this is fine for simple AD environments, but it does take a very flat view of an AD structure. Whereas we, Centrify that is, take account of the typical complexities found in AD when we're accessing Active Directory: multiple domains, cross-forest or domain trusts, use of global catalog servers and so on. The Hadoop vendors also provide initial scripts to put the cluster into secure mode and Kerberize the service accounts.

Now we provide scripts to do that too, but we also enable Kerberos for the various users that need access to the environment as well, rather than just those service accounts. And also we give automatic renewal of Kerberos tickets, which is needed for any long-lived jobs that might be running in the cluster. In terms of authorization, there's a clear line between Centrify and the Hadoop vendors.

There are various tools that provide ACLs inside Hadoop to say that user or group X can access piece of data Y, and also the LDAP lookups that are done for AD users and groups. But the complexity of the enterprise AD environments I've already mentioned also applies here: the multiple forests, multiple domains, cross-forest trusts, all of those sorts of things.

Also, Centrify provides initial role-based access to the operating system itself, starting with whether a particular user can log on or not, and privilege in terms of who can execute what on the machines once they've got onto them: maybe they can restart Hadoop, run jobs, edit config files or whatever. So Centrify provides the unified identity for the users and service accounts to the entire cluster, which the Hadoop tools can then make use of in their ACLing and so on.

So with audit, again, there are tools for auditing within Hadoop, but Centrify provides the session recording of a user's whole session from login to logout, rather than the Hadoop-centric auditing at the data level. And finally, as I mentioned earlier, in terms of access management, the centralized zones make it really easy to integrate multiple clusters into the same AD, and hence the same Kerberos environment, without any risk of different environments treading on each other's toes and getting in each other's way.

For NoSQL vendors and Centrify, we make it very easy to Kerberize the clusters by integrating them into AD as I've already been describing, and then GSSAPI takes over for SSO, Kerberos authentication and so on. Where there are LDAP interfaces from these tools as well, we provide an LDAP proxy. This makes sense of the AD structure for what are relatively dumb LDAP processes that really need to know about that complex AD structure in an LDAP sense. So the use case is similar to Hadoop, and Centrify helps secure access at the operating system level.

So just really moving on to the summary, the final slide looks at applications, big data and the operating system platform as layers in the environment. And I hope this will finally make it fully clear what I've been talking about and where Centrify fits in compared to the big data vendors, the tools that they offer, and applications. So at the bottom of the slide, you can see we've got the OS user access layer, where Centrify provides identity from AD to the Linux machines, in terms of credentials from AD, UIDs, GIDs and group memberships, to allow users to log in.

Now, when they then run the command line tools you see mentioned there, the identity, in terms of user ID, group memberships and so on, is taken into those tools and used by them. So then moving up into the big data layer, the group memberships and user identities that have been provided by Centrify from Active Directory can be used to authenticate and to apply access controls within those big data tools.

And finally, at the application layer, users can authenticate and access tools using the identity provided to them by Centrify, and the tools themselves also provide levels of audit and administrative delegation. So what I find interesting here as well is that many people now access big data at the platform level via a terminal or SSH session. And we have that covered, as you can see and as I've explained, but as time goes on, more and more access will be moving up the stack.

Management, for instance, as already happens with some of the Hadoop vendors, will be via web apps that in turn access the underlying cluster. We're also ideally placed for that transition because, as you've seen, we've got the platform level use case covered today, but we also have functionality in our platform to enable SSO to the web applications with our Centrify Identity Service IDaaS solution.

So in summary, then, I've spoken about the challenges that putting a big data project into production can experience. I've mentioned along the way that the data becomes a very, very high risk target and also comes into the focus of the auditors, so this needs to be addressed. I covered how Centrify provides AD-based identity and access management and enables Kerberos for the big data environment, indeed for any Unix and Linux environment, as well as role-based access control and privilege management. And we do that for Unix, Linux and indeed Windows as well.

I spoke about the operating system level auditing capabilities we provide, and then just briefly covered the solution for secure remote access by your outsourcers, remote users, business partners, whatever. I showed what Hadoop vendors provide, where Centrify fits into that, and how we add to the security, audit and access requirements of a big data project, or rather how we address them. And lastly, I explained how Centrify and the native Hadoop tools fit together at different levels in the stack of applications, big data and the underlying operating systems.

So, as Mike said during his presentation, the access needs to be controlled and the infrastructure needs to be secured. And hopefully, with that brief description, I've given you an idea of how that can be done with big data. So thanks very much for your time, and I'll now pass back to Mike. Thanks. Okay. Thank you very much indeed for that, Barry, and thank you very much for your insights into the product that you have and the way you've developed it. We now have time for some questions and hopefully some answers from us both.

So if you have a question, the best way to ask it is to use the little widget that should be part of your GoToWebinar and type in a question, and I'll try and make sure that we find an answer to it. I don't see any questions at the moment.

So I'm going to ask one or two questions of Barry myself. The first question is that there are already some kinds of tools produced by Hadoop vendors that supposedly have RBAC and secure access enabled, for example, projects like Sentry, Ranger, Knox, and so forth. How would you position Centrify's solution against all of this?

Yeah, I think where we fit with the likes of Sentry, Ranger and Knox is as I mentioned at the end of that presentation: there are different layers, where we've got the operating system level, we've got the big data level, and we've got the application level. And it's really about the fact that we are providing that level of security around the operating system, providing the unified identity, which we then pass on into the other tools, which they can then use to do their thing.

It's really similar to the past, where you have Microsoft SQL Server, which is fully integrated with Active Directory, but the identity itself comes from Active Directory. And that's what we provide with Hadoop and the big data scenarios. But even though it's completely integrated with AD, there are still tools at the actual SQL level, or the big data level in the context of what we're talking about today, that make use of the identity and credentials that have been passed to them: things like group membership and all that sort of thing.

So I'd say that's really where we fit, Mike. It's the level, if you like, of the difference between the operating system, the platform, and where that morphs over into the applications or indeed big data itself. And that of course is always one of the major risk areas, because if you can get into the platform, then that tends to give you access to all areas, doesn't it? Absolutely. And I think over the years, with a lot of applications, there's been an attitude of: yes, we've got maybe a big data project going on.

We need to secure big data, but then you might have people externally running the underlying infrastructure on which big data is running or on which the applications are running. So you really do have to take a sort of holistic view of the entire cluster, not just the big data piece that's running on that cluster. Yes.

I'm not sure that I have data on exactly how many organizations are running Hadoop themselves and how many have chosen to go with cloud-based solutions. Do you have any solution for the cloud, Barry? Yeah. In terms of the Active Directory integration and so on, and the enabling, then absolutely.

We're seeing more and more of our customers implementing infrastructure as a service, be it on Azure or AWS or whatever. The whole orchestration of those systems is really important; bringing new systems into the cluster would obviously be a classic example. And really, all that I've spoken about this afternoon applies to that environment too. Yes. As you say, I think nearly every organization of any size that we come across has something going on in the cloud. And so we are working with increasingly hybrid environments.

So, going back to the Hadoop area, some people say that the answer to all of this is through enabling LDAP. So would you like to position what you are doing with Active Directory against LDAP? Yeah, I think, without getting into a deep philosophical debate about LDAP, Active Directory is basically LDAP and Kerberos put together. So there is integration that a lot of open source type stuff does, using LDAP against Active Directory. But as I mentioned during the presentation, that's always looking on LDAP as being a very flat structure.

Whereas Active Directory is quite complex in terms of the multiple domains, multiple forests, how to use a global catalog server and all of those things. So absolutely use LDAP, and use Active Directory as the LDAP store, if you like. But to give the intelligence to handle all of that, you know, the cross-forest trusts, et cetera, you really do need something like Centrify, which makes sense of that for the systems that are just treating it like a simple LDAP store. Yes.

That's a very interesting point because, as someone who has lived through all these evolutions, I find it fascinating that LDAP was really a poor man's version of X.500, which had all of the things that you mentioned in it. And eventually X.500 didn't seem to gain much traction, and people went for these simplified directories. But now probably the most widely used environment is of course Active Directory.

And since nearly every organization that you come across has Active Directory as its main means for authenticating users and for managing users, to integrate with that would seem to me to be the biggest advantage. Yeah. And as I mentioned, we don't want to be creating new identity stores. We don't want to be popping up a new LDAP directory for every cluster that we build, because we're just making the situation worse if we do that. In most of our environments we've got very robust disaster recovery and high availability. Active Directory provides Kerberos and provides LDAP. We want to make our big data secure, and putting it into secure mode requires Kerberos, so use what you've got, which is Active Directory.

Yes, indeed. And I think what you've just said is a message that applies on a much wider scale than that. We're still seeing organizations that are living with the pain of the fact that each application built its own user store and privilege system, and with managing to cater for that. And we need to stop that from happening again in the future as all these new technologies arrive, preventing them from all having their own answer to the agile problem. So you mentioned... go on, yes, you were going to say something.

No, I was just going to say that in terms of applications, I think it's slightly outside the scope of what we're talking about today, but we are seeing SAML come around for the second time on federation and suchlike. So it's a slightly different world 12 years later, but I think the single sign-on thing is being addressed better than it was first time around. And we need to be providing identity as a service solutions to address that, so we don't end up with the same proliferation of directories that we had, what, 10 or 12 years ago, when the original identity management boom was.

Yes. So another thing that I have to ask you is that many of the vendors are actually Kerberizing, or providing Kerberos for, their clusters via a wizard.

Why can't users just use that? Why is it better to use the Centrify stuff?

Well, that can be used. We've got integration guides. I didn't mention it during the presentation, but we're partnered with MapR, Cloudera, Hortonworks, Couchbase, DataStax, MongoDB, et cetera, so there are integration guides on the website for all of them. And basically, the Kerberizing of the cluster that they do is for the service accounts.

Some have more, some have fewer service accounts, but what we are adding there as well is the ability for the users to then be able to log on via Kerberos, so that the data scientist logs onto his Windows machine in the morning, double-clicks on a node in PuTTY and ends up on the cluster with single sign-on, having Kerberos from Active Directory to achieve that. But also, of course, you don't just want to Kerberize everything and allow access everywhere.

So it's very important that, as well as enabling the users for Kerberos on those boxes, we make sure that they can only access the boxes they should be able to, and also execute only the things they should be able to on those boxes. So this comes back to where we were at the beginning of the question section, about the platform being very important and needing to be looked on as part of the whole solution. Yes.

Now, it's interesting because time and again we see in different areas that the driving force behind the acquisition of these new technologies comes from outside of the IT department. So you get the marketing department or some kind of line of business that wants to use these things, and they tend to go off and either acquire or rent or buy this stuff without necessarily thinking about the problems that we've been discussing. Now, where do you see the demand coming from in organizations for your technology?

Maybe you have a customer story that you could anonymize and tell us about? When talking about big data specifically, rather than the sort of classic shadow IT that you were beginning to describe there, what happens, and I can think of a couple of customers where this is taking place, is that somebody somewhere thinks big data would be a good idea. So they do whatever they want to do with their big data project. And then there comes a point at which business value can be seen, et cetera, and everything's looking great.

And then everything grinds to a halt because it's got to go into production, and the people behind production say: hang on, that is the crown jewels. You're putting all this massively important, high-risk data together, and you haven't thought about how to secure it. You thought about what business value it offers you, but not what business impact breaches, for instance, could cause. So that's the point where we come in, though we are beginning to get involved earlier as people become more aware, if you like, of having to secure the whole platform rather than just the big data on top of it.

But as I say, it has slowed down a number of big data projects I can think of: when they go to production, the IT security guys, who obviously have to get wind of it at some point, effectively put a stop to it. And that's the point at which they say: right, how are we going to secure this? How do we secure things generally? What is our preferred method across the organization for (a) using and (b) having a single source of identity? It's Microsoft Active Directory, so use this.

So I think we've had about 55 or 60 customers over the last year to 18 months that have hit that kind of place and ended up using us for it. On the other hand, and I'm digressing slightly from the question, we've got other customers who have integrated their Unix and Linux machines with AD over the years. And there's almost an aha moment when they realize that not only are they logging in with the same username, but GSSAPI also starts to work. So you have single sign-on to all the boxes.

You can integrate Couchbase, DataStax, MongoDB, all of these things. And that's the real benefit. It's nice to have unified identity, but the single sign-on and suchlike that can be brought as well, in addition to securing the platform from the point of view of privilege and least access, are the really powerful things that the customers find useful. Thank you. Now we actually have got a question, and this is a question definitely for you, Barry.

It says: how specific can the Centrify access controls and privilege management be, e.g. specific users with role X and data? Absolutely.

Yeah, it can really be as specific as you like. If you start from the top and work downwards, you might say that everybody in group X can access all of the nodes in cluster Y, and allow them to log onto all those boxes with their own Active Directory identity. You might allow them to use single sign-on; you might not. You might want them to have multifactor authentication when they log in; you might not. That can be drafted in as part of that role, as part of the right of whether you are allowed access to that machine.

Then you might have people that want to be able to edit a particular file. So you could give the rights to execute the command to access that particular file, or alternatively, you could have a restricted shell as well. So you can really tightly limit what people can do.
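The layered rights described in this answer (group X may log onto the nodes of cluster Y, optionally with multifactor authentication, and may then run only certain commands) can be pictured with a small sketch. This is only an illustration under stated assumptions, not Centrify's actual data model or API: the `Role` class, its field names, the host naming pattern and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Role:
    """Hypothetical role: who may log on where, and what they may run."""
    name: str
    ad_group: str                      # AD group whose members hold the role
    host_pattern: str                  # glob over node names (assumed naming scheme)
    require_mfa: bool = False          # whether a second factor is needed at login
    allowed_commands: frozenset = frozenset()


def may_login(role, user_groups, host, mfa_passed):
    """Login is allowed only if group, host and MFA conditions all hold."""
    if role.ad_group not in user_groups:
        return False
    if not fnmatch(host, role.host_pattern):
        return False
    return mfa_passed or not role.require_mfa


def may_run(role, user_groups, host, mfa_passed, command):
    """A command is allowed only after a permitted login, and only if listed."""
    return (may_login(role, user_groups, host, mfa_passed)
            and command in role.allowed_commands)


# Example role: members of "group-x" on every node of "cluster-y", MFA required,
# limited to two commands.
analyst = Role(
    name="cluster-y-analyst",
    ad_group="group-x",
    host_pattern="cluster-y-*",
    require_mfa=True,
    allowed_commands=frozenset({"hdfs", "vi /etc/app/report.conf"}),
)
```

With this sketch, `may_run(analyst, {"group-x"}, "cluster-y-node01", True, "hdfs")` is allowed, while the same user without MFA, on a node outside the cluster, or running an unlisted command is refused.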

But in terms of the granularity, what we find is that different customers have different attitudes. Some people might give a role 700 or a thousand different particular rights, have rights at a very granular level, and not worry so much about auditing. We have other customers who audit absolutely everything and don't pay so much attention to the rights, because people behave better when they're being audited.

So the answer to the question is that you can go down pretty deep. Obviously, as I mentioned, once you get into Hadoop itself, there are the ACLs, et cetera, that can determine who's allowed to do what, where. But we provide a unified UID/GID namespace that you can take into those areas, so that the big data, Hadoop, whatever apps can then use that unified identity throughout as a consistent method of file access control, and so on. So hopefully that answered the question, Mike. Yes. Thank you very much.
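Why a unified UID/GID namespace matters for file access control can be shown with a minimal sketch. It assumes a single central table that every node consults, so the same person resolves to the same numeric IDs everywhere, and it applies a POSIX-style owner/group/other read check. The `DIRECTORY` table, the account names and the numeric IDs are invented for illustration; real tools layer ACLs and other rules on top of a check like this.

```python
import stat

# Hypothetical central mapping from directory account to POSIX identity.
# Because every node resolves against the same table, file ownership means
# the same thing on every machine in the cluster.
DIRECTORY = {
    "alice": {"uid": 10001, "gids": {20001, 20010}},
    "bob": {"uid": 10002, "gids": {20002}},
    "carol": {"uid": 10003, "gids": {20010}},
}


def resolve(username):
    """Return (uid, gids) for a user, identically on every node."""
    entry = DIRECTORY[username]
    return entry["uid"], frozenset(entry["gids"])


def can_read(username, owner_uid, owner_gid, mode):
    """POSIX-style read check: owner bits, else group bits, else other bits."""
    uid, gids = resolve(username)
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# A file owned by alice (10001), group 20010, mode rw-r----- (0o640).
FILE_OWNER, FILE_GROUP, FILE_MODE = 10001, 20010, 0o640
```

Here alice can read the file as owner, carol can read it through her membership of group 20010, and bob cannot, and the answer is the same whichever node performs the check because the IDs come from one source.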

Well, I think we're coming up to the end of the time for this webinar. So have you got any final words you'd like to say before we finish, Barry? Not really, Mike, thanks.

I think, you know, I've pretty much said it all and repeated quite a bit of it during the presentation, but thanks for everybody's time and thanks to you for presenting. Okay.

Well, thank you very much, Barry. You've told us how we can answer some of the major challenges that I set at the beginning of this, which is how we can control access to the analysis of this big data, how we can secure the infrastructure, which is used to process this big data and how we can audit and hence ensure compliance with things. So with that, I'll say thank you very much.

Barry Scott, Chief Technology Officer for Centrify, and thank you very much to all the audience for participating in today's webinar. Thank you very much and have a good afternoon. Thank you. Bye-bye. Thank you. Bye.
