Event Recording

Preserving Privacy in Identity-Aware Customer Applications

Name: Preserving Privacy in Identity-Aware Customer Applications
Uploaded: 2022-05-12T12:00:00+02:00
Duration: 18 min 19 s

Posted on May 12, 2022

As customer identity programs mature, they bring new opportunities and risks. In the rush to launch new customer experiences, personal data is over-exposed and over-replicated. The default is to ship all identity attributes, to all systems, on every request in order to make access decisioning easier for application developers.

This approach disperses identity information across the application stack, which increases the risk of data breach, data loss, and compromised identities. As a result, consumers lose trust and new business opportunities falter; or worse, customers like the new experience, but its success creates security and compliance liabilities that expand exponentially. To remediate the risk, data teams enter a never-ending cycle of costly data analysis and audits.

Identity architects and developers need to address privacy requirements earlier - not in post-collection data management, but instead in the application development process. While Privacy by Design and Privacy by Default principles are a helpful framework, they offer little practical guidance for developers to actually build privacy-preserving applications.

We will discuss how to use identity data at run-time, in the context of the application; how to retrofit existing applications with privacy requirements; and how to easily evolve applications over time.


Speaker

Mayur Upadhyaya

Co-founder & CEO
Contxt

Transcript

So thank you for joining this session to discuss privacy-preserving applications. And I think, you know, a big theme for many of the sessions over the last couple of days has been the relationship that identity plays in both privacy and security.

Without knowing who's on the other end of the connection, we can't create any sort of access control, so identity is critical to security. And without identity, there's no way to give the end user or the consumer any sort of agency over their preferences, their control, and how their data is used.

And, you know, I'm not talking just about the cookie-based consent banners that have plagued our web experiences for the last four years, but actual preferences: how are you actually using my data? How is that going to be processed? And what purpose is that for? And whilst we're at the very infancy of privacy on this journey, it can't be done without identity. You need identity there to form that connection between the end user and their preferences.

And if we think about the different ways that privacy aligns to your technical stack, there are three paradigms that have come into play. The first is privacy as confidentiality, which is a very draconian sort of policy around your most important assets. Typically this is going to be how you control access to your databases or your data lakes. It's very technically led: it's going to be around how you get access to a database.

If you think about the next layer down, and this is where I think many of us have been for the last couple of days, it's around privacy as control. This is where there is that relationship between the consumer and the data controller, and that is usually facilitated by an identity provider. This is a relationship of trust, and it's a value exchange.

So, you know, as a consumer, if I make a choice to disclose some information to you, it's for some value, right? There's a reason why I'm giving you information: it could be credentials to gain access, it could be to get functionality from your application, or it could be to get a service such as e-commerce. Now, what's super interesting about privacy as control is that there is this expectation from the consumer that the data controller or the identity provider has processes in place to guarantee the non-abuse of that data.

And now we have protection: there are legal frameworks such as CCPA or GDPR to provide some coverage for the consumer, but it's still very much trust-based. And the challenge that we have is that when trust is eroded, it's almost impossible to get back again. This takes us to the third, more emerging and more nuanced concept in privacy and privacy controls, and that's privacy as a practice, or contextual privacy. This is very nuanced; it's non-binary. If you think about consent today, it is binary.

It's one size fits all for most consent for processing, other than what we see around ad tracking. But as we think about the next few years, as we think about privacy-preserving applications, this is the opportunity for application developers to create some level of interaction or feedback loop between their request for consent and their processing of information.

So, for example, rather than asking up front for permission to process location for personalization, you could ask at the point where location has value to the experience. So if it's your app on your mobile phone, it could ask for your location to personalize your experience at that moment, rather than asking for bundled consent up front.
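The idea of asking at the point of use, rather than up front, can be sketched in a few lines. This is only an illustration under stated assumptions: `ConsentLedger`, `personalize_with_location`, the purpose string, and the `ask_user` and `get_location` callables are all hypothetical names, not part of any product mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentLedger:
    """Hypothetical per-user record of the purposes the user has agreed to."""
    granted: set = field(default_factory=set)

    def has(self, purpose: str) -> bool:
        return purpose in self.granted

    def grant(self, purpose: str) -> None:
        self.granted.add(purpose)


def personalize_with_location(ledger, ask_user, get_location):
    """Request location consent only at the moment location adds value.

    ask_user(purpose) -> bool and get_location() -> str are supplied by the
    caller; both are assumptions made for this sketch.
    """
    purpose = "location.personalization"  # hypothetical purpose label
    if not ledger.has(purpose):
        if not ask_user(purpose):
            return None  # no consent: fall back to a non-personalized experience
        ledger.grant(purpose)
    # Location is read only after consent exists for this specific purpose.
    return get_location()
```

The point of the design is that the location source is never touched until the consent for that one purpose exists, and a refusal degrades the experience instead of blocking it.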

And if we can move to a level of contextual privacy, we can start to change that trust relationship with the consumer. But this is a target state, not a current state, because if we think about what we have today in our application stacks, we have our application: that could be a web service, a microservice, it could be a mobile app. Now, already that mobile app, let's just say, for example, is behind an intelligent content delivery network. It already has information such as your IP address. It already has information such as your location.

It already knows what your language preference is, because it's there in the header, and potentially it has a risk score and a fraud score. And that's all just there by default at the application level, just by you being on the end of the connection. Now that application can go to the database.

Now, the APIs it uses to hit the database are set up with machine-to-machine controls. It can go and get data from the database to populate the content management system, and now that's there, sitting in the application in its state. And at that point, it can go to the identity store and get profile information about the end user.

Now, what's very interesting about this scenario, which is typical in all applications, is that the application is now outside of any form of governance. That application has got data from the database, from the session, and from the identity store. And this creates a new vulnerability, or perimeter, around application development: where is the governance that makes sure there's no co-mingling of that data, that we don't take the session data and push it back to the database, that we don't take other data and push it back to the profile store? It is all there right now in the application.

And then we think about how that application can integrate with propensity vendors to help with targeting or advertising. How do we know that the information that's stored in these secure places, such as an identity store or a secure database, hasn't been shipped to another vendor? Just to put this into an easier example to explain, rather than it being so abstract: imagine this is a standard e-commerce website. It could be selling content, it could be selling travel, it could be selling devices, it could be selling alcohol, et cetera.

Most of them have a very similar paradigm. In this scenario, I've shopped at this site before, so I have a registered profile, I have registered credentials, I have a purchase history. And as a result, my information has been sent to many downstream systems. Here there's a content management system, there is a CRM, there are other downstream systems, there's an email provider, and then there's a ratings and reviews provider. And this is all just by making that first purchase. Now we think about how that data moves. And for disclosure, I'm ex-Janrain, so I have a bias to OAuth, and so apologies; I know other credential formats exist.

But just in this example, a number of things have happened. The content management system that has painted the screen has got a microservice for the navigation. The navigation microservice has gone to check whether I have a logged-in session. If I have a logged-in session, it has then gone to find out my name, to paint the "Hello, Mayur" at the top there. And hopefully it's done that with an OAuth token, and it's gone and verified that the issuer of the token is the same as expected. Potentially not.
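The issuer check mentioned here can be illustrated with a small, standard-library-only sketch. It assumes a JWT-style token signed with a shared secret (HS256) purely to stay self-contained; a production service would more likely verify an asymmetric signature with a vetted library. The function names, the secret, and the issuer URL are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(part: str) -> bytes:
    # Restore the padding that the compact token format strips.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def make_token(claims: dict, secret: bytes) -> str:
    """Demo helper only: builds an HS256 token so the sketch can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_token(token: str, secret: bytes, expected_issuer: str, now=None) -> dict:
    """Return the claims only if signature, issuer and expiry all check out."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

A navigation microservice would call `verify_token` before reading the name claim, and treat any `ValueError` as "not logged in".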

Now it has my name, and that's maybe 10 calls just to do that top navigation. Then we are painting the rest of the screen around this content page, and we've gone to the database. We've got information around my purchase history, and potentially we've checked to see if I've given consent for my information to be used for recommendations. Probably not. But ultimately there is going to be a recommendations engine there, and there's going to be an AI behind that. And that's interesting, because we don't know what information has been sent to that AI.

Now, if you send my first name to the AI, you might be able to infer that my ethnicity is not Caucasian. You may or may not be able to infer my gender; it depends on whether you've got a good database of names behind your training model. With the location, you'll be able to tell that I'm European. So there's a lot of interesting information that's already gone into the AI if we haven't minimized it on the way out, and there's nothing stopping the application sending that data to the AI to paint the recommendations.

And finally, let's just say there are recommended products, and one of those recommended products is an adult product such as alcohol. The question there is: did it send my full date of birth, or did it de-identify it and make it a random date earlier than 2003? This is where I think the first step of privacy by design comes into place. If we take that date of birth example: did my date of birth get shipped to downstream systems that are outside of governance, or did the application or the APIs de-identify that information on the fly?
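One way to de-identify a date of birth on the fly is to release only the answer to the question the downstream system actually has ("is this person over the minimum age?"). The sketch below assumes a profile dictionary with a `date_of_birth` field; the function and field names are illustrative, not taken from the talk.

```python
from datetime import date


def is_at_least(date_of_birth: date, min_age: int, today: date) -> bool:
    """True if the person has reached min_age on the given day."""
    years = today.year - date_of_birth.year
    # Subtract a year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years >= min_age


def age_claim_for_downstream(profile: dict, min_age: int, today: date) -> dict:
    """Build the minimized payload: a yes/no claim, never the birth date itself."""
    return {
        "age_over": min_age,
        "satisfied": is_at_least(profile["date_of_birth"], min_age, today),
    }
```

A recommendations service receiving this payload can still decide whether to show an alcohol product, but it has nothing to log or leak that identifies the shopper's birthday.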

And in most cases, I think my date of birth is in a lot of places. I know this because there are some random websites that I signed up to a decade ago that send me an email once a year, so I know that my date of birth has been shipped around. Then the next question is about these microservices that are hitting identity endpoints. First of all, are the API calls signed? You know, are they secured by OAuth 2, or is there some master, direct-access, machine-to-machine control between the developer's API and the identity store?

Because if there is, I mean, there are obviously some very high-profile recent cases. I think a very well-known president has a Peloton, and they had a breach. There's some very interesting data about Joe Biden, his resting heart rate, et cetera. But the question is, if that was scoped access, could you traverse that in a breach? And then the question, and this is the ultimate risk I think for all of us, is: have we co-mingled PII?

I think this is the biggest risk, because if you have, there's an opportunity for the developer just to write logs, right? And so now you've got shadow logs of PII that are outside your governance, or you are shipping that data to Salesforce or MuleSoft without de-identifying it, and they've got shadow logs or shadow DBs with that sensitive data inside the application. And then again, if there's anything that's been used for processing or for personalization, has consent been considered? And I think this is the actual state of where we are today.
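The shadow-log problem can be reduced at the point where the log line is written. Below is a minimal sketch using Python's standard `logging.Filter`; the two regular expressions are deliberately simple assumptions (email addresses and ISO-style dates) and would miss many real forms of PII.

```python
import logging
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ISO_DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


class RedactPII(logging.Filter):
    """Rewrite each record so emails and ISO dates never reach the log sink."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with its arguments already merged
        message = EMAIL_RE.sub("[email]", message)
        message = ISO_DATE_RE.sub("[date]", message)
        record.msg = message
        record.args = ()  # arguments are merged above, so clear them
        return True  # keep the record, now redacted
```

Attaching the filter to a handler (`handler.addFilter(RedactPII())`) means every logger that writes through that handler is covered, including code written by developers who never thought about the log contents.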

If we think about the CI/CD pipeline, we know today that everybody has technical debt. Everybody has built applications, identity-aware applications, for the last decade. And what we can't just do is rely on privacy shifting left.

I mean, security is shifting left and privacy is shifting left, and it is becoming more and more viable for developers to take more and more responsibility for the privacy challenge. But we can't just rely on that, right? We have to bake privacy controls into our pipeline design. That could be as simple as: as you release new code, are there any checks and balances to make sure that you haven't released a pervasive API, you haven't changed the scope, you haven't introduced new attributes?
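A release check of this kind can be as small as comparing what each endpoint returns in the candidate build against an approved baseline. The sketch below assumes both are available as simple mappings from endpoint to attribute names; how those mappings are produced (for example from an API description file) is left open, and all names are hypothetical.

```python
def new_exposures(approved: dict, candidate: dict) -> dict:
    """Return, per endpoint, the attributes the candidate build adds
    beyond the approved baseline. An empty result means nothing got worse."""
    findings = {}
    for endpoint, attributes in candidate.items():
        extra = set(attributes) - set(approved.get(endpoint, ()))
        if extra:
            findings[endpoint] = sorted(extra)
    return findings


# Hypothetical baseline and candidate, for illustration only.
APPROVED = {"/profile": ["first_name", "language"]}
CANDIDATE = {
    "/profile": ["first_name", "language", "date_of_birth"],
    "/orders": ["order_id"],
}

if __name__ == "__main__":
    findings = new_exposures(APPROVED, CANDIDATE)
    for endpoint, attributes in findings.items():
        print(f"{endpoint}: new attributes {attributes}")
    # A non-zero exit code lets the pipeline block the release.
    raise SystemExit(1 if findings else 0)
```

A brand-new endpoint is reported in full, since nothing about it has been approved yet; removing attributes never fails the check.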

What is that check, as you push from source control to live, that you haven't made the situation worse? That's perhaps the first one. But ultimately what we can do is start to think about the two concerns: new applications and existing applications. With new applications, there is an opportunity to start to change our culture around privacy and to bake it into our software development life cycle. And that's a great initiative to start with today. There are controls we can put in place, and there's modern architecture we can leverage.

And there are a lot of well-publicized mistakes that we can learn from. On the brownfield side, which is where most people are, we have massive amounts of investment that's out there. We can't wait for the technical debt to be remediated. We've got to find ways to start monitoring the situation today, and to find ways to inject privacy controls into the existing state.

Because for many of us, the original developers of our core applications, or the subcontractors or systems integrators, are no longer available. The people who built this stuff can't come back and patch it, right? So it's not just technical debt, it's knowledge debt as well. And so we've got to start to think about, okay, pragmatically, how can we start to fix this in post? I'm using a post-production term: you're filming, you don't get the shot you want, and so, with a bit of green screen, a bit of CGI, a bit of post-production, you can fix it there.

So we need to find some band-aids to get a step further here. Very briefly, we can start to think about privacy operations and identity operations, and how we get that into the CI/CD pipeline. There are a couple of things we can think about. One is, first of all, API security: making sure that we're considering scoped access and enforcing scoped access, and that we are verifying credentials and doing token validation. But I think the most important thing as we go through this journey is the concept of attribute-based access control.

If you build an endpoint, you build a microservice, and that microservice is leveraged by multiple applications, it is very unlikely that all applications need exactly the same data. So how do we start to monitor that, and start to restrict the applications to "this is what they need, based on the purpose" as opposed to "this is what they always get"? And I think there are definitely tools out there.
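Restricting each caller to what it needs for a stated purpose can be pictured as a lookup in front of the profile store. This is a sketch under assumptions: the policy table, client names, purposes, and attribute names are invented for illustration, and a real deployment would hold the policy outside the code.

```python
# Hypothetical policy: (calling application, declared purpose) -> releasable attributes.
POLICY = {
    ("navigation", "greeting"): {"first_name"},
    ("recommendations", "personalization"): {"purchase_history"},
    ("checkout", "age_verification"): {"age_over_18"},
}


def release_attributes(profile: dict, client: str, purpose: str, policy=POLICY) -> dict:
    """Return only the attributes this client may see for this purpose.

    Unknown client and purpose combinations get nothing (deny by default).
    """
    allowed = policy.get((client, purpose), set())
    return {name: value for name, value in profile.items() if name in allowed}
```

With a profile containing a name, a date of birth, and a purchase history, `release_attributes(profile, "navigation", "greeting")` returns only the first name, which is all the "Hello" banner needs.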

And so we can look at things such as identity gateways and API gateways, and there are a number of vendors upstairs who can support you on that journey. If you can start to offload the token verification, that's perhaps the first step, because then you're starting to see the traffic. And that's a great place to start to observe the privacy telemetry.

And you can then use that to add that telemetry to your existing security operations or DevSecOps. If you can send that privacy telemetry to your SIM or SIEM, then you can start to get the same visibility on privacy anomalies as you do for security anomalies.
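Privacy telemetry of this sort can be a structured event that records which attributes left the identity store, without recording their values. The sketch below only builds the event as a JSON string; the event type, field names, and the notion of an "expected" attribute set per client are assumptions, and shipping the event to a particular SIEM is out of scope here.

```python
import json
from datetime import datetime, timezone


def privacy_event(client: str, endpoint: str, response_body: dict, expected: set) -> str:
    """Describe one response as a telemetry event: attribute names only, no values."""
    released = sorted(response_body)  # iterating a dict yields its keys
    unexpected = sorted(set(released) - set(expected))
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": "privacy.attribute_release",  # hypothetical event type
        "client": client,
        "endpoint": endpoint,
        "attributes": released,
        "unexpected": unexpected,
        "anomaly": bool(unexpected),
    })
```

Because the event carries names and not values, the telemetry stream itself does not become another shadow copy of the personal data it is meant to watch.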

Very pragmatically, if you start to bake privacy into your application security posture and consider pervasive attributes in the same way as security, that's an easy start, right? And if you can start to see privacy in your DevOps, that starts to get you there. That's a good level one. Level two is that there's some really amazing innovation in privacy as code.

There are a lot of shared open source things around dot notation and taxonomies that you can bake in to give a common understanding, so that if you do see something that isn't expected, your DevOps team can say, "Hey, that's not expected, roll back that code." And then ultimately where we want to get to is privacy policy enforcement, but that seems to be a little bit further than where we are today.
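A dot-notation taxonomy gives every field a hierarchical category label, so a declaration such as "this service may handle user.contact" covers everything beneath it. The sketch below uses made-up category labels to show the mechanism; they are not taken from any specific published taxonomy, and the function names are hypothetical.

```python
# Hypothetical field-to-category annotations in dot notation.
FIELD_CATEGORIES = {
    "first_name": "user.name",
    "email": "user.contact.email",
    "date_of_birth": "user.demographic.date_of_birth",
}


def is_covered(category: str, allowed_prefixes) -> bool:
    """A category is covered if it equals an allowed label or sits beneath one."""
    return any(
        category == prefix or category.startswith(prefix + ".")
        for prefix in allowed_prefixes
    )


def undeclared_fields(fields, allowed_prefixes, field_categories=FIELD_CATEGORIES) -> list:
    """List fields a service touches that its declaration does not cover.

    Fields with no category annotation are flagged too, since nobody has
    stated what kind of data they are.
    """
    flagged = []
    for name in fields:
        category = field_categories.get(name)
        if category is None or not is_covered(category, allowed_prefixes):
            flagged.append(name)
    return flagged
```

A non-empty result is the "that's not expected" signal: it names exactly which fields fall outside what the service declared.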

And most of that is still done on a paper basis. And this is where, look, this is a journey; there's a target state. We've got to be honest about where we are: a lot of us are at the very beginning of our privacy by design journey. And that first step is just baking it into the design process.

I mean, that's why it is privacy by design, right? That we are considering that data organization as we are designing apps, and designing things that become more consumer-centric rather than organization-centric, so you don't create risk because you're under-resourced. And getting visibility over how data is being shipped around your stack, that seems to be the first step. As we start to then send that telemetry across your APIs and identity providers, you can make it actionable, and that is step two.

And then step three is embracing data minimization. If you can do that, I think that's a much better target state than where we are today. So just to leave you with some takeaways from today: I would say the first thing for anyone with APIs and identity providers is to adopt an identity gateway or an API gateway, because you'll get visibility over the requests and responses to those stores and how data has been moved. All the cloud service providers have a native one, and there are a lot of point solutions out there as well.

Leave here with that cultural shift to take back into the organization: privacy by design, starting to think about data minimization, et cetera. And then you can start to plan how you tackle your existing systems as well as greenfield. I think if we get there, that's a great step. But until then, the last takeaway is: if you can't protect it, don't collect it. And that's really all the content I had prepared for today's session. Thank you.

I'm Mayur Upadhyaya from Contxt, and we're on a journey to help shine a light on data movement, data in motion and data in use, and where it's been overused.
