A recipe for PII

PII, Personally Identifiable Information (also phrased as Personal Identity Information), is at the heart of identity security and privacy. Yet, like almost all terms in the Identity sphere, it suffers from multiple overlapping definitions, leading to misunderstandings, heated discussions and a distinct lack of clarity.

Major sources of these problems are codified definitions from national laws, standards bodies and government agencies (such as the US National Institute of Standards and Technology).

There’s not much I can do about government definitions, no matter how wrong they are. But I can explain my reasoning for the way in which I discuss PII and which I believe to be a good basis for sound decision making. Too much decision making, unfortunately, is done in the heat of the moment without proper forethought or review. This is especially true – and especially unfortunate – when government entities get involved.

So let’s break down PII into its component parts. First, there’s Information.

Information is data, but not all data is information. As Mike Small and I wrote in From Data Leakage Prevention (DLP) to Information Stewardship: “Data is nothing more than the symbols which are processed by the computer. Data, in itself, has no meaning and no value. Information is data with context or processing that makes it useful…” A Social Security number, a date of birth, a postal code, a given name – each of these is a bit of data but none – on their own – is information. None, on their own, have context.

Take date of birth. Month and day of birth can limit the possible identities to approximately 1/365th of the world’s population (given an even distribution) – a number approaching 20 million! You can narrow this by adding the birth year, but that still would leave over 175,000 people in the “pool” – hardly a good identifier of an individual!
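For anyone who wants to check the arithmetic, here is a minimal sketch of it. The population figure and the 100-year span of birth years are my own rough assumptions for illustration, not official statistics, and the function names are made up for this example.

```python
# Back-of-the-envelope pool sizes for date of birth as an identifier.
# All three constants are assumptions for illustration only.
WORLD_POPULATION = 7_100_000_000   # rough 2013 world population (assumed)
DAYS_PER_YEAR = 365                # ignores 29 February
BIRTH_YEARS_IN_PLAY = 100          # assumed spread of birth years among the living


def pool_after_birthday(population: int) -> float:
    """People sharing one month-and-day, given an even distribution."""
    return population / DAYS_PER_YEAR


def pool_after_full_date(population: int) -> float:
    """People sharing one full date of birth, given an even spread of years."""
    return pool_after_birthday(population) / BIRTH_YEARS_IN_PLAY


if __name__ == "__main__":
    print(f"Same month and day: {pool_after_birthday(WORLD_POPULATION):,.0f}")
    print(f"Same full date:     {pool_after_full_date(WORLD_POPULATION):,.0f}")
```

Under these assumptions the first pool comes out a little under 20 million and the second a little under 200,000. Change the assumed span of birth years and the second number moves, but it stays far too large to single anyone out.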

Given name, of course, is easy to see as a non-starter in the identity sweepstakes. How many “Dave”s do you know of? Even those celebrities known by a single name (Beyoncé, Christo, Oscar, Madonna, etc.) are far from unique. Google “Christo” and see how many show up. If I refer to “Beyoncé”, especially in an entertainment context, then most people will infer that it is Beyoncé Knowles, the singer, that I’m talking about. And, in fact, should I wish to talk about a different Beyoncé I would need to include additional data in order for the reference to be considered actual information, but this is the exception not the rule.

For some (and this is the source of much discussion in identity circles) that inference is enough to make that bit of data qualify as PII. For example, if I’m talking to someone at the European Identity Conference and a third person hears me refer to something that “Martin” said – with no further qualification of “Martin” – the inference will most likely be made that I’m speaking about Martin Kuppinger. But suppose I had been talking about my favorite movie lines and wanted to quote Martin Short’s character (Ned Nederlander) in “Three Amigos”. I might say “Then Martin said: ‘Wherever there is injustice, you will find us. Wherever there is suffering, we’ll be there!’” A stretch, maybe, but possible.

So without context, data cannot be considered information. And if it’s not information then it certainly can’t be “identifiable” information. Can there be information (in the sense of identity information) that is not identifiable information? Yes, I believe there can be.

I was an early subscriber to Google’s Gmail service, and was able to grab the address dkearns AT gmail DOT com. There are now many people who have the address dkearnsX AT gmail DOT com where “X” is a one-, two-, or three-digit number (which allows you to infer just how many “dkearns” there are in the world). I know this because not a week goes by that I don’t receive email intended for one of these other “dkearns” addresses but from which the trailing number has been stripped (by the user, the sending party, an email server, etc.). I can tell because these messages often begin with a salutation: “Dear Dennis” or “Dear Dierdre” or “Dear Danny”. Sometimes these notes come from friends or family of the intended recipient. Most often they’re from a store or service that the person has signed up for. In none of these cases, though, can I – as the owner of the in-box – be identified as the person the note was intended for. It’s information, but not identifiable information.

The danger there is that one of the other dkearns will say or do something to bring them to the attention of the security services (maybe Danny Kearns is a well-known member of the IRA). The content of an email coupled with the known non-American status of Danny Kearns could place my inbox on the NSA’s watch list. In effect we now have Personally mis-Identifiable Information!

Your identity is made up of a very large, almost uncountable, number of attribute-value pairs (e.g., GIVEN_NAME = Dave). Each value is another bit of data. Few, if any, on their own can identify you. If you can collect a number of these data points within a given context, then you might have information. If that information is unique within that context – or “namespace” as we call it – then you have identifiable information. If, then, you can tie that identifiable information to a particular person you would have Personally Identifiable Information.
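To make the progression from data to identifiable information concrete, here is a small sketch. The namespace, its records and the function names are all hypothetical, invented purely for illustration; a real namespace would be a directory, a customer database or something similar.

```python
# Hypothetical namespace: each record is a set of attribute-value pairs.
NAMESPACE = [
    {"given_name": "Dave", "postal_code": "94000", "birth_year": 1950},
    {"given_name": "Dave", "postal_code": "94000", "birth_year": 1962},
    {"given_name": "Dave", "postal_code": "10001", "birth_year": 1950},
    {"given_name": "Martin", "postal_code": "94000", "birth_year": 1950},
]


def pool_size(namespace, known):
    """Count the records that match every known attribute-value pair."""
    return sum(
        all(record.get(attr) == value for attr, value in known.items())
        for record in namespace
    )


def is_identifiable(namespace, known):
    """True when the known pairs match exactly one record in this namespace."""
    return pool_size(namespace, known) == 1


if __name__ == "__main__":
    known = {"given_name": "Dave"}
    print(pool_size(NAMESPACE, known), is_identifiable(NAMESPACE, known))

    known["postal_code"] = "94000"
    print(pool_size(NAMESPACE, known), is_identifiable(NAMESPACE, known))

    known["birth_year"] = 1950
    print(pool_size(NAMESPACE, known), is_identifiable(NAMESPACE, known))
```

Each added pair shrinks the pool, and only the combination of all three is unique in this toy namespace. Note that the sketch stops at "identifiable": tying the one remaining record to an actual person is the final step, and it happens outside the data.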

PII is important. PII needs to be protected. That doesn’t mean every bit of data is equally important, nor that every bit of data needs a Herculean effort to secure it. It’s hard enough to keep the real PII private without stretching our resources so thin as to cover all possible attribute-value pairs. What I’m talking about is a sense of priorities, especially for those setting information governance regulations. Leaks, breaches and theft of Information, especially Identifiable Information, need to be addressed in as strong a manner as possible. Leaks, breaches and theft of data and non-identifiable information – not so much. Let’s keep things in perspective.

****************

Coming up on Thursday (10/26/2013) is my webinar “Authorization as a Calculated Risk”. I’ll be joined by:

Brian Spector, CEO CertiVox;
Jamie Cowper, Senior Director of Business Development and Marketing Nok Nok Labs;
Gerry Gebel, President Axiomatics Americas.

I hope you’ll join us also.
