Data Sprawl appears to me to be one of the biggest challenges in information security. And, by the way, Data Sprawl is not an issue specific to Cloud Computing; it is a problem organizations face every day.

What happens when data is extracted from an SAP system? One example: a CSV (flat) file is created with some data from the HR system. This file is delivered to another system, in the best case using some secure file transfer. But what happens then? That other system processes the file in some way or another. It might export some or all of the data, which then ends up in yet another system. And so on...

The point is: once data leaves a system, that data is out of control.

The problem is that this might happen not with just one CSV file but with hundreds of them, and with dozens of systems exporting and importing that data. Governance is difficult to implement. You can define a process for approving exports. You might even define rules for the use of exported data. You might review the exports regularly - are they still needed? However, reviewing what happens with the data at the target systems (are the rules enforced?) is pretty complex, and there is, up to now, no technical solution to that problem.

Things become even worse with Data Warehouses and Business Analytics. Data frequently ends up in large data stores and is analyzed. That means data is combined, sometimes exported again, and so on. How do you keep control? Implementing Access and Data Governance for Business Analytics systems is a big challenge, and auditors frequently identify severe risks in that area - which is no surprise at all.

Another scenario is PII on the Internet. If we give some PII to some provider for some reason, how can we ensure that the provider doesn't give that PII away? No way, I'd say. We might use special e-mail addresses or fake information to trace abuse of our PII, but that's not really a solution.

So what to do? Short term, it is about implementing processes which at least try to minimize Data Sprawl and the associated risk, as mentioned above. These processes and policies are far from perfect. They help internally, but not for PII.

We might use (very) long-term technical solutions like homomorphic encryption and other technologies being developed around "minimal disclosure" approaches to address some of the issues. We might then use an approach like Information Rights Management which works not on a document basis but on an attribute basis. But most of these things will help us sometime in the future, if ever.
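To give a feel for why homomorphic encryption is interesting here, a toy illustration: textbook (unpadded) RSA happens to be multiplicatively homomorphic, so a system can multiply two ciphertexts without ever seeing the plaintexts. This is purely conceptual - textbook RSA is insecure, and real homomorphic schemes are far more involved - but it shows the core idea of computing on data without disclosing it:

```python
# Toy demo of the homomorphic idea: the product of two RSA ciphertexts
# decrypts to the product of the plaintexts. Insecure demo parameters only.
p, q = 61, 53
n = p * q                  # 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17
d = pow(e, -1, phi)        # private exponent (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 7, 6
c1, c2 = encrypt(m1), encrypt(m2)

# Multiply the ciphertexts WITHOUT decrypting them...
c_product = (c1 * c2) % n

# ...and decryption yields the product of the plaintexts.
assert decrypt(c_product) == (m1 * m2) % n
print(decrypt(c_product))  # 42
```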

But what about defining a policy standard which is sticky to the data? A standard which describes how data may be used? If systems support this standard, they can enforce it. The idea would be to have such a standard and to allow exports, at least of sensitive data, only to systems which support the standard and enforce the policies. If data is split up, the policy has to stay sticky to all parts (as long as it applies to all parts). If data is combined, the policies have to be combined - the intersection of the policies then applies. A small sketch of what that could look like follows below.
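Since no such standard exists yet, here is a minimal sketch, in Python, of the semantics I have in mind. All names (PolicyTaggedData, permitted_uses, and so on) are hypothetical: the policy travels with the data, it survives a split, and combining two records keeps only the intersection of the permitted uses:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyTaggedData:
    """A piece of data with its usage policy attached ("sticky")."""
    payload: dict            # the actual attributes, e.g. HR data
    permitted_uses: frozenset  # e.g. {"payroll", "reporting"}

    def split(self, keys):
        """Extract a subset of attributes; the policy stays attached."""
        subset = {k: v for k, v in self.payload.items() if k in keys}
        return PolicyTaggedData(subset, self.permitted_uses)

    def combine(self, other):
        """Merge two tagged records; only uses permitted by BOTH survive."""
        merged = {**self.payload, **other.payload}
        return PolicyTaggedData(merged, self.permitted_uses & other.permitted_uses)

    def enforce(self, use):
        """A consuming system checks the policy before using the data."""
        if use not in self.permitted_uses:
            raise PermissionError(f"Use '{use}' not permitted by the policy")
        return self.payload

# Example: an HR export combined with data from another system.
hr = PolicyTaggedData({"employee_id": 4711, "salary": 50000},
                      frozenset({"payroll", "reporting"}))
crm = PolicyTaggedData({"employee_id": 4711, "region": "EMEA"},
                       frozenset({"reporting", "marketing"}))

combined = hr.combine(crm)
print(combined.permitted_uses)   # frozenset({'reporting'}) - the intersection
combined.enforce("reporting")    # allowed
# combined.enforce("marketing")  # would raise PermissionError
```

The intersection rule is the conservative choice: a combined record is never allowed to be used in a way that any of its sources would have forbidden.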

Such an approach has limitations, because it will first of all need some people to define the standard. And, like with all standards, it is about reaching critical mass. On the other hand: virtually every organization has the problem of Data Sprawl and lacks a valid answer to the questions asked in the context of Data and Access Governance. Thus, there is a real need for such a standard. From my perspective, the large vendors in the markets of Business Applications (e.g. ERP, CRM, and related systems), of Business Analytics, and of all the ETL and EAI applications are the ones who should work on such a standard, because they are the ones who have to support it in their systems. And they should start quickly, because their customers are increasingly under pressure from the auditors.