Recently, at a press briefing by German IBM boss Stefan Jetter, who waxed enthusiastic about Cloud Computing, an elderly journalist rose and asked him a show-stopper: “Where are my data when they’re out there in the Cloud?” Jetter did a double take, but my colleague pressed on: “I mean, physically, where are they?”
Of course, the answer is: on some nameless server somewhere in a grid farm in Ohio or Dublin or… In fact, the usual answer is: Who cares?
Well, the German data protection authorities do, for one. Passing personal data across national boundaries can be a federal offense, and not only here. The EU Data Protection Directive (officially Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data) mandates that personal data may only be transferred to a third country if that country ensures an adequate level of protection. The U.S., to name just one, does not, at least not by European standards, especially since foreigners do not benefit from the U.S. Privacy Act of 1974.
Martin Buhr, the European head of Amazon Web Services (@tallmartin on Twitter) and champion of Amazon’s Elastic Compute Cloud (EC2), with whom I recently shared a panel on Cloud Computing, has a pragmatic answer to the question of where data in the Cloud should be stored and whether location matters. Amazon operates separate Cloud Computing centers in the States and in Ireland, so problem solved. Or is it?
Operating what are essentially two Clouds (Amazon calls them “Regions”), each running on its own physically distinct, independent infrastructure, makes sense from a data center perspective. Within each Region, Amazon further divides capacity into “Availability Zones” that do not share common points of failure such as generators and cooling equipment. This sounds similar to the common practice of data center redundancy, but that is normally done to ensure availability: data are mirrored back and forth constantly, so if one center goes down, the other can pick up immediately. In this case, however, at least in theory, there is no redundancy, since these are essentially two separate systems.
Only, of course, they aren’t. So Amazon has added a system whereby EC2 assigns region-specific IP addresses to its customers, which presumably makes it easy to determine which data may travel across the Atlantic and which may not. I don’t want to get into a long discussion of IP spoofing or of the techniques developed to get around state-run censorship systems like the Great Firewall of China, but you get the general idea. Okay, they use IPv4, but IPv4 addresses are a scarce resource. And yes, Amazon claims to have compliance options that will make hosting data in the Cloud both safe and legal.
Maybe I’m cynical, but I’ve been around too long and heard too many tales of supposedly fail-safe systems being compromised by whiz kids or Russian Mafiosi to really believe that quick fixes at the infrastructure level will hold forever. I would prefer to see Amazon and others in the Cloud community discussing user-centric, identity-based approaches to the problem instead of essentially saying “Trust us.” I’m pretty sure my elderly colleague won’t. He’d like to be able to check for himself exactly where somebody put his data.
PS: Maybe we’ll hear more on this at EIC 09, which starts tomorrow in Munich. If you’re interested, stop by my panel on “(User Centric) Identity in the Cloud,” scheduled for 2 pm on Tuesday.