“I just did it because Skyhook did it”

I received a helpful and informed comment from Michael Hanson of Mozilla Labs on the Street View MAC address issue:

I just wanted to chip in and say that the practice of wardriving to create an SSID/MAC geolocation database is hardly unique to Google.

The practice was invented by Skyhook Wireless, formerly Quarterscope. The iPhone, pre-GPS, integrated the technology to power the Maps application. There was some discussion of how this technology would work back in 2008, but it didn't really break out beyond the community of tech developers. I'm not sure what the connection between Google and Skyhook is today, but I do know that Android can use the Skyhook database.

Your employer recently signed a deal with Navizon, a company that employs crowdsourcing to construct a database of WiFi endpoints.

Anyway – I don't mean to necessarily weigh in on the question of the legality or ethics of this approach, as I'm not quite sure how I feel about it yet myself. The alternative to a decentralized anonymous geolocation system is one based on a) GPS, which requires the generosity of a space-going sovereign to maintain the satellites and has trouble in dense urban areas, or b) the cell towers, which are inefficient and are used to collect our phones’ locations. There's a recent paper by Constandache et al. at Duke that addresses the question of whether it can be done with just inertial reckoning… but it's a tricky problem.

Thanks for the post.

The scale of the “wardriving” [can you believe the name?] boggles my mind, and the fact that this has gone on for so long without attracting public attention is a little incredible.  But in spite of the scale, I don't think the argument that it's OK to do something because other people have already done it will hold much water with regulators or the thinking public.  In fact, it all sounds a bit like a teenager trying to avoid his detention because he was “just doing what Johnny did.”

As Michael says, one can argue that there are benefits to drive-by device identity theft.  In fact, one can argue that there would be benefits to appropriating and reselling all kinds of private information and property.  But in most cases we hold ourselves back, and find other, socially acceptable ways of achieving the same benefits.  We should do the same here.

Are these databases decentralized and anonymous?

As hard as I try, I don't see how one can say the databases are decentralized and anonymous.  For starters, they are highly centralized, allowing monetized lookup of any MAC address in the world.  Secondly, they are not anonymous – the databases contain the identity information of our personal devices as well as their exact locations in molecular space.   It is strange to me that personal information can just be “declared to be public” by those who will benefit from that in their businesses.

Do these databases protect our privacy in some way? 

No – they erode it even further.  Why?

Location information has long been available to our telephone operators, since they use cell-tower triangulation.  This conforms to the Law of Justifiable Parties – they need to know where we are (though not to remember it) to provide us with our phone service. 

But now yet another party has insinuated itself into the mobile location equation: the MAC database operator – be it Google, Skyhook or Navizon. 

If you carry a cell phone that uses one of these databases – and maybe you already do – your phone queries the database for the locations of MAC addresses it detects.  This means that in addition to your phone company, a database company is constantly being informed about your exact location.  From what Michael says, it seems the cell phone vendor might additionally get in the middle of this location reporting – all parties who have no business being part of the location transaction unless you specifically opt to include them.

Exactly what MAC addresses does your phone collect and submit to the database for location analysis?  Clearly, it might be all the MAC addresses detected in its vicinity, including those of other phones and devices…  You would then be revealing not only your own location information, but that of your friends, colleagues, and even of complete strangers who happen to be passing by – even if they have their location features turned off.
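
To make the flow concrete, here is a minimal sketch of the kind of lookup just described.  The endpoint, field names and response shape are all hypothetical – this illustrates the data flow, not any vendor's actual API.

```typescript
// Hypothetical sketch of a WiFi-positioning lookup.  The endpoint and
// field names are invented for illustration.

interface ObservedAccessPoint {
  macAddress: string;      // persistent hardware identifier, e.g. "00:1a:2b:3c:4d:5e"
  ssid: string;            // network name, e.g. "Smith Family WiFi"
  signalStrength: number;  // dBm, used to weight the position estimate
}

interface LocationEstimate {
  latitude: number;
  longitude: number;
  accuracyMeters: number;
}

// The phone reports every access point it can hear, including the
// neighbors'.  Each query therefore tells the database operator exactly
// where the phone is right now.
async function locatePhone(observed: ObservedAccessPoint[]): Promise<LocationEstimate> {
  const response = await fetch("https://geolocation.example.com/lookup", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ accessPoints: observed }),
  });
  return (await response.json()) as LocationEstimate;
}
```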

Having broken into our home device-space to take our network identifiers without our consent, these database operators are thus able to turn themselves into intelligence services that know not only the locations of people who have opted into their system, but of people who have opted out.  I predict that this situation will not be allowed to stand.

Are there any controls on what WiFi-sniffing outfits can do with their information, on how they relate it to other information collected on us, and on who they sell it to?

I don't know anything about Navizon or the way it uses crowdsourcing, but I am no happier with the idea that crowds are – probably without their knowledge – eavesdropping on my network to the benefit of some technology outfit.  Do people know how they are being used to scavenge private network identifiers – and potentially even the device identifiers of their friends and colleagues?

Sadly, it seems we might now have a competitive environment in which all the cell phone makers will want to employ these databases.  The question for me is one of whether, as these issues come to the attention of the general public and its representatives, a technology breaking two Laws of Identity will actually survive without major reworking.  My prediction is that it will not. 

Reaping private identifiers is a mistake that, uncorrected, will haunt us as we move into the age of the smart home and the smart grid.  Sooner or later society will reject it as acceptable behavior.  We technologists will save ourselves a lot of trouble if we make our mobile location systems conform to reasonable expectations of privacy and security starting now.

 

Misuse of network identifiers was done on purpose

Ben Adida has a list of achievements as long as my arm – many of which are related to privacy and security.  His latest post concerns what he calls “privacy advocacy theater… a problem that my friends and colleagues are guilty of, and I’m sure I’m guilty of it at times, too.  Privacy Advocacy Theater is the act of extreme criticism for an accidental data breach rather than a systemic privacy design flaw. Example: if you’re up in arms over the Google Street View privacy “fiasco” of the last few days, you’re guilty of Privacy Advocacy Theater.”

Ben then proceeds to take me to task for this piece:

I also have to be harsh with people I respect deeply, like Kim Cameron who says that Google broke two of his very nicely crafted Laws of Identity. Come on, Kim, this was accidental data collection by code that the Google Street View folks didn’t even realize was running. (I’m giving them the benefit of the doubt. If they are lying, that’s a different problem, but no one’s claiming they’re lying, as far as I know.) The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data. If anyone is breaking the Laws of Identity, it’s the WiFi access points that don’t actively nudge users towards encrypting their WiFi network.

But let's hold on a minute.  My argument wasn't about the payload data that was collected accidentally.  It was about the device identification data that was collected on purpose.  As Google's Alan Eustace put it:

We said that while Google did collect publicly broadcast SSID information (the WiFi network name) and MAC addresses (the unique number given to a device like a WiFi router) using Street View cars, we did not collect payload data (information sent over the network). But it’s now clear that we have been mistakenly collecting samples of payload data…

Device identifiers were collected on purpose

SSID and MAC addresses are the identifiers of your devices.  They are transmitted as part of the WiFi traffic just like the payload data is.  And they are not “publicly broadcast” any more than the payload data is.

Yet Google consciously decided to abscond with, tabulate and monetize the identities of our personal, business and home devices.  The identifiers are persistent and last for the lifetime of the devices.  Their collection, cataloging and use are, in my view, more dangerous than the payload data that was collected. Why? The payload data, though deeply personal, is transient and represents a single instant.  The identifiers are persistent, and the Street View WiFi plan was to use them for years.

Let's be clear:  Identity has as much to do with devices, software, services and organizations as with individuals.  And equally important, identity is about the relationships between these things.  In fact identity can only be adequately expressed through the relationships (some call it context).

When Google says, “MAC addresses are a simple hardware ID assigned by the manufacturer” and “We cannot identify an individual” using those “simple hardware IDs”,  it sounds like the devices found in your home and briefcase and pocket have nothing to do with you as a flesh and blood person.  Give me a break!  It reminds me of an old skit by “Beyond the Fringe” where a police inspector points out that “Once you have identified the criminal's face, the criminal's body is likely to be close by…”  Our identities and the identities of our devices are related, and understanding this relationship is essential to getting identity and privacy right.

One great thing about blogging is you find out when you haven't been clear enough.  I hope I'm making progress in expressing the real issues here:  the collection of device identifiers was purposeful, and this represents precisely the kind of “systemic privacy design flaw” to which Ben refers.  

It bothers me that this disturbing systemic privacy design flaw – for which there has been no apology – is being obscured through the widely publicized apology for a completely separate and apparently accidental sin.  

In contemporary networks, the hardware ID of the device is NOT intended to be a “universal identifier”.  It is intended to be a “unidirectional identifier” (see The Fourth Law) employed purely to map between a physical machine and a transient, local logical address.  Many people who read this blog understand why networking works this way.  In Street View WiFi, Google was consciously misusing this unidirectional identifier as a universal identifier, and misappropriating it by insinuating itself, as eavesdropper, into our network conversations.
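
To see what “unidirectional” means in practice, consider the ARP-style table every host keeps: the MAC address is meant to be meaningful only on the local link, mapping a transient logical address to a physical machine.  This toy sketch illustrates the concept; it is not a real ARP implementation.

```typescript
// Toy sketch of the local role a MAC address is designed for: an
// ARP-style cache mapping transient logical addresses (IPs handed out
// by DHCP) to hardware addresses, valid only on one network segment.

const arpCache = new Map<string, string>(); // local IP -> MAC address

function learn(localIp: string, mac: string): void {
  arpCache.set(localIp, mac); // meaningful only on this link
}

function resolve(localIp: string): string | undefined {
  return arpCache.get(localIp); // used to frame packets on this LAN, nothing more
}

learn("192.168.1.23", "00:1a:2b:3c:4d:5e");
console.log(resolve("192.168.1.23"));
// Nothing in this design intends the MAC address to be looked up globally.
// Treating it as a universal identifier inverts its purpose.
```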

Ben says, “The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data.”  But I hope he rethinks this in the context of what identity really is, its use in devices and systems, and the fact that human, device and service identities are tied together in what one day should be a trustworthy system.  I also hope to see Google apologize for its misuse of our device identities, and assure us they will not be used in any of their systems.

Finally, despite Ben's need to rethink this matter,  I do love his blog, and strongly agree with his comments on  Opera Mini, discussed in the same piece.

 

The Laws of Identity smack Google

Alan Eustace, Google's Senior VP of Engineering & Research, blogged recently about Google's collection of Wi-Fi data using its Street View cars:

The engineering team at Google works hard to earn your trust—and we are acutely aware that we failed badly here. We are profoundly sorry for this error and are determined to learn all the lessons we can from our mistake.  

I think the idea of learning all the lessons he can from Google's mistake is a really good one, and I accept that Alan really is sorry.  But what constituted the mistake?

Last month Google was good enough to provide us with a “refresher FAQ” that dealt with the subject in a way as specious as it was condescending:

“What do you mean when you talk about WiFi network information?
“WiFi networks broadcast information that identifies the network and how that network operates. That includes SSID data (i.e. the network name) and MAC address (a unique number given to a device like a WiFi router).

“Networks also send information to other computers that are using the network, called payload data, but Google does not collect or store payload data.

“But doesn’t this information identify people?
“MAC addresses are a simple hardware ID assigned by the manufacturer. And SSIDs are often just the name of the router manufacturer or ISP with numbers and letters added, though some people do also personalize them.

“However, we do not collect any information about householders, we cannot identify an individual from the location data Google collects via its Street View cars.

“Is it, as the German DPA states, illegal to collect WiFi network information?
“We do not believe it is illegal–this is all publicly broadcast information which is accessible to anyone with a WiFi-enabled device…

Let's start with the last point. Is information that can be collected using a WiFi device actually being “broadcast”?  Or is it being transmitted for a specific purpose and private use?  If everything is deemed to be “broadcast” simply by virtue of being a signal that can be received, then surely payload data – people's surfing behavior, emails and chat – is also being “broadcast”.  Once the notion of “broadcast” is accepted, the FAQ implies there can be no possible objection to collecting it.

But Alan's recent post says, “it’s now clear that we have been mistakenly collecting samples of payload data from open (i.e. non-password-protected) WiFi networks.”  He adds, “We want to delete this data as soon as possible…”  What is the mistake?  Does Alan mean Google has now accepted that WiFi information is not by definition being “broadcast” for its use?  Or does Alan see the mistake as being the fact that they created a PR disaster?  I think “learning everything we can” means learning that the initial premises of the Street View WiFi system were wrong (and the behavior perhaps even illegal) because the system collected WiFi information that was intended to be used for private purposes and not intended to include Google.

The FAQ claims – and this is disturbing – that the information collected about network identifiers “doesn't identify people”.  The fact is that it identifies devices that are closely associated with people – including their personal computers and phones.  MAC addresses are persistent, remaining constant over the lifetime of the device.  They are identifiers that are extremely reliable in establishing identity by virtue of being in people's pockets or briefcases.

As a result, Google breaks two Laws of Identity in one go with their Street View boondoggle.

Google breaks Law 3, the Law of Justifiable Parties:

Digital identity systems must limit disclosure of identifying information to parties having a necessary and justifiable place in a given identity relationship.

Google is not part of the transactions between my network devices and is not justified in intervening or recording the details of their use and relationship. 

Google also breaks Law 4, Directed Identity:

A universal identity metasystem must support both “omnidirectional” identifiers for use by public entities and “unidirectional” identifiers for private entities, thus facilitating discovery while preventing unnecessary release of correlation handles.

My network devices are private entities intended for use in the contexts for which I authorize them.  My home network is a part of my home, and Google (or any other company) has not been invited to employ that network for its own purposes.  The identifiers in use there are contextually specific, not public, and not intended to be shared across all contexts.  They are more private than the IP addresses used in TCP/IP, since they are not shared across end-points in different networks.  The same applies to SSIDs.

One can stand in the street, point a directional microphone at a window and record the conversations inside.  This doesn't make them public or give anyone the right to use the conversations for commercial purposes.  The same applies to recording the information we exchange using digital media – including our identifiers, SSIDs and MAC addresses.  It is particularly disingenuous to argue that because information is not encrypted it doesn't belong to anyone and there are no rights associated with it.  If lack of encryption meant information was fair game, a lot of Google's own intellectual property would be up for grabs.

Google's justification for collecting MAC addresses was that if a stranger walked down your street, the MAC addresses of your computers and routers could be used to provide his systems (or Google's?) with information on where he was.  The idea that Google would, without our consent, employ our home networks for its own commercial purposes betrays a problem of ethics and a lack of control.  Let's hope this is what Alan means when he says,

“Given the concerns raised, we have decided that it’s best to stop our Street View cars collecting WiFi network data entirely.”

I know there are many people inside Google who will recognize that these problems represent more than a “mistake” – there is clearly the need for a much deeper understanding of identity and privacy within the engineering and business staff.   I hope this will be the outcome.  The Laws of Identity are a harsh teacher, and it's sad to see the Street View technology sullied by privacy catastrophes.

Meanwhile, there is one more lesson for the rest of us.  We tend to be cavalier in pooh-poohing the idea that commercial interests would actually abuse our networks and digital privacy in fundamental ways.  This episode demonstrates how naive that is.  We need to strengthen the networking infrastructure, and protect it from misuse by commercial interests as well as criminals.  We need clear legislation that serves as a disincentive to commercial interests contemplating privacy-invasive use of technology.  And on a technical note, we need to fix the problem of static MAC addresses, precisely because they are strong personal identifiers that will ultimately be used to target individuals physically as criminals begin to understand their possible uses.
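
One possible fix can be sketched in a few lines: derive a fresh, locally administered MAC address for each network instead of exposing the burned-in hardware one.  The sketch below is purely illustrative – a real operating system would have to handle the details (and the compatibility issues) with much more care.

```typescript
// Sketch of per-network MAC randomization.  Illustrative only.
import { randomBytes } from "crypto";

function randomizedMac(): string {
  const bytes = randomBytes(6);
  // Set the "locally administered" bit and clear the multicast bit so the
  // address cannot collide with a manufacturer-assigned one.
  bytes[0] = (bytes[0] | 0b00000010) & 0b11111110;
  return Array.from(bytes)
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(":");
}

console.log(randomizedMac()); // e.g. "a6:3f:91:0c:7d:22" – different on every network
```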

 

U-Prove Minimal Disclosure availability

This blog is about technology issues, problems, plans for the future, speculative possibilities, long term ideas – all things that should make any self-respecting product marketer with concrete goals and metrics run for the hills!  But today, just for once, I'm going to pick up an actual Microsoft press release and lay it on you.  The reason?  Microsoft has just done something very special, and the fact that the announcement was a key part of the RSA Conference Keynote is itself important:

SAN FRANCISCO — March 2, 2010 — Today at RSA Conference 2010, Microsoft Corp. outlined how the company continues to make progress toward its End to End Trust vision. In his keynote address, Scott Charney, corporate vice president of Microsoft’s Trustworthy Computing Group, explained how the company’s vision for End to End Trust applies to cloud computing, detailed progress toward a claims-based identity metasystem, and called for public and private organizations alike to prevent and disrupt cybercrime.

“End to End Trust is our vision for realizing a safer, more trusted Internet,” said Charney. “To enable trust inside, and outside, of cloud computing environments will require security and privacy fundamentals, technology innovations, and social, economic, political and IT alignment.”

Further, Charney explained that identity solutions that provide more secure and private access to both on-site and cloud applications are key to enabling a safer, more trusted enterprise and Internet. As part of that effort, Microsoft today released a community technology preview of the U-Prove technology, which enables online providers to better protect privacy and enhance security through the minimal disclosure of information in online transactions. To encourage broad community evaluation and input, Microsoft announced it is providing core portions of the U-Prove intellectual property under the Open Specification Promise, as well as releasing open source software development kits in C# and Java editions. Charney encouraged the industry, developers and IT professionals to develop identity solutions that help protect individual privacy.

The company also shared details about a new partnership with the Fraunhofer Institute for Open Communication Systems in Berlin on an interoperability prototype project integrating U-Prove and the Microsoft identity platform with the German government’s future use of electronic identity cards.

As further evidence of how the company is enabling a safer, more trusted enterprise, Microsoft also today released Forefront Identity Manager 2010, a part of its Business Ready Security strategy. Forefront Identity Manager enables policy-based identity management across diverse environments, empowers business customers with self-service capabilities, and provides IT professionals with rich administrative tools.

In addition, Charney reviewed company efforts to creatively disrupt and prevent cybercrime. Citing Microsoft’s recently announced Operation b49, a Microsoft-led initiative to neutralize the well-known Waledac botnet, Charney stated that while focusing on security and privacy fundamentals and threat mitigation remains necessary, the industry needs to be more aggressive in blunting the impact of cybercriminals. Operation b49 is an example of how the private sector can get more creative in its collective approach to fighting criminals online.

“We are committed to collaborating with industry and governments worldwide to realize a safer, more trusted Internet through the creative disruption and prevention of cybercrime,” Charney said.

Readers may remember the promise I made when Microsoft's purchase of Credentica's U-Prove technology was announced in March 2008 and some worried Microsoft might turn minimal disclosure into something proprietary:

[It isn't…] trivial to figure out the best legal mechanisms for making the intellectual property and even the code available to the ecosystem.  Lawyers are needed, and it takes a while.  But I can guarantee everyone that I have zero intention of hoarding Minimal Disclosure Tokens or turning U-Prove into a proprietary Microsoft technology silo.

So here are the specifics of today's announcement:

  • Microsoft is opening up the entire foundation of the U-Prove intellectual property by way of a cryptographic specification published under the Microsoft Open Specification Promise (OSP).  
  • Microsoft is donating two reference SDKs in source code (a C# and a Java version) under a liberal free software license (BSD); the objective here is to enable the broadest audience of commercial and open source software developers to implement the technology in any way they see fit.
  • Microsoft is releasing a public Community Technology Preview (CTP) of the integration of the U-Prove technology (as per the crypto spec) with Microsoft’s identity platform technologies (Active Directory Federation Services 2.0, Windows Identity Foundation, and Windows CardSpace v2).
  • As part of the CTP, Microsoft is releasing a second specification (also under the OSP) that specifies the integration of the U-Prove technology into so-called “identity selectors” using WS-Trust and information cards.

I really want to thank Stefan Brands, Christian Paquin, and Greg Thompson for what they've done for the Internet in bringing this work to its present state.  Open source availability is tremendously important.  So is the achievement of integrating U-Prove with Microsoft's metasystem components so as to show that this is real, usable technology – not some far-off dream.

At RSA, Scott Charney showed a 4-minute video made with the Fraunhofer FOKUS Institute in Germany that demonstrates interoperability with the German eID card system (scheduled to begin rolling out in November 2010). The video demonstrates how the integration of the U-Prove technology can offer citizens (students, in this case) the ability to minimally disclose authoritative personal information.

There is also a 20-minute video that explains the benefits of integrating the U-Prove technology into online identity management frameworks.

The U-Prove code, whitepaper and specifications, along with the modules that extend ADFS V2, WIF and CardSpace to support the technology, are available here.

Sorry Tomek, but I “win”

As I discussed here, the EFF is running an experimental site demonstrating that browsers ooze an unnecessary “browser fingerprint” allowing users to be identified across sites without their knowledge.  One can easily imagine this scenario:

  1. Site “A” offers some service you are interested in and you release your name and address to it.  At the same time, the site captures your browser fingerprint.
  2. Site “B” establishes a relationship with site “A” whereby it sends “A” a browser fingerprint and “A” responds with the matching identifying information.
  3. You are therefore unknowingly identified at site “B”.  (A minimal sketch of this flow appears below.)
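
Here is that three-step flow in code.  All names and data are hypothetical; the point is the back-channel lookup, not any real service's interface.

```typescript
// Sketch of cross-site identification via a shared browser fingerprint.
// All names and data are hypothetical.

type Fingerprint = string; // e.g. a hash of user agent, fonts, plug-ins, screen...

// Site "A" knows who you are and stores your fingerprint alongside that identity.
const siteA = new Map<Fingerprint, { name: string; address: string }>();
siteA.set("fp-9f3a77", { name: "Alice Example", address: "12 Elm St" });

// Site "B" never asked you for anything.  It computes the same fingerprint
// from your browser and asks "A" to resolve it.
function identifyAtSiteB(fingerprint: Fingerprint) {
  return siteA.get(fingerprint) ?? null; // "B" now knows your name without your consent
}

console.log(identifyAtSiteB("fp-9f3a77")); // { name: "Alice Example", ... }
```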

I can see browser fingerprints being used for a number of purposes.  Some sites might use a fingerprint to keep track of you even after you have cleared your cookies – and rationalize this as providing added security.  Others will inevitably employ it for commercial purposes – targeted identifying customer information is high value.  And the technology can even be used for corporate espionage and cyber investigations.

It is important to point out that, like any fingerprint, the identification is only probabilistic.  EFF is studying what these probabilities are.  In my original test, my browser was unique among 120,000 other browsers – a number I found very disturbing.

But friends soon wrote back to report that their browser was even “more unique” than mine!  And going through my feeds today I saw a post at Tomek's DS World where he reported a staggering fingerprint uniqueness of 1 in 433,751:

[Screenshot: Tomek's Panopticlick fingerprint result – 1 in 433,751]

It's not that I really think of myself as super competitive, but these results were so extreme I decided to take the test again.  My new score is off the scale:

Tomek ends his post this way:

“So a browser can be used to identify a user in the Internet or to harvest some information without his consent. Will it really become a problem and will it be addressed in some way in browsers in the future? This question has to be answered by people responsible for browser development.”

I have to disagree.  It is already a problem.  A big problem.  These outcomes weren't at all obvious in the early days of the browser.  But today the writing is on the wall and needs to be addressed.  It's a matter right at the core of delivering a trustworthy computing infrastructure.  We need to evolve the world's browsers to employ minimal disclosure, releasing only what is necessary, and never providing a fingerprint without the user's consent.

 

More unintended consequences of browser leakage

Joerg Resch at Kuppinger Cole points us to new research showing how social networks can be used in conjunction with browser leakage to accurately identify users who think they are browsing anonymously.

Joerg writes:

Thorsten Holz, Gilbert Wondracek, Engin Kirda and Christopher Kruegel from Isec Laboratory for IT Security found a simple and very effective way to identify a person behind a website visitor without asking for any kind of authentication. Identify in this case means: full name, address, phone numbers and so on. What they do is just exploit the browser history to find out which social networks the user is a member of and to which groups he or she has subscribed within that social network.

The Practical Attack to De-Anonymize Social Network Users begins with what is known as “history stealing”.  

Browsers don’t allow web sites to access the user’s “history” of visited sites.  But we all know that browsers render links to sites we have visited in a different color than links to sites we have not.  This difference is exposed programmatically through JavaScript via the a:visited style.  So a malicious site can run through a list of URLs, examine the a:visited style of each to determine whether it has been visited, and do all this without the user being aware of it.
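
For the curious, here is a sketch of the probe as it worked in browsers of that era.  (Modern browsers have since changed getComputedStyle to lie about visited links, closing this particular hole.)

```typescript
// Sketch of a:visited history stealing, as it worked at the time.
// Runs in a browser page; assumes the default :visited link color.

function probeVisited(urls: string[]): string[] {
  const visited: string[] = [];
  for (const url of urls) {
    const link = document.createElement("a");
    link.href = url;
    document.body.appendChild(link);
    // If the URL is in the browser history, the :visited style applies
    // and the rendered color differs from the default link blue.
    if (getComputedStyle(link).color === "rgb(85, 26, 139)") { // typical :visited purple
      visited.push(url);
    }
    link.remove();
  }
  return visited;
}

// A malicious page replays candidate group URLs and learns which ones the
// visitor has been to, without the visitor noticing anything.
console.log(probeVisited([
  "https://social.example/groups/hiking",
  "https://social.example/groups/chess",
]));
```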

This attack has been known for some time, but what is novel is its use.  The authors claim the groups in all major social networks are represented through URLs, so history stealing can be translated into “group membership stealing”.  This brings us to the core of this new work.  The authors have developed a model for the identification characteristics of group memberships – a model that will outlast this particular attack, as dramatic as it is.

The researchers have created a demonstration site that works with the European social network Xing.  Joerg tried it out and, as you can see from the table at left, it identified him uniquely – although he had done nothing to authenticate himself.  He says,

“Here is a screenshot from the self-test I did with the de-anonymizer described in my last post. I'm a member of 5 groups at Xing, but only active in just 2 of them. This is already enough to successfully de-anonymize me, at least if I use the Google Chrome browser. Using Microsoft Internet Explorer did not lead to a result, as the default security settings (I use them in both browsers) seem to be stronger. That's weird!”

Since I’m not a user of Xing I can’t explore this first hand.

Joerg goes on to ask whether history-stealing is a crime.  If it's not, how mainstream is this kind of analysis going to become?  What is the right legal framework for considering these issues?  One thing is for sure:  this kind of demonstration, as it becomes widely understood, risks profoundly changing the way people look at the Internet.

To return to the idea of minimal disclosure for the browser, why do sites we visit need to be able to read the a:visited attribute?  This should again be thought of as “fingerprinting”, and before a site is able to retrieve the fingerprint, the user must be made aware that it opens the possibility of being uniquely identified without authentication.

Minimal disclosure for browsers

Not five minutes after pressing Enter on my previous post, a friend wrote back and challenged me to compare IE's behavior with that of Firefox.  I don't like doing product comparisons, but clearly this is a question others will ask, so I'll share the results with you:

Results:  the behavior of the two browsers is essentially identical.  In both cases, my browser was uniquely identified.

Conclusion:  we need to work across the industry to align browsers with minimal disclosure principles.  How much information needs to be released to a site we don't trust yet?  To what extent can the detailed information currently released be collapsed into non-identifying categories?  When there is some compelling reason to release detailed information, how do we inform the user that the site wants to obtain a fingerprint?

New EFF Research on Web Browser Tracking

Slashdot's CmdrTaco points us to a research project announced by EFF's Peter Eckersley that I expect will provoke both discussion and action:

What fingerprints does your browser leave behind as you surf the web?

Traditionally, people assume they can prevent a website from identifying them by disabling cookies on their web browser. Unfortunately, this is not the whole story.

When you visit a website, you are allowing that site to access a lot of information about your computer's configuration. Combined, this information can create a kind of fingerprint – a signature that could be used to identify you and your computer. But how effective would this kind of online tracking be?

EFF is running an experiment to find out. Our new website Panopticlick will anonymously log the configuration and version information from your operating system, your browser, and your plug-ins, and compare it to our database of five million other configurations. Then, it will give you a uniqueness score – letting you see how easily identifiable you might be as you surf the web.

Adding your information to our database will help EFF evaluate the capabilities of Internet tracking and advertising companies, who are already using techniques of this sort to record people's online activities. They develop these methods in secret, and don't always tell the world what they've found. But this experiment will give us more insight into the privacy risk posed by browser fingerprinting, and help web users to protect themselves.

To join the experiment:
http://panopticlick.eff.org/

To learn more about the theory behind it:
http://www.eff.org/deeplinks/2010/01/primer-information-theory-and-priva…

Interesting that my own browser was especially recognizable:

[Screenshot: my Panopticlick result]

I know my video configuration is pretty bizarre – but I don't understand why I should be broadcasting that when I casually surf the web.  I would also like to understand what is so special about my user agent info.

A pixel resolution like 1435 x 810 x 32 seems unnecessarily specific.  Applying the concept of minimal disclosure, it would be better to reveal simply that my machine is in some useful “class” of resolution that would not overidentify me.
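
A sketch of what that might look like.  The class boundaries below are arbitrary, chosen only to illustrate collapsing exact geometry into coarse, widely shared categories.

```typescript
// Sketch of minimal disclosure applied to screen resolution: report a
// coarse class instead of exact pixel geometry.  Boundaries are arbitrary.

function resolutionClass(width: number, height: number): "small" | "medium" | "large" {
  const pixels = width * height;
  if (pixels < 800 * 600) return "small";
  if (pixels < 1600 * 1200) return "medium";
  return "large";
}

// 1435 x 810 identifies a machine far too precisely; "medium" carries only
// a couple of bits of information and is shared by millions of machines.
console.log(resolutionClass(1435, 810)); // "medium"
```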

I would think the provision of highly identifying information should be limited to sites with which I have an identity relationship.  If we can agree on a shared mechanism for storing information about our trust for various sites (information cards offer this capability), our browsers could automatically adjust to the relationship they were in, releasing information as necessary.  This is a good example of how a better identity system is needed to protect privacy while providing increased functionality.

 

If you try sometimes – you can get what you need

I'll lose a few minutes less sleep each night worrying about Electronic Eternity – thanks to the serendipitous appearance of  John Markoff's recent piece on Vanish in the New York Times Science section:

A group of computer scientists at the University of Washington has developed a way to make electronic messages “self destruct” after a certain period of time, like messages in sand lost to the surf. The researchers said they think the new software, called Vanish, which requires encrypting messages, will be needed more and more as personal and business information is stored not on personal computers, but on centralized machines, or servers. In the term of the moment this is called cloud computing, and the cloud consists of the data — including e-mail and Web-based documents and calendars — stored on numerous servers.

The idea of developing technology to make digital data disappear after a specified period of time is not new. A number of services that perform this function exist on the World Wide Web, and some electronic devices like FLASH memory chips have added this capability for protecting stored data by automatically erasing it after a specified period of time.

But the researchers said they had struck upon a unique approach that relies on “shattering” an encryption key that is held by neither party in an e-mail exchange but is widely scattered across a peer-to-peer file sharing system…

The pieces of the key, small numbers, tend to “erode” over time as they gradually fall out of use. To make keys erode, or timeout, Vanish takes advantage of the structure of a peer-to-peer file system. Such networks are based on millions of personal computers whose Internet addresses change as they come and go from the network. This would make it exceedingly difficult for an eavesdropper or spy to reassemble the pieces of the key because the key is never held in a single location. The Vanish technology is applicable to more than just e-mail or other electronic messages. Tadayoshi Kohno, a University of Washington assistant professor who is one of Vanish’s designers, said Vanish makes it possible to control the “lifetime” of any type of data stored in the cloud, including information on Facebook, Google documents or blogs. In addition to Mr. Kohno, the authors of the paper, “Vanish: Increasing Data Privacy with Self-Destructing Data,” include Roxana Geambasu, Amit A. Levy and Henry M. Levy.

[More here]
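
The primitive underneath a Vanish-style design is threshold secret sharing: split a key into n shares so that any k of them reconstruct it, while fewer reveal nothing.  Below is a toy sketch of Shamir's scheme – not the Vanish code itself, and not production cryptography (in particular, the share randomness here is not cryptographically secure).

```typescript
// Toy Shamir secret sharing over a prime field.  Scatter the shares across
// a peer-to-peer network and, as nodes churn and shares erode, the key
// (and the data it encrypts) becomes unrecoverable.

const P = 170141183460469231731687303715884105727n; // 2^127 - 1, a Mersenne prime

const mod = (a: bigint): bigint => ((a % P) + P) % P;

// Modular inverse via Fermat's little theorem: a^(P-2) mod P.
function inv(a: bigint): bigint {
  let result = 1n, base = mod(a), e = P - 2n;
  while (e > 0n) {
    if (e & 1n) result = mod(result * base);
    base = mod(base * base);
    e >>= 1n;
  }
  return result;
}

// Split `secret` into n shares, any k of which reconstruct it.
function split(secret: bigint, n: number, k: number): [bigint, bigint][] {
  const coeffs = [secret]; // random polynomial with the secret as constant term
  for (let i = 1; i < k; i++) coeffs.push(BigInt(Math.floor(Math.random() * 1e9))); // toy randomness!
  return Array.from({ length: n }, (_, idx) => {
    const x = BigInt(idx + 1);
    let y = 0n, xp = 1n;
    for (const c of coeffs) { y = mod(y + c * xp); xp = mod(xp * x); }
    return [x, y] as [bigint, bigint];
  });
}

// Lagrange interpolation at x = 0 recovers the secret from any k shares.
function combine(shares: [bigint, bigint][]): bigint {
  let secret = 0n;
  for (const [xi, yi] of shares) {
    let num = 1n, den = 1n;
    for (const [xj] of shares) {
      if (xj === xi) continue;
      num = mod(num * -xj);
      den = mod(den * (xi - xj));
    }
    secret = mod(secret + yi * num * inv(den));
  }
  return secret;
}

const shares = split(123456789n, 10, 3); // 10 shares scattered, any 3 suffice
console.log(combine(shares.slice(0, 3))); // 123456789n – key recovered
// Once fewer than 3 shares survive the churn, the key is gone forever.
```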

More precision on the Right to Correlate

Dave Kearns continues to whack me for some of my terminology in discussing data correlation.  He says: 

‘In responding to my “violent agreement” post, Kim Cameron goes a long way towards beginning to define the parameters for correlating data and transactions. I'd urge all of you to jump into the discussion.

‘But – and it's a huge but – we need to be very careful of the terminology we use.

‘Kim starts: “Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction, and further, that they only have the right to correlate it with other transactions involving the same parties.”’

Dave's right that this was overly restrictive.  In fact I changed it within a few minutes of the initial post – but apparently not fast enough to prevent confusion.  My edited version stated:

‘Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).’

This way of putting things eliminates Dave's concern:

‘Which would mean, as I read it, that I couldn't correlate my transactions booking a plane trip, hotel and rental car since different parties were involved in all three transactions!’

That said, I want to be clear that “parties to a transaction” does NOT include what Dave calls “all corporate partners” (aka a corporate information free-for-all!).  It just means that parties (for example, corporations) participating directly in some transaction can correlate it with the other transactions in which they directly participate (but not with the transactions of some other corporation, unless they get approval from the transaction participants to do so).
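
The rule is simple enough to state in code.  A minimal sketch with illustrative names: a party may correlate two transactions only if it participated directly in both.

```typescript
// Sketch of the "right to correlate" rule: a party may correlate two
// transactions only if it was a direct participant in both (absent
// explicit approval from the participants).  Names are illustrative.

interface Transaction {
  id: string;
  parties: Set<string>; // direct participants only
}

function mayCorrelate(party: string, a: Transaction, b: Transaction): boolean {
  return a.parties.has(party) && b.parties.has(party);
}

const flight: Transaction = { id: "tx1", parties: new Set(["dave", "airline"]) };
const hotel: Transaction = { id: "tx2", parties: new Set(["dave", "hotel"]) };

console.log(mayCorrelate("dave", flight, hotel));    // true: Dave took part in both bookings
console.log(mayCorrelate("airline", flight, hotel)); // false: the airline was no party to the hotel booking
```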

Dave argues:

‘In the end, it isn't the correlation that's problematic, but the use to which it's put. So let's tie up the usage in a legally binding way, and not worry so much about the tools and technology.

‘In many ways the internet makes anti-social and unethical behavior easier. That doesn't mean (as some would have it) that we need to ban internet access or technological tools. It does mean we need to better educate people about acceptable behavior and step up our policing tools to better enable us to nab the bad guys (while not inconveniencing the good guys).’

To be perfectly clear, I'm not proposing a ban on technology!  I don't do banning!  I do creation. 

So instead, I'm arguing that as we develop our new technologies we should make sure they support the “right to correlation” – and the delegation of that right – in ways that restore balance and give people a fighting chance to prevent unseen software robots from limiting their destinies.