Clarke: Appropriating home network identifiers is the real issue

Here is some background on the Google Street View WiFi issue by Roger Clarke, a well-known Australian privacy expert.  Roger points out that Peter Schaar, Germany's Federal Commissioner for Freedom of Information, was concerned about misuse of network identifiers from the very beginning.

I agree that the identifiers of users’ devices are the real issue.

And your invocation of the old “Beyond the Fringe” skit, where a police inspector points out that “Once you have identified the criminal's face, the criminal's body is likely to be close by”, does hit the spot very nicely!

You ask why the payload is getting all the attention.  After all, it was the device-addresses that Peter Schaar first drew attention to.  As I wrote here,

The third mistake came to light on 22 April 2010, when The Register reported that “[Google's] Street View service is under fire [from the German Data Protection Commissioner, Peter Schaar] for scanning private WLAN networks, and recording users’ unique [device] addresses, as the car trundles along”.

As soon as Peter Fleischer [Google's European privacy advisor – Kim]  published his document of 27 April, I wrote to Schaar, saying:

“Fleischer's document doesn't say anything about whether the surveillance apparatus in the vehicle detects other messages from the router, and messages from other devices…

“In relation to messages other than beacons, on the surface of it, Fleischer might seem to be making an unequivocal statement that Google does *not* collect and store MAC addresses.

“But:

  1. If Google's surveillance apparatus is in a Wifi zone, how does it avoid ‘collecting’ the data?  [Other statements make clear that it does in fact collect that data]
  2. [In the statement “Google does not collect or store payload data”,] the term ‘payload data’ would most sensibly be interpreted as meaning the content, but not including the headers.
  3. The MAC-addresses are in the headers.
  4. So Fleischer's statement is open to the interpretation that header data of messages other than beacons *is* collected, and *is* stored.

“Google has failed to make the statement that connected-device MAC-addresses are *not* collected and stored.

“Because Google has had ample opportunity to make such a statement, and has avoided doing so, I therefore make the conservative assumption that Google *does* collect and store MAC addresses of any devices on networks, not just of routers.”

The document sent to the Commissioners added fuel to the fire, by saying “The equipment is able to receive data from all broadcast frames [i.e. not only beacons are intercepted; any traffic may be intercepted.] This includes, from the header data, SSID and MAC addresses [i.e. consistent with the analysis above, the MAC-addresses of all devices are available to Google's surveillance apparatus.] However, all data payload from data frames are discarded, so Google never collects the content of any communications.”

Subsequently, on 14 May, investigations by Hamburg Commissioner Caspar led to the unavoidable conclusion that Fleischer's post on April 27 had been incorrect in a key respect. As Eustace put it, “It's now clear that we have been mistakenly collecting samples of payload data [i.e. message content] from open (i.e. non-password-protected) WiFi networks”.

So I think there are a couple of reasons why the payload aspect is getting most of the press:

  1. The significance of identifiers isn't readily apparent to most people, whereas ‘payload’, like people's Internet Banking passwords, is easier to visualise. (Leave aside that only highly insecure services send authenticators unencrypted. Low-tech reporters have to (over-) simplify stories to communicate to low-tech readers.)
  2. A corporation appeared to have been caught telling fibs, constructively misleading the public and the media, and regulators.
  3. That's what catapulted it into the news, and reporters feed off one another's work, so it's the payload they all focus on.
  4. A final factor is that breaches of telecommunications laws may be easier to prove in the case of content than of device-identifiers.

The Australian Privacy Foundation (APF) stepped up the pressure in Australia late this week.

Firstly, we directly requested Google not to delete the data, and gave them notice that we were considering using a little-known part of the TIAA to launch an action.  That was promptly followed by the NYT's report of the Oz Privacy Commissioner saying that the Australian data is in the USA.  (The first useful utterance she's made on the topic – a month after this story broke, there's no mention of the matter on her web-site).

Secondly, we wrote to the relevant regulators, and requested them to contact Google to ensure that the data is not deleted, and to investigate whether Google's actions breached Australian laws.

 

Don't take identities from our homes without our consent

Joerg Resch of Kuppinger Cole in Germany wrote recently about the importance of identity management to the Smart Grid – by which he means the emerging energy infrastructure based on intelligent, distributed renewable resources:

In 10-12 years from now, the whole utilities and energy market will look dramatically different. Decentralization of energy production with consumers converting to prosumers pumping solar energy into the grid and offering  their electric car batteries as storage facilities, spot markets for the masses offering electricity on demand with a fully transparent price setting (energy in a defined region at a defined time can be cheaper, if the sun is shining or the wind is blowing strong), and smart meters in each home being able to automatically contract such energy from spot markets and then tell the washing machine to start working as soon as electricity price falls under a defined line. And – if we think a bit further and apply Google-like business models to the energy market, we can get an idea of the incredible size this market will develop into.

These are just a few examples, which might give you an idea on how the “post fossile energy market” will work. The drivers leading the way into this new age are clear: energy production from oil and gas will become more and more expensive, because pollution is not for free and the resources will not last forever. And the transparency gain from making the grid smarter will make electricity cheaper than it is now.

The drivers are getting stronger every day. Therefore, we will soon see many large scale smart grid initiatives, and we will see questions rising such as who has control over the information collected by the smart meter in my home. Is it my energy provider? How would Kim Cameron´s 7 laws of Identity work in a smart grid? What would a “grid perimeter” look like which keeps information on the usage of whatever electric devices within my 4 walls? By now, we all know what cybercrimes are and how they can affect each of us. But what are the risks of “smart grid hacking”? How might we be affected by “grid crimes”?

In fact at Black Hat 2009, security consultant Mike Davis demonstrated successful hacker attacks on commercially available smart meters.  He told the conference,

“Many of the security vulnerabilities we found are pretty frightening and most smart meters don't even use encryption or ask for authentication before carrying out sensitive functions like running software updates and severing customers from the power grid.”

Privacy Commissioner Ann Cavoukian of Ontario has insisted that industry turn its attention to the security and privacy of these devices:

“The best response is to ensure that privacy is proactively embedded into the design of the Smart Grid, from end to end. The Smart Grid is presently in its infancy worldwide – I’m confident that many jurisdictions will look to our work being done in Ontario as the privacy standard to be met. We are creating the necessary framework with which to address this issue.”

Until recently, no one has talked about drive-by mapping of our home devices.  But from now on we will.  When we think about home devices, we need to reach into the future and come to terms with the huge stakes that are up for grabs here.  

The smart home and the smart grid alert us to just how important the identity and privacy of our devices really are.  We can use technical mechanisms like encryption to protect some information from eavesdroppers.   But not the patterns of our communication or the identities of our devices…  To do that we need a regulatory framework that ensures commercial interests don't enter our “device space” without our consent.

Google's recent Street View WiFi boondoggle is a watershed event in drawing our attention to these matters.

Misuse of network identifiers was done on purpose

Ben Adida has a list of achievements as long as my arm – many of which are related to privacy and security.  His latest post concerns what he calls, “privacy advocacy theater… a problem that my friends and colleagues are guilty of, and I’m sure I’m guilty of it at times, too.  Privacy Advocacy Theater is the act of extreme criticism for an accidental data breach rather than a systemic privacy design flaw. Example: if you’re up in arms over the Google Street View privacy “fiasco” of the last few days, you’re guilty of Privacy Advocacy Theater.”

Ben then proceeds to take me to task for this piece:

I also have to be harsh with people I respect deeply, like Kim Cameron who says that Google broke two of his very nicely crafted Laws of Identity. Come on, Kim, this was accidental data collection by code that the Google Street View folks didn’t even realize was running. (I’m giving them the benefit of the doubt. If they are lying, that’s a different problem, but no one’s claiming they’re lying, as far as I know.) The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data. If anyone is breaking the Laws of Identity, it’s the WiFi access points that don’t actively nudge users towards encrypting their WiFi network.

But let's hold on a minute.  My argument wasn't about the payload data that was collected accidentally.  It was about the device identification data that was collected on purpose.  As Google's Alan Eustace put it:

We said that while Google did collect publicly broadcast SSID information (the WiFi network name) and MAC addresses (the unique number given to a device like a WiFi router) using Street View cars, we did not collect payload data (information sent over the network). But it’s now clear that we have been mistakenly collecting samples of payload data…

Device identifiers were collected on purpose

SSIDs and MAC addresses are the identifiers of your devices.  They are transmitted as part of the WiFi traffic just like the payload data is.  And they are not “publicly broadcast” any more than the payload data is.
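To make the header/payload distinction concrete, here is a minimal Python sketch. It assumes a basic 24-byte 802.11 MAC header with no radiotap prefix, no QoS field and no fourth address. The names `split_frame` and `sample_frame` are hypothetical, and nothing here describes how any Street View software actually worked. It only shows that a collector can throw away every payload byte and still keep the device addresses.

```python
import struct

HEADER_LEN = 24  # assumed: basic 802.11 MAC header, no QoS or Address 4


def format_mac(raw: bytes) -> str:
    """Render six raw bytes in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in raw)


def split_frame(frame: bytes) -> dict:
    """Keep only header-derived fields; the payload itself is discarded."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than a basic 802.11 header")
    # Frame Control, Duration/ID, Address 1, Address 2, Address 3, Sequence Control
    _fc, _dur, addr1, addr2, addr3, _seq = struct.unpack(
        "<HH6s6s6sH", frame[:HEADER_LEN]
    )
    return {
        "receiver": format_mac(addr1),
        "transmitter": format_mac(addr2),
        "address3": format_mac(addr3),
        "payload_length": len(frame) - HEADER_LEN,  # length only, no content
    }


# Hypothetical frame: made-up addresses followed by some payload bytes.
sample_frame = struct.pack(
    "<HH6s6s6sH",
    0x0008,                            # frame control: data frame
    0,                                 # duration
    bytes.fromhex("001122334455"),     # receiver (e.g. the router)
    bytes.fromhex("66778899aabb"),     # transmitter (e.g. a laptop)
    bytes.fromhex("001122334455"),     # third address
    0,                                 # sequence control
) + b"secret payload"

info = split_frame(sample_frame)
print(info)
```

Even though the function never stores the payload, the transmitter's address survives in `info`, which is the point of the argument above.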

Yet Google consciously decided to abscond with, tabulate and monetize the identities of our personal, business and home devices.  The identifiers are persistent and last for the lifetime of the devices.  Their collection, cataloging and use are, in my view, more dangerous than the payload data that was collected. Why? The payload data, though deeply personal, is transient and represents a single instant.  The identifiers are persistent, and the Street View WiFi plan was to use them for years.

Let's be clear:  Identity has as much to do with devices, software, services and organizations as with individuals.  And equally important, identity is about the relationships between these things.  In fact identity can only be adequately expressed through the relationships (some call it context).

When Google says, “MAC addresses are a simple hardware ID assigned by the manufacturer” and “We cannot identify an individual” using those “simple hardware IDs”,  it sounds like the devices found in your home and briefcase and pocket have nothing to do with you as a flesh and blood person.  Give me a break!  It reminds me of an old skit by “Beyond the Fringe” where a police inspector points out that “Once you have identified the criminal's face, the criminal's body is likely to be close by…”  Our identities and the identities of our devices are related, and understanding this relationship is essential to getting identity and privacy right.

One great thing about blogging is you find out when you haven't been clear enough.  I hope I'm making progress in expressing the real issues here:  the collection of device identifiers was purposeful, and this represents precisely the kind of “systemic privacy design flaw” to which Ben refers.  

It bothers me that this disturbing systemic privacy design flaw – for which there has been no apology – is being obscured through the widely publicized apology for a completely separate and apparently accidental sin.  

In contemporary networks, the hardware ID of the device is NOT intended to be a “universal identifier”.  It is intended to be a “unidirectional identifier” (see The Fourth Law) employed purely to map between a physical machine and a transient, local logical address.  Many people who read this blog understand why networking works this way.  In Street View WiFi, Google was consciously misusing this unidirectional identifier as a universal identifier, and misappropriating it by insinuating itself, as eavesdropper, into our network conversations.

Ben says, “The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data.”  But I hope he rethinks this in the context of what identity really is, its use in devices and systems, and the fact that human, device and service identities are tied together in what one day should be a trustworthy system.  I also hope to see Google apologize for its misuse of our device identities, and assure us they will not be used in any of their systems.

Finally, despite Ben's need to rethink this matter,  I do love his blog, and strongly agree with his comments on  Opera Mini, discussed in the same piece.

 

Issuing Information Cards with ADFS 2.0

When  Microsoft released Active Directory Federation Services V2 recently, we indicated we were holding off on shipping CardSpace 2.0 while figuring out how to best integrate Minimal Disclosure Technology (U-Prove) and create maximum synergy with the OpenID and OAuth initiatives.  Some feared the change in plan meant Microsoft was backing away from the idea of Information Cards and a visual identity selector.  Nothing could be further from the truth – the growth in adoption of federation and the shift toward cloud computing both make Information Card technology more important than ever.

This new announcement from the TechNet identity blog will therefore come as good news:

Today, Microsoft is announcing the availability of the Information Card Issuance Community Technology Preview (CTP) to enable the following scenarios with Active Directory Federation Services 2.0 RTM:

  • Administrators can install an Information Card Issuance component on AD FS 2.0 RTM servers and configure Information Card Issuance policy and parameters.
  • End users with IMI 1.0- or IMI 1.1 (DRAFT)-compliant identity selectors can obtain Information Cards backed by username/password, X.509 digital certificate, or Kerberos.
  • Continued support for Windows CardSpace 1.0 in Windows 7, Windows Vista and Windows XP SP 3 running .NET 3.5 SP1.

We have also added two new mechanisms for interaction and feedback on this topic, an Information Card Issuance Forum and a monitored e-mail alias ici-ctp@microsoft.com

 

Interview on Identity and the Cloud

I just came across a Channel 9 interview Matt Deacon did with me at the Architect Insight Conference in London a couple of weeks ago.  It followed a presentation I gave on the importance of identity in cloud computing.   Matt keeps my explanation almost… comprehensible – readers may therefore find it of special interest.  Video is here.

 

In addition, here are my presentation slides and video.

Sorry Tomek, but I “win”

As I discussed here, the EFF is running an experimental site demonstrating that browsers ooze an unnecessary “browser fingerprint” allowing users to be identified across sites without their knowledge.  One can easily imagine this scenario:

  1. Site “A” offers some service you are interested in and you release your name and address to it.  At the same time, the site captures your browser fingerprint.
  2. Site “B” establishes a relationship with site “A” whereby, when it sends “A” a browser fingerprint, “A” responds with the matching identifying information.
  3. You are therefore unknowingly identified at site “B”.
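The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute names, the `SiteA` and `SiteB` classes, and the use of a SHA-256 hash over sorted attributes are all hypothetical, and real fingerprinting combines many more signals.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Hash the browser's observable attributes into one stable identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SiteA:
    """Site the user knowingly gives a name and address to."""

    def __init__(self):
        self._known = {}

    def register(self, name: str, address: str, browser_attributes: dict) -> None:
        # Step 1: store the identity alongside the silently captured fingerprint.
        self._known[fingerprint(browser_attributes)] = (name, address)

    def lookup(self, fp: str):
        # Step 2: answer a partner's query with the matching identity, if any.
        return self._known.get(fp)


class SiteB:
    """Site the user believes they are visiting anonymously."""

    def __init__(self, partner: SiteA):
        self._partner = partner

    def identify_visitor(self, browser_attributes: dict):
        # Step 3: the visitor is identified without ever logging in.
        return self._partner.lookup(fingerprint(browser_attributes))


# Hypothetical browser attributes and identity, for illustration only.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1280x800x24",
    "timezone": "-480",
    "fonts": "Arial,Courier,Verdana",
}
site_a = SiteA()
site_a.register("Alice Example", "1 Example Street", browser)
site_b = SiteB(site_a)
print(site_b.identify_visitor(dict(browser)))
```

Note that no cookie is involved anywhere, which is why clearing cookies does nothing to stop this kind of linking.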

I can see browser fingerprints being used for a number of purposes.  Some sites might use a fingerprint to keep track of you even after you have cleared your cookies – and rationalize this as providing added security.  Others will inevitably employ it for commercial purposes – targeted identifying customer information is high value.  And the technology can even be used for corporate espionage and cyber investigations.

It is important to point out that like any fingerprint, the identification is only probabilistic.  EFF is studying what these probabilities are.  In my original test, my browser was unique among 120,000 other browsers – a number I found very disturbing.
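One rough way to read such a number is as bits of identifying information. The sketch below treats “unique among 120,000” as roughly a one-in-120,000 match rate, which is my simplifying assumption rather than EFF's method, and the function names are hypothetical.

```python
import math


def identifying_bits(one_in_n: int) -> float:
    """Bits of information conveyed by a fingerprint shared by 1 in N browsers."""
    return math.log2(one_in_n)


def expected_lookalikes(population: int, one_in_n: int) -> float:
    """Rough expected count of browsers in a population sharing the fingerprint."""
    return population / one_in_n


bits = identifying_bits(120_000)
lookalikes = expected_lookalikes(1_000_000, 120_000)
print(f"{bits:.2f} bits; about {lookalikes:.1f} lookalikes per million browsers")
```

Each additional bit halves the pool of browsers you could be confused with, so even a handful of extra observable attributes shrinks the crowd quickly.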

But friends soon wrote back to report that their browser was even “more unique” than mine!  And going through my feeds today I saw a post at Tomek's DS World where he reported a staggering fingerprint uniqueness of 1 in 433,751:

 

It's not that I really think of myself as super competitive, but these results were so extreme I decided to take the test again.  My new score is off the scale:

Tomek ends his post this way:

“So a browser can be used to identify a user in the Internet or to harvest some information without his consent. Will it really become a problem and will it be addressed in some way in browsers in the future? This question has to be answered by people responsible for browser development.”

I have to disagree.  It is already a problem.  A big problem.  These outcomes weren't at all obvious in the early days of the browser.  But today the writing is on the wall, and the problem needs to be addressed.  It's a matter right at the core of delivering on a trustworthy computing infrastructure.    We need to evolve the world's browsers to employ minimal disclosure, releasing only what is necessary, and never providing a fingerprint without the user's consent.

 

More unintended consequences of browser leakage

Joerg Resch at Kuppinger Cole points us to new research showing  how social networks can be used in conjunction with browser leakage to provide accurate identification of users who think they are browsing anonymously.

Joerg writes:

Thorsten Holz, Gilbert Wondracek, Engin Kirda and Christopher Kruegel from Isec Laboratory for IT Security found a simple and very effective way to identify a person behind a website visitor without asking for any kind of authentication. Identify in this case means: full name, address, phone numbers and so on. What they do, is just exploiting the browser history to find out, which social networks the user is a member of and to which groups he or she has subscribed within that social network.

The Practical Attack to De-Anonymize Social Network Users begins with what is known as “history stealing”.  

Browsers don’t allow web sites to access the user’s “history” of visited sites.  But we all know that browsers render links to sites we have visited in a different color than links to sites we have not.  This is available programmatically through JavaScript by examining the a:visited style.  So malicious sites can play a list of URLs and examine the a:visited style to determine if they have been visited, and can do this without the user being aware of it.

This attack has been known for some time, but what is novel is its use.  The authors claim the groups in all major social networks are represented through URLs, so history stealing can be translated into “group membership stealing”.  This brings us to the core of this new work.  The authors have developed a model for the identification characteristics of group memberships – a model that will outlast this particular attack, as dramatic as it is.
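The core idea of turning stolen history into an identity can be sketched as a set intersection. This is a deliberately simplified illustration, not the authors' published model: the function name, group URLs and member names are hypothetical, and it assumes the attacker has already crawled each group's public member list.

```python
def candidates_from_groups(visited_groups, group_members):
    """Return the people who belong to every group found in the visitor's history."""
    member_sets = [
        group_members[group] for group in visited_groups if group in group_members
    ]
    if not member_sets:
        return set()  # no known groups in the history, nothing to narrow down
    return set.intersection(*member_sets)


# Hypothetical crawl results: group URL -> publicly listed members.
group_members = {
    "https://social.example/groups/identity": {"joerg", "anna", "li"},
    "https://social.example/groups/smart-grid": {"joerg", "omar"},
    "https://social.example/groups/cycling": {"anna", "omar", "li"},
}

# Group URLs that history stealing reported as visited.
visited = [
    "https://social.example/groups/identity",
    "https://social.example/groups/smart-grid",
]

print(candidates_from_groups(visited, group_members))
```

Two modest group memberships are enough to single out one person in this toy data, which mirrors the experience Joerg reports.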

The researchers have created a demonstration site that works with the European social network Xing.  Joerg tried it out and, as you can see from the table at left, it identified him uniquely – although he had done nothing to authenticate himself.  He says,

“Here is a screenshot from the self-test I did with the de-anonymizer described in my last post. I´m a member in 5 groups at Xing, but only active in just 2 of them. This is already enough to successfully de-anonymize me, at least if I use the Google Chrome Browser. Using Microsoft Internet Explorer did not lead to a result, as the default security settings (I use them in both browsers) seem to be stronger. That´s weird!”

Since I’m not a user of Xing I can’t explore this first hand.

Joerg goes on to ask whether history-stealing is a crime.  If it’s not, how mainstream is this kind of analysis going to become?  What is the right legal framework for considering these issues?  One thing is for sure:  this kind of demonstration, as it becomes widely understood, risks profoundly changing the way people look at the Internet.

To return to the idea of minimal disclosure for the browser, why do sites we visit need to be able to read the a:visited style?  This should again be thought of as “fingerprinting”, and before a site is able to retrieve the fingerprint, the user must be made aware that it opens the possibility of being uniquely identified without authentication.

Identity Roadmap Presentation at PDC09

Earlier this week I presented the Identity Keynote at the Microsoft Professional Developers Conference (PDC) in LA.  The slide deck is here, and the video is here.

After announcing the release of the Windows Identity Foundation (WIF) as an Extension to .NET, I brought forward three architect/engineers to discuss how claims had helped them solve their development problems.   I chose these particular guests because I wanted the developer audience to be able to benefit from the insights they had previously shared with me about the advantages – and challenges – of adopting the claims based model.  Each guest talks about the approach he took and the lessons learned.

Andrew Bybee, Principal Program Manager from Microsoft Dynamics CRM, talks about the role of identity in delivering “the Power of Choice” – the ability for his customers to run his software wherever they want, on premises or in the cloud or in combination, and to offer access to anyone they choose.

Venky Veeraraghavan, the Program Manager in charge of identity for SharePoint, talks about what it was like to completely rethink the way identity works in SharePoint so it takes advantage of the claims-based architecture to solve problems that previously had been impossibly difficult.  He explores the problems of “Multi-hop” systems and web farms, especially the “Dreaded Second Hop” – which he admits “really, really scares us…”  I find his explanation riveting and think any developer of large scale systems will agree.

Dmitry Sotnikov, who is Manager of New Product Research at Quest Software, presents a remarkable Azure-based version of a product Quest has previously offered only “on premise”.  The service is a backup system for Active Directory, and involved solving a whole set of hard identity problems involving devices and data as well as people.

Later in the presentation, while discussing future directions, I announce the Community Technology Preview of our new work on REST-based authorization (a profile of OAuth), and then show the prototype of the multi-protocol identity selector Mike Jones unveiled at the recent IIW.   And finally, I talk for the first time about “System.Identity”, work on a user-centric next-generation directory that I wanted to take to the community for feedback.  I'll be blogging about this a lot and hopefully others from the blogosphere will find time to discuss it with me.

 

Microsoft: minimum disclosure about minimum disclosure?

Back from vacation and catching up on some blogs I found this piece by Felix Gaehtgens at Kuppinger Cole in Germany:  

A good year ago, Microsoft acquired an innovative company called U-Prove. That company, founded by visionary Stefan Brands, had come up with a privacy-enabling technology that effectively allows users to safely transmit the minimum required information about themselves when required to – and for those receiving the information, a proof that the information is valid. For example: if a country issued a digital identification card, and a service provider would need to check whether the holder is over 18 years of age, the technology would allow to do just that – instead of having to transmit a full data set, including the date of birth. The technology works through a complex set of encryption and signing rules and is a win-win for both users who need to provide information as well as those taking it (also called “relying parties” in geek speak). With the acquisition of U-Prove, Microsoft now owns all of the rights to the technology – and more importantly, the associated patents with it. Stefan Brands is now part of Microsoft’s identity team, filled with top-notch brilliant minds such as Dick Hardt, Ariel Gordon, Mark Wahl, Kim Cameron and numerous others.

Privacy advocates should (and are) happy about this technology because it effectively allows consumers to protect their information, instead of forcing them to give up unnecessary information to transact business. How many times have we needed to give up personal information for some type of service without any real need for this information? For example, if you’re not shipping anything to me… what’s the point of providing my home or address? If you are legally required to verify that I’m over 18 (or 21), why would you really need to know my credit card details and my home address? If you need to know that I am a customer of one of your partner banks, why would you also need to know my bank account number? Minimum disclosure makes transactions possible with exactly the right fit of personal details being exchanged. For those enterprises taking the data, this is also a very positive thing. Instead of having to “coax” unnecessary information out of potential customers, they can instead make a clear case of what information they do require for fulfilling the transaction, and will ultimately find consumers more willing to do business with them.

So all of this is really great. And what’s even better, Microsoft’s chief identity architect, Kim Cameron has promised not to “hoard” this technology for Microsoft’s own products, but to actually contribute it to society in order to make the Internet a better place. But more than one year down the line, Microsoft has not made a single statement about what will happen to U-Prove: minimum disclosure about its minimum disclosure technology (pun intended!). In a post that I made a year ago, I tried making the point that this technology is so incredibly important for the future of the Internet, that Microsoft should announce its plans for what to do with the technology (and the patents associated with it).

Kim’s response was that Microsoft had no intentions of “hoarding” the technology for its own purposes. He highlighted however that it would take time to do this – time for Microsoft’s lawyers, executives and technologists to work out the details of doing this.

Well – it’s been a year, and the only “minimum disclosure” that we can see is Microsoft’s unwillingness to talk about it. The debate is heating up around the world about different governments’ proposals for electronic passports and ID cards. Combined with the growing dangers of identity theft and continued news about spectacular leaks and thefts of personal information, this would really make our days. Unless you’re a spammer or identity thief of course.

So it’s about time Microsoft started making some statements to reassure all of us what is going to happen with the U-Prove technology, and – more importantly – with the patents. Microsoft has been reinventing itself and making a continuous effort to turn from the “bad guys of identity” a decade ago (in the old Hailstorm days with Microsoft Passport) into the “good guys” of identity with its open approach to identity and privacy protection and standardisation. At Kuppinger Cole we have loudly applauded the Identity Metasystem and Infocards as a ground-breaking innovation that we believe will transform the way we use the Internet in the years to come. Now is the time to really start off the transformative wave of innovation that comes when we finally address the dire need for privacy protection. Microsoft has the key in its hands, or rather, locked in a drawer. C’mon guys, when will that drawer finally be opened?

Kuppinger Cole has been an important force in creating awareness about the role of an Identity Metasystem. It has also led in stressing the importance of minimal disclosure technology. I take Felix's concerns very seriously. He's right – I owe people a progress report.

This said, there is no locked drawer. Instead, Felix gets closer to the real explanation in his first paragraph: “the technology works through a complex set of encryption and signing rules.”

The complexity must be tamed for the technology to succeed. There is more to this than brilliant formulas or crypto routines. We need to understand not only how minimal disclosure technology can be used – but how it can be made usable.

There are different kinds of research. Theoretical research is hugely important. But applied research is just as key. Over the last year we've moved from an essentially theoretical grasp of the possibilities to prototypes that demonstrate the feasibility of deploying real, large-scale distributed systems based on minimal disclosure.

I don't have much time for standards and protocols that are NOT built on top of experience with implementation. And if you don't know what your standards and implementations might look like, you can't define the intellectual property requirements.

So we've been working hard on figuring this stuff out. In fact, a lot of progress has been made, and I'll write about that in my next few posts. I'll also reach out to anyone who wants to become more closely involved.

Electronic Eternity

From the Useful Spam Department:  I got an advertisement from a robot at “complianceonline.com” that works for a business addressing the problem of data retention on the web from the corporate point of view.

We've all read plenty about the dangers of teenagers publishing their party revels only to find themselves rejected by a university snooping on their Facebook account.  But it's important to remember that the same issues affect business and government as well, as the complianceonline robot points out:

“Avoid Documentation ‘Time Bombs’

“Your own communications and documents can be used against you.

“Lab books, project and design history files, correspondence including e-mails, websites, and marketing literature may all contain information that can compromise a company and its regulatory compliance. Major problems with the U.S. FDA and/or in lawsuits have resulted from careless or inappropriate comments or even inaccurate opinions being “voiced” by employees in controlled or retained documents. Opinionated or accusatory E-mails have been written and sent, where even if deleted, still remain in the public domain where they can effectively “last forever”.

“In this electronic age of My Space, Face Book, Linked In, Twitter, Blogs and similar instant communication, derogatory information about a company and its products can be published worldwide, and “go viral”, whether based on fact or not. Today one's ‘opinion’ carries the same weight as ‘fact’.”

This is all pretty predictable and even banal, but then we get to the gem:  the company offers a webinar on “Electronic Eternity”.  I like the rubric.  I think “Electronic Eternity” is one of the things we should question.  Do we really need to accept that it is inevitable?  Whose interest does it serve?  I can't see any stakeholder who benefits except, perhaps, the archeologist. 

Perhaps everything should have a half-life unless a good argument can be made for preserving it.