Microsoft identity guru questions Apple, Google on mobile privacy

Todd Bishop at TechFlash published a comprehensive story this week on device fingerprints and location services: 

Kim Cameron is an expert in digital identity and privacy, so when his iPhone recently prompted him to read and accept Apple's revised terms and conditions before downloading a new app, he was perhaps more inclined than the rest of us to read the entire privacy policy — all 45 pages of tiny text on his mobile screen.

It's important to note that apart from writing his own blog on identity issues — where he told this story — Cameron is Microsoft's chief identity architect and one of its distinguished engineers. So he's not a disinterested industry observer in the broader sense. But he does have extensive expertise.

And he is publicly acknowledging his use of an iPhone, after all, which should earn him at least a few points for neutrality…

At this point I'll butt in and editorialize a little.  I'd like to amplify on Todd's point for the benefit of readers who don't know me very well:  I'm not critical of Street View WiFi because I am anti-Google.  I'm not against anyone who does good technology.  My critique stems from my work as a computer scientist specializing in identity, not as a person playing a role in a particular company.  In short, Google's Street View WiFi is bad technology, and if the company persists in it, it will be one of the identity catastrophes of our time.

When I figured out the Laws of Identity and understood that Microsoft had broken them, I was just as hard on Microsoft as I am on Google today.  In fact, someone recently pointed out the following reference in Wikipedia's article on Microsoft's Passport:

“A prominent critic was Kim Cameron, the author of the Laws of Identity, who questioned Microsoft Passport in its violations of those laws. He has since become Microsoft's Chief Identity Architect and helped address those violations in the design of the Windows Live ID identity meta-system. As a consequence, Windows Live ID is not positioned as the single sign-on service for all web commerce, but as one choice of many among identity systems.”

I hope this has earned me some right to comment on the current abuse of personal device identifiers by Google and Apple – which, if their FAQs and privacy policies represent what is actually going on, is at least as significant as the problems I discussed long ago with Passport.  

But back to Todd: 

At any rate, as Cameron explained on his IdentityBlog over the weekend, his epic mobile reading adventure uncovered something troubling on Page 37 of Apple's revised privacy policy, under the heading of “Collection and Use of Non-Personal Information.” Here's an excerpt from Apple's policy, Cameron's emphasis in bold.

We also collect non-personal information — data in a form that does not permit direct association with any specific individual. We may collect, use, transfer, and disclose non-personal information for any purpose. The following are some examples of non-personal information that we collect and how we may use it:

We may collect information such as occupation, language, zip code, area code, unique device identifier, location, and the time zone where an Apple product is used so that we can better understand customer behavior and improve our products, services, and advertising.

Here's what Cameron had to say about that.

Maintaining that a personal device fingerprint has “no direct association with any specific individual” is unbelievably specious in 2010 — and even more ludicrous than it used to be now that Google and others have collected the information to build giant centralized databases linking phone MAC addresses to house addresses. And — big surprise — my iPhone, at least, came bundled with Google’s location service.

The irony here is a bit fantastic. I was, after all, using an “iPhone”. I assume Apple’s lawyers are aware there is an ‘I’ in the word “iPhone”. We’re not talking here about a piece of shared communal property that might be picked up by anyone in the village. An iPhone is carried around by its owner. If a link is established between the owner’s natural identity and the device (as Google’s databases have done), its “unique device identifier” becomes a digital fingerprint for the person using it.

MAC in this context refers to Media Access Control addresses associated with specific devices, one type of data that Google has acknowledged collecting. However, in a response to an Atlantic magazine piece that quoted an earlier Cameron blog post, Google says that it hasn't gone as far as Cameron is suggesting. The company says it has collected only the MAC addresses of WiFi routers, not of laptops or phones.

The distinction is important because it speaks to how far the companies could go in linking together a specific device with a specific person in a particular location.

Google's FAQ, for the record, says its location-based services (such as Google Maps for Mobile) figure out the location of a device when that device “sends a request to the Google location server with a list of MAC addresses which are currently visible to the device” — not distinguishing between MAC addresses from phones or computers and those from wireless routers.

Here's what Cameron said when I asked about that topic via email.

I have suggested that the author ask Google if it will therefore correct its FAQ, since the portion of the FAQ on “how the system works” continues to say it behaves in the way I described. If Google does correct its FAQ then it will be likely that data protection authorities ask Google to demonstrate that its shipped software is behaving in the way described in the correction.

I would of course feel better about things if Google’s FAQ is changed to say something like, “The user’s device sends a request to the Google location server with the list of MAC addresses found in Beacon Frames announcing a Network Access Point SSID and excluding the addresses of end user devices.”

However, I would still worry that the commercially irresistible feature of tracking end user devices could be turned on at any second by Google or others. Is that to be prevented? If so, how?

So a statement from Google that its FAQ was incorrect would be good news – and I would welcome it – but not the end of the problem for the industry as a whole.

The privacy statement for Microsoft's Location Finder service, for the record, is more specific in saying that the service uses MAC addresses from wireless access points, making no reference to those from individual devices.

In any event, the basic question about Apple is whether its new privacy policy is ultimately correct in saying that the company is only collecting “data in a form that does not permit direct association with any specific individual” — if that data includes such information as the phone's unique device identifier and location.

Cameron isn't the only one raising questions.

The Consumerist blog picked up on this issue last week, citing a separate portion of the revised privacy policy that says Apple and its partners and licensees “may collect, use, and share precise location data, including the real-time geographic location of your Apple computer or device.” The policy adds, “This location data is collected anonymously in a form that does not personally identify you and is used by Apple and our partners and licensees to provide and improve location-based products and services.”

The Consumerist called the language “creepy” and said it didn't find Apple's assurances about the lack of personal identification particularly comforting. Cameron, in a follow-up post, agreed with that sentiment.

SF Weekly and the Hypebot music technology blog also noted the new location-tracking language, and the fact that users must agree to the new privacy policy if they want to use the service.

“Though Apple states that the data is anonymous and does not enable the personal identification of users, they are left with little choice but to agree if they want to continue buying from iTunes,” Hypebot wrote.

We've left messages with Apple and Google to comment on any of this, and we'll update this post depending on the response.

And for the record, there is an option to email the Apple privacy policy from the phone to a computer for reading, and it's also available here, so you don't necessarily need to duplicate Cameron's feat by reading it all on your phone.

National Strategy for Trusted Identities in Cyberspace

Friday saw what I think is a historic post by Howard Schmidt on The White House Blog:

“Today, I am pleased to announce the latest step in moving our Nation forward in securing our cyberspace with the release of the draft National Strategy for Trusted Identities in Cyberspace (NSTIC).  This first draft of NSTIC was developed in collaboration with key government agencies, business leaders and privacy advocates. What has emerged is a blueprint to reduce cybersecurity vulnerabilities and improve online privacy protections through the use of trusted digital identities.”

I say the current draft is historic because of the grasp of identity issues it achieves.

At the core of the document is a recognition that we need a solution supporting privacy-enhancing technologies and built by harnessing a user-centric Identity Ecosystem offering citizens and private enterprise plenty of choice.  

Finally we have before us a proposal that can move society forward in  protecting individual privacy and simultaneously create a secure and trustworthy infrastructure with enough protections to be resistant to insider attacks.  

Further, the work appears to have support from multiple government agencies – the Department of Homeland Security was a key partner in its creation. 

Here are the guiding principles (beginning page 8):

  • Identity solutions will be secure and resilient
  • Identity solutions will be interoperable
  • Identity solutions will be privacy enhancing and voluntary for the public
  • Identity solutions will be cost-effective and easy to use

Let's start with the final “s” on the word “solutions” – a major achievement.  The authors understand society needs a spectrum of approaches suitable for different use cases but fitting within a common interoperable framework – what I and others have called an identity metasystem. 

The report embraces the need for anonymous access as well as that for strong identification.  It stands firmly in favor of minimal disclosure.  The authors call out the requirement that solutions be privacy enhancing and voluntary for the public, rather than attempting to ram something bureaucratic down people's throats.  And they are fully cognisant of the practicality and usability requirements for the initiative to be successful.  A few years ago I would not have believed this kind of progress would be possible.

Nor is the report just a theoretical treatment devoid of concrete proposals.  The section on “Commitment to Action” includes:

  • Designate a federal agency to lead the public/private sector efforts to advance the vision
  • Develop a shared, comprehensive public/private sector implementation plan
  • Accelerate the expansion of government services, pilots and policies that align with the identity ecosystem
  • Work to implement enhanced privacy protections
  • Coordinate the development and refinement of risk management and interoperability standards
  • Address liability concerns of service providers and individuals
  • Perform outreach and awareness across all stakeholders
  • Continue collaborating in international efforts
  • Identify other means to drive adoption

Readers should dive into the report – it is in a draft stage and “Public ideas and recommendations to further refine this Strategy are encouraged.”  

A number of people and organizations in the identity world have participated in getting this right, working closely with policy thinkers and those leading this initiative in government.  I don't hesitate to say that congratulations are due all round for getting this effort off to such a good start.

We can expect suggestions to be made strengthening various aspects of the report – mainly in terms of making it more internally consistent.  

For example, the report contains good vignettes about minimal disclosure and the use of claims to gain access to resources.  Yet it also retains the traditional notion that authentication is dependent on identification.  What is meant by identification?  Many will assume it means “unique identification” in the old-fashioned sense of associating someone with an identifier.  That doesn't jibe with the notion of minimal disclosure present throughout the report.  Why? For many purposes association with an identifier is over-identification or unhelpful, and a simple proof of some set of claims would suffice to control access.  
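To make the contrast concrete, here is a minimal sketch of access control driven by claims alone. The names `Presentation` and `may_access` and the claim strings are hypothetical, chosen only for illustration, and the sketch leaves out the cryptographic proof that a real system would need to show that a trusted party issued the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """Claims a user chooses to disclose. There is deliberately no identifier field."""
    claims: frozenset


def may_access(required_claims: set, presentation: Presentation) -> bool:
    # The decision depends only on which claims are present,
    # not on who the presenter is.
    return required_claims <= presentation.claims


# Hypothetical visitor who discloses a single claim and nothing else.
visitor = Presentation(claims=frozenset({"age_over_18"}))
print(may_access({"age_over_18"}, visitor))
print(may_access({"age_over_18", "employee"}, visitor))
```

Nothing in the check could be used to correlate two visits by the same person, which is the point of minimal disclosure.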

But these refinements can be made fairly easily.  The real challenge will be to actually live up to the guiding principles as we move from high level statements to a widely deployed system – making it truly secure, resilient and privacy enhancing.  These are guiding principles we can use to measure our success and help select between alternatives.


Harvesting phone and laptop fingerprints for its database

In “The core of the matter at hand” I gave the example of someone attending a conference while subscribed to a geo-location service.  I argued that the subscriber's cell phone would pick up all the MAC addresses (which serve as digital fingerprints) of nearby phones and laptops and send them in to the centralized database service, which would look them up and potentially use the harvested addresses to further increase its knowledge of people's behavior – for example, generating a list of those attending the conference.

A reader wrote to express disbelief that the MAC addresses of non-subscribers would be collected by a company like Google.  So I close this series on WiFi device identifiers with this quote from what Google calls its “refresher FAQ” (emphasis in the quote below is mine):  

How does this location database work?

Google location based services using WiFi access point data work as follows:

  • The user’s device sends a request to the Google location server with a list of MAC addresses which are currently visible to the device;
  • The location server compares the MAC addresses seen by the user’s device with its list of known MAC addresses, and identifies associated geocoded locations (i.e. latitude / longitude);
  • The location server then uses the geocoded locations associated with visible MAC address to triangulate the approximate location of the user;
  • and this approximate location is geocoded and sent back to the user’s device.
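The four steps in the FAQ can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's implementation: the table `KNOWN_LOCATIONS`, the function `estimate_location` and the coordinates are invented, the MAC addresses come from the range reserved for documentation, and a plain average of known positions stands in for whatever triangulation the real service performs.

```python
from typing import Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]

# Hypothetical server-side table: MAC address -> geocoded location.
KNOWN_LOCATIONS: Dict[str, LatLon] = {
    "00:00:5e:00:53:01": (47.6100, -122.3300),
    "00:00:5e:00:53:02": (47.6104, -122.3310),
}


def estimate_location(visible_macs: Iterable[str]) -> Optional[LatLon]:
    """Average the known locations of the MAC addresses a device reports."""
    hits = [
        KNOWN_LOCATIONS[mac.lower()]
        for mac in visible_macs
        if mac.lower() in KNOWN_LOCATIONS
    ]
    if not hits:
        return None  # none of the reported addresses is in the table
    lat = sum(point[0] for point in hits) / len(hits)
    lon = sum(point[1] for point in hits) / len(hits)
    return (lat, lon)


# The device reports everything it can see, including an address the server
# does not know (here, one that could belong to a nearby phone or laptop).
request = ["00:00:5E:00:53:01", "00:00:5E:00:53:02", "00:00:5E:00:53:99"]
print(estimate_location(request))
```

Note that the server receives the whole request, including addresses it cannot match. Whether it discards or keeps them is a policy choice that this sketch does not model.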

So certainly the MAC addresses of all nearby phones and laptops are sent in to the geo-location server – not simply the MAC addresses of wireless access points that are broadcasting SSIDs.  And this is significant from a technical point of view.

Why not edit out the MAC addresses you don't need prior to transmission, reducing transmission size, cost and the amount of work that the central database server must do? Clearly, it was considered useful to collect all the phone fingerprints – including those of non-subscribers.  Of course Google's WiFi cars also collect the same fingerprints – while driving past people's homes.  So it is clearly possible for their system to match the fingerprints of non-subscribers to their home locations, and thus to their natural identities. 
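Editing out the unneeded addresses on the device would not be hard. A minimal sketch, assuming the device can already tell whether an observed frame was an access point beacon (`ObservedFrame` and `macs_to_report` are hypothetical names, not part of any real client):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ObservedFrame:
    source_mac: str
    is_beacon: bool  # True when the frame is an access point's beacon


def macs_to_report(frames: Iterable[ObservedFrame]) -> List[str]:
    """Keep only access point addresses, dropping those of end-user devices."""
    seen = set()
    report = []
    for frame in frames:
        if frame.is_beacon and frame.source_mac not in seen:
            seen.add(frame.source_mac)
            report.append(frame.source_mac)
    return report


observed = [
    ObservedFrame("00:00:5e:00:53:01", is_beacon=True),   # access point
    ObservedFrame("00:00:5e:00:53:aa", is_beacon=False),  # someone's phone
    ObservedFrame("00:00:5e:00:53:01", is_beacon=True),   # repeat beacon
]
print(macs_to_report(observed))
```

With a filter like this in place, the fingerprints of nearby phones and laptops would never leave the subscriber's device.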

Is this matching of non-subscribers being done today?  I have no idea.  But Google has put in place all the machinery to do it and pays a premium to operate its geolocation service so as to gather this information.  Further, if allowed to mature, the market for the extra intelligence collected about our behaviors will be immense.

So there is nothing unlikely about the scenario I describe.   I have now examined all the issues I wanted to bring to light and I'll move on to other matters for a while.


Trip down memory lane

Joe Mansfield's comment that Bluetooth “doesn’t appear to be all that bad from a privacy leakage perspective” left me rummaging through memory lane – awakening memories that may help explain why I now believe that world-wide databases of MAC addresses constitute a central socio-technical problem of our time.

I was taken back to an unforgettable experience I had in 2005 while working on the Laws of Identity.  I had finished the Fourth Law and understood theoretically why technical systems should use “unidirectional identifiers” (meaning identifiers limited to a defined context) rather than “universal identifiers” (things like social security numbers) unless the goal was to be completely public.  But there is a difference between understanding something theoretically and right in the gut.
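One way to picture a unidirectional identifier is as a value derived separately for each context from a secret that never leaves the device. The sketch below is an illustration of the idea, not a scheme taken from the Laws of Identity; the function name, the secret and the context strings are all assumptions.

```python
import hashlib
import hmac


def unidirectional_id(device_secret: bytes, context: str) -> str:
    """Derive an identifier that is stable within one context only."""
    digest = hmac.new(device_secret, context.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical secret held only by the device.
secret = b"example-secret-kept-on-the-device"

id_a = unidirectional_id(secret, "bank.example")
id_b = unidirectional_id(secret, "forum.example")
print(id_a, id_b)
```

The same context always yields the same value, so a relying party can recognize a returning device, but two parties comparing notes cannot link their records the way they could with a universal identifier such as a MAC address.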

Rather than retell the story, here is what I wrote on my blog in “Just a few scanning machines” on Tuesday 6 September 2005:

Since I seem to be on the subject of Bluetooth again, I want to tell you about an experience I had recently that put a gnarly visceral edge on my opposition to technologies that serve as tracking beacons for us as private individuals.

I was having lunch in San Diego with Paul Trevithick, Stefan Brands and Mary Rundle. Everyone knows Paul for his work with Social Physics and the Berkman identity wiki; Stefan is a tremendously innovative privacy cryptographer; and Mary is pushing the envelope on cyber law with Berkman and Stanford.

Suddenly Mary recalled the closing plenary at the Computers, Freedom and Privacy “Panopticon Conference” in Seattle.

She referred off-handedly to “the presentation where they flashed a slide tracking your whereabouts throughout the conference using your Bluetooth phone.”

Essentially I was flabbergasted. I had missed the final plenary, and had no idea this had happened.

MAC Name: Kim Cameron Mobile

Room                  Time             Talk
Grand I (G1)          Wed 09:32 09:32  ????
Grand Crescent (gc)   Wed 09:35 09:35  Adware and Privacy: Finding a Common Ground
Grand I (G1)          Wed 09:37 09:37  ????
Grand Crescent (gc)   Wed 09:41 09:42  Adware and Privacy: Finding a Common Ground
Grand I (G1)          Wed 09:46 09:47  ????
Grand III (g3)        Wed 10:18 10:30  Intelligent Video Surveillance
Baker (ol)            Wed 10:33 10:42  Reforming E-mail and Digital Telephonic Privacy
Grand III (g3)        Wed 10:47 10:48  Intelligent Video Surveillance
Grand Crescent (gc)   Wed 11:25 11:26  Adware and Privacy: Finding a Common Ground
Grand III (g3)        Wed 11:46 12:22  Intelligent Video Surveillance
5th Avenue (5a)       Wed 12:33 12:55  ????
Grand III (g3)        Wed 13:08 14:34  Plenary: Government CPOs: Are they worth fighting for?

Of course, to some extent I'm a public figure when it comes to identity matters, and tracking my participation at a privacy conference is, I suspect, fair game. Or at any rate, it's good theatre, and drives home the message of the Fourth Law, which makes the point that private individuals must not be subjected – without their knowledge or against their will – to technologies that create tracking beacons.

Later Mary introduced me to Paul Holman from The Shmoo Group. He was the person who had put this presentation together, and given our mutual friends I don't doubt his motives. In fact, I look forward to meeting him in person.

He told me:

“I take it you missed our quick presentation, but essentially, we just put Bluetooth scanning machines in a few of the conference rooms and had them log the devices they saw. This was a pretty unsophisticated exercise, showing only devices in discoverable mode. To get them all would be a lot more work. You could do the same kind of thing just monitoring for cell phones or WiFi devices or whatever. We were trying to illustrate a crude version of what will be possible with RFIDs.”

The Bluetooth tracking was tied in to the conference session titles, and by clicking on a link you could see the information represented graphically – including my escape to a conference center window so I could take a phone call.
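A track like the one on the slide takes very little code once the scanners have logged their sightings. The following is only a guess at the kind of processing involved, not the Shmoo Group's actual tooling; the type names, the function and the sample log are hypothetical.

```python
from typing import List, Tuple

Sighting = Tuple[str, str]    # (room, "HH:MM") for a single device
Visit = Tuple[str, str, str]  # (room, first_seen, last_seen)


def collapse_sightings(sightings: List[Sighting]) -> List[Visit]:
    """Merge consecutive sightings in the same room into one visit."""
    visits: List[Visit] = []
    # Zero-padded HH:MM strings sort correctly within a single day.
    for room, time in sorted(sightings, key=lambda s: s[1]):
        if visits and visits[-1][0] == room:
            visits[-1] = (room, visits[-1][1], time)  # extend the visit
        else:
            visits.append((room, time, time))         # start a new visit
    return visits


log = [
    ("Grand III", "10:18"),
    ("Grand III", "10:30"),
    ("Baker", "10:33"),
    ("Baker", "10:42"),
]
for visit in collapse_sightings(log):
    print(visit)
```

Joining each visit against the conference schedule would then supply the session titles.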

Anyway, I think I have had a foretaste of how people will feel when networks of billboards and posters start tracking their locations and behaviors. They won't like it one bit. They'll push back.

A foretaste indeed

One of my readers wrote to say I should turn my Bluetooth broadcast off, and I responded:

You’re right, and I have turned it off. Which bothers me. Because I like some of the convenience I used to enjoy.

So I write about this because I’d rather leave my Bluetooth phone enabled, interacting only with devices run by entities I’ve told it to cooperate with.

We have a lot of work to do to get things to this point. I see our work on identity as being directed to that end, at least in part.

We need to be able to easily express and select the relationships we want to participate in – and avoid – as cyberspace progressively penetrates the world of physical things.

The problems of Bluetooth all exist in current Wifi too. My portable computer broadcasts another tracking beacon. I’m not picking on Bluetooth versus other technologies. Incredibly, they all need to be fixed. They’re all misdesigned.

If anything has shocked me while working on the Laws of Identity, it has been the discovery of how naive we’ve been in the design of these systems to date – a product of our failure to understand the Fourth Law of Identity. The potential for abuse of these systems is colossal – enterprises like the UK’s Filter are just the most benign tip of an ugly iceberg.

For everyone’s sake I try to refrain from filling in what the underside of this iceberg might look like.

Google's Street View group, which has been assembling a massive central registry of WiFi MAC addresses, has definitely crawled out from under this iceberg, and the project is more sinister than any I imagined only a few years ago.

But so as not to leave everyone feeling completely depressed, all the dreams of Billboards that recognize you from your Bluetooth phone have now been abandoned by Bluetooth manufacturers, and the specification has been greatly improved in light of the criticism it received.  Let's hope that geo-location providers, and Google in particular, see the same light, and assure us they will no longer collect or store the MAC address of any device unless that collection is approved by the subscriber.

What does a MAC address tell you?

Joe Mansfield at Peccavi has published a nice, clear and abridged explanation of the issues I've been discussing over the last few weeks.  

But before doing that he makes an important and novel point about why regulation may be useful even if it can't “prevent all abuses”:

I’d discounted the payload snooping issue as a distraction because I’d believed (and still do) that it was almost certainly an unfortunate error. I’d then made the point that a legal barrier to a technical problem was insufficient to prevent the bad guys doing bad things but I used that as an excuse to ignore the problem – small scale abuses of this sort of thing are not good but systematic large scale abuses “benefit” from network scaling effects. You might not be able to prevent small scale\illegal abuse through legal means but just because you can’t does not mean that you can’t control large scale abuses this way. The benefits and dangers inherent in this data become exponentially worse as the scale of the database that contains it increases. Large scale means companies and companies react to regulation by being much more careful about what they do. If a technology that is already out there has major privacy issues the regulatory approach is the only way to keep a lid on the problem while the technologists argue about how to fix the bits. Even if we assume that the law was OK about companies creating Geo-location databases using WiFi SSID\MAC mapping, effective regulation would have made the additional mistake made by Google (assuming it was a mistake) much less likely.

Next he explains how WiFi works as a layered protocol in which MAC addresses are exposed despite encryption and SSID suppression:

Now the obvious question is should scanning for identifiers that are broadcast openly by all WiFi radio signals be acceptable and legal?

802.11 WiFi signals are pretty complex things – Wikipedia has a brief overview here for those who want to see the alphabet soup of standards involved. Despite the range of encoding\modulation schemes and the number of frequency bands and channels almost all 802.11 devices revert to a couple of basic communication modes. This makes it easy for devices to connect to each other, and it’s what makes public WiFi hotspots practical. However it also makes configuring a device to monitor WiFi traffic trivially easy – the hardware does all the heavy lifting and the standards don’t really do anything to stop it happening. An important feature of WiFi is that, even though the payload encryption standards can now be pretty robust, the data link layer is not protected from snooping. This means that the content (my Google searches, the video clip I’m streaming down from Youtube etc) can be pretty well kept away from prying eyes but, at what the Ethernet folks call layer 2, the logical structures called frames that carry your encrypted data transmit some control data in the open.

So even with WPA2’s thorough key management and AES encryption your WiFi traffic still contains quite a bit of chatter that isn’t hidden away. The really critical thing for me is that the layer 2 addresses, the Media Access Control (MAC) addresses, of the sender and receiver (generally your PC\Phone’s WiFi adaptor and your Access Point) for each frame are always visible. And remember that MAC addresses are globally unique identifiers by design. Individual WiFi networks are defined by another identifier, the Service Set Identifier or SSID – when you set up your home WiFi AP and call the network “MyWLAN” you are choosing an SSID. SSID’s are very important, you can’t connect to a wireless LAN without knowing the relevant SSID, but they are not secure even though they can be sort of hidden they are never protected and can always be seen by someone just watching your wireless traffic. Interestingly SSID’s are not globally unique – there’s generally no real issue so long as my chosen SSID doesn’t match that of another network that’s relatively close by.
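The point about layer 2 can be illustrated with a few lines that read the addresses straight out of a captured frame. This is a sketch under stated assumptions: the bytes are taken to start at the 802.11 MAC header (2 bytes of frame control, 2 bytes of duration, then the address fields), the function names are hypothetical, and the sample frame is hand-built rather than captured.

```python
def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def header_addresses(frame: bytes) -> dict:
    """Read the type, subtype and first two addresses of an 802.11 header."""
    if len(frame) < 16:
        raise ValueError("frame too short for a header with two addresses")
    frame_control = frame[0]
    return {
        "type": (frame_control >> 2) & 0b11,         # 0 management, 2 data
        "subtype": (frame_control >> 4) & 0b1111,    # 8 is a beacon when type is 0
        "receiver": format_mac(frame[4:10]),         # address 1
        "transmitter": format_mac(frame[10:16]),     # address 2
    }


# Hand-built data frame: header fields followed by an opaque "encrypted" payload.
sample = (
    bytes([0x08, 0x41, 0x00, 0x00])
    + bytes.fromhex("00005e005301")   # address 1
    + bytes.fromhex("00005e0053aa")   # address 2
    + b"\x00" * 8
    + b"encrypted-payload"
)
print(header_addresses(sample))
```

No key is needed at any point: the payload stays opaque, yet both addresses are read in the clear.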

So SSID’s are possibly visible but MAC addresses are definitely visible, and MAC addresses are unique. While driving along a street or sitting in a coffee shop, hotel lobby or conference room your WiFi adaptor will see dozens if not hundreds of WiFi packets all of which will contain globally unique MAC addresses. It is possible to hack some WiFi hardware to change the MAC address but that practice is rare. Your PC has a couple (one for the wired Ethernet adaptor which isn’t important here, and usually one for WiFi these days), your Wii\PS3\XBox-360 has one, so does your Nintendo DS, iPhone, PSP … you get the picture. Another feature of MAC addresses is that it is very easy to differentiate between the MAC address of a Linksys Access Point, an iPhone and a Nintendo DS – Network protocol analyzers have been doing that trick for decades.
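Telling device types apart relies on the first three octets of a MAC address, which identify the manufacturer. A minimal sketch of such a lookup follows; the prefixes and labels in `EXAMPLE_PREFIXES` are invented placeholders rather than real assignments, and a real analyzer would consult the public manufacturer registry instead.

```python
# Hypothetical prefix table. These prefixes and labels are placeholders only.
EXAMPLE_PREFIXES = {
    "02:00:01": "access point maker (placeholder)",
    "02:00:02": "phone maker (placeholder)",
    "02:00:03": "game console maker (placeholder)",
}


def manufacturer_prefix(mac: str) -> str:
    """Return the first three octets in lower-case, colon-separated form."""
    return mac.lower().replace("-", ":")[:8]


def classify(mac: str, table: dict) -> str:
    """Look up a coarse device class from the manufacturer prefix."""
    return table.get(manufacturer_prefix(mac), "unknown")


print(classify("02-00-02-AB-CD-EF", EXAMPLE_PREFIXES))
print(classify("02:00:09:00:00:01", EXAMPLE_PREFIXES))
```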

So the systematic scanners out there (Google, Navizon, Skyhook and the rest) can drive around or recruit volunteers and gather location data and build databases of unique identifiers, device types, timestamps, signal strengths and possibly other data. The simplest (and most benign) use of that would be to pull out the ID’s of devices that are known to be fixed to one place (Access Points say) and use that for enabling Geo-location.

Joe then looks at what it means to start collecting and analyzing the MAC addresses of mobile devices.

It’s not a big leap to also track the MAC addresses that are more mobile. Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it. Kim Cameron describes the start of this cascade effect in his most recent post, mapping the attendees at a conference to home addresses even when they’ve never consented to any such tracking is not going to be hard if you’ve gone to the trouble of scanning every street in every city in the country. With a minor bit of further analysis the same techniques could be used to get a good idea of the travel or shopping habits of almost everyone sitting in an airport departure lounge or the home addresses of everyone participating in a Stop The War protest.
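How little analysis the “repeat detections” step needs can be seen in a toy example. Everything here is made up for illustration: the sightings, the place labels and the function name are hypothetical, and a real database would hold coordinates and timestamps, not labels.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Sighting = Tuple[str, str]  # (mac address, place label)


def most_frequent_place(sightings: Iterable[Sighting]) -> Dict[str, str]:
    """For each MAC address, return the place where it was seen most often."""
    counts = defaultdict(Counter)
    for mac, place in sightings:
        counts[mac][place] += 1
    return {mac: places.most_common(1)[0][0] for mac, places in counts.items()}


# Made-up sightings of one mobile device collected over time.
sightings = [
    ("00:00:5e:00:53:aa", "street A"),
    ("00:00:5e:00:53:aa", "conference hall"),
    ("00:00:5e:00:53:aa", "street A"),
    ("00:00:5e:00:53:aa", "street A"),
]
print(most_frequent_place(sightings))
```

A single grouping pass is enough to suggest where a device spends most of its time, which is why the scale of the underlying database matters so much.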

And remember that even though you can only effectively use WiFi to send and receive data over a range of a few 10’s to maybe a 100m you can detect and read WiFi signals easily from 100’s to 1000’s of metres away without any special equipment.

The plans to blanket London with “Free WiFi” start to sound quite disturbing when you think about those possibilities.

To answer my own title question – MAC addresses can tell far more about you than you think and keeping databases of where and when they’ve been seen can be extremely dangerous in terms of privacy.

Finally, he compares WiFi to Bluetooth:

Bluetooth is a slightly different animal. It’s also a short range radio standard for data communications but it was developed from the ground up to replace wires and the folks building the standard got a lot of stuff right. It doesn’t appear to be all that bad from a privacy leakage perspective – when implemented correctly nothing is sent in clear text (the entire frame is encoded, not just the payload) and the frequency hopping RF behaviour makes it much harder to casually snoop on specific conversations. Bluetooth devices have a Bluetooth Device ID that is very like a MAC address (48 bits), with a manufacturer ID that enables broad classification of devices if the ID can be discovered but most Bluetooth devices keep that hidden most of the time by defaulting to a “not visible” mode even when Bluetooth is enabled. When actively communicating (paired) all data is encrypted so the device ID’s are not visible to a third party. Almost all modern Bluetooth devices only allow themselves to remain openly visible in this way for a short period of time before they revert to a safer non broadcasting mode. The main weakness is that when devices are set to “visible” the unique identifiers and other data can be scanned remotely and used in just the same way as scanned WiFi MAC addresses. That’s not to say that Bluetooth doesn’t have its share of security problems but they made an attempt to get some of the fundamentals right. It does also show that there is a practical way to approach the wireless privacy challenge which is good to see.

All in all a very nice explanation of the issues involved here.   The only thing I would add is that the early versions of Bluetooth had few of the privacy-respecting behaviors present in the recent specifications.  The consortium has really worked to clean up its act and we should all congratulate it.  This came about because privacy concerns came to be perceived as an adoption blocker. 

Title 18 – Part II – Chapter 206 – § 3121

Former federal prosecutor Paul Ohm says Google “likely” breached a U.S. federal criminal statute in connection with its accidental Wi-Fi sniffing — but not for siphoning private data from internet surfers using unsecured networks.

Instead, Mr. Ohm thinks Google might have breached the Pen Register and Trap and Trace Devices Act for intercepting the metadata and address information alongside the content.

According to Wikipedia, a “pen register is an electronic device that records all numbers dialed from a particular telephone line. The term has come to include any device or program that performs similar functions to an original pen register, including programs monitoring Internet communications.”

I'll expand on the identity implications in my next post, but to prepare the discussion, here is the statute to which Mr. Ohm is referring:

Title 18 – Part II – Chapter 206 – § 3121

  (a) In General.— Except as provided in this section, no person may install or use a pen register or a trap and trace device without first obtaining a court order under section 3123 of this title or under the Foreign Intelligence Surveillance Act of 1978 (50 U.S.C. 1801 et seq.).
  (b) Exception.— The prohibition of subsection (a) does not apply with respect to the use of a pen register or a trap and trace device by a provider of electronic or wire communication service—
    (1) relating to the operation, maintenance, and testing of a wire or electronic communication service or to the protection of the rights or property of such provider, or to the protection of users of that service from abuse of service or unlawful use of service; or
    (2) to record the fact that a wire or electronic communication was initiated or completed in order to protect such provider, another provider furnishing service toward the completion of the wire communication, or a user of that service, from fraudulent, unlawful or abusive use of service; or
    (3) where the consent of the user of that service has been obtained.
  (c) Limitation.— A government agency authorized to install and use a pen register or trap and trace device under this chapter or under State law shall use technology reasonably available to it that restricts the recording or decoding of electronic or other impulses to the dialing, routing, addressing, and signaling information utilized in the processing and transmitting of wire or electronic communications so as not to include the contents of any wire or electronic communications.
  (d) Penalty.— Whoever knowingly violates subsection (a) shall be fined under this title or imprisoned not more than one year, or both.

Why location services have to be done right…

uTest describes itself as the world's largest marketplace for software testing services. Recently it held a Bug Battle to test the web and mobile applications of the leading “check-in” location services. A Bug Battle is a quarterly app testing competition, where “software professionals from around the world compete to find bugs and rank today's popular applications” (previous Bug Battles have focused on browsers, search engines, social networking sites, etc.).

When evaluating location-based check-in services, testers were given ten days in May to report the most interesting and severe bugs, and to rank these applications based on

  • geo-location accuracy,
  • social media integration,
  • friend connectivity,
  • status recognition features and
  • ease of use

uTest offered nearly $4,000 in prize money to those who submitted the best bugs for feedback.
The results of the battle, which rated Foursquare, Gowalla and Brightkite, are detailed here.

The report includes comments by people who clearly love the service. For example:

“The Gowalla app and web interface themselves are easy on the eyes, and venues get their own snazzy icon depicting what type of establishment it is. I feed my Gowalla check-ins to Facebook, and having an image that catches attention in a cluttered news feed matters. The user can see everyone who has checked in at a particular venue and how many times.”

But one clear outcome is that many testers reported serious bugs related to privacy and security – a category not present in the original list:

“The impact of check-in services on personal privacy and security took on a prominent role in this study. 80% of respondents responded “Yes” when asked if they were concerned about how location-based check-in services could impact their personal privacy and safety. Nearly half of respondents (49%) chose “privacy and security concerns” as the top reason they do not use check-in services more often.”

VentureBeat, which wrote about the report, concludes:

In addition to appreciating easy-to-use services and bemoaning the lack of Frappuccino deals, the testers seem to be concerned about the privacy and security implications of check-in services in general. 49% of testers said privacy and security concerns were the top reason they don’t use check-in services more often. This is something the check-in services need to address if they want to avoid privacy flames like the ones Facebook is constantly fighting.

These services can be built so as to respect and enhance privacy.  Things like giant world databases linking our devices to our home locations don't help convince anyone that we are doing that.

Rethink things in light of Google's Gstumbler report

A number of technical people have given Google the benefit of the doubt in the Street View WiFi case and, as a result, have published information that Google's new “Gstumbler” report shows to be completely incorrect.  It is important that people re-evaluate what they are saying in light of this report. 

I'll pick on Conor's recent posting on our discussion as an example – it contains a number of statements and implies a number of things explicitly contradicted by Google's new report.  Once he reads the report and applies the logic he has put forward, logic will require Conor to change his conclusions.

Conor begins with a bunch of statements that are true:

  • MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
  • By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them).
  • When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.
  • SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on its use as a privacy invading value — a generic value such as “home” or “linksys” is much less likely to be a privacy issue than “ConorCahillsHomeAP”.

Wishful thinking and completely wrong

These are followed by a statement that is just plain wishful thinking.  Conor continues:

  • Google purposely collected SSID and MAC Addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.

Google's report says Conor is wrong about this, explicitly stating in paragraph 26, “Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks”.   Conor continues:

  • Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).

This is true, but it is important to indicate that this was not limited to access points.  Google's report says that it recorded the association between the MAC address and geographic location of all the active devices on the network.  When it did this, the MAC addresses became, according to Conor's own earlier definition, “worrisome data aggregations”.
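Conor's guess about how a location gets attached to each MAC address (take the vehicle's GPS position at the moment the signal was strongest) is easy to picture in code. The following is only a sketch of that guess, not of anything the Gstumbler report describes: the `Observation` record, its field names and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One sighting of a device by the mapping car (hypothetical record)."""
    mac: str
    lat: float
    lon: float
    rssi_dbm: int  # received signal strength; closer to 0 means stronger


def strongest_fix(observations):
    """Map each MAC address to the GPS position where it was heard loudest."""
    best = {}
    for obs in observations:
        current = best.get(obs.mac)
        if current is None or obs.rssi_dbm > current.rssi_dbm:
            best[obs.mac] = obs
    return {mac: (obs.lat, obs.lon) for mac, obs in best.items()}


# Made-up sightings of an access point and of a laptop in the same house.
sightings = [
    Observation("aa:bb:cc:00:00:01", 47.6401, -122.1301, -80),
    Observation("aa:bb:cc:00:00:01", 47.6402, -122.1302, -55),
    Observation("00:11:22:33:44:55", 47.6402, -122.1302, -60),
    Observation("aa:bb:cc:00:00:01", 47.6403, -122.1303, -78),
]
locations = strongest_fix(sightings)
```

Nothing in the logic distinguishes an access point from a laptop or phone: any MAC address that shows up in a frame ends up with a position attached to it.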

  • There is no AP protocol defined means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs. 

This is true, but Google's report indicates this would not have mattered – it collected MACs regardless of whether SSIDs were broadcast.

  • I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.

Google's report indicates it did not.  It only used that status to decide whether or not to record the payload – and only recorded the payload of unencrypted frames…

I like Conor's logic that, “When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.”   I urge Conor to read the Gstumbler report.  Once he knows what was actually happening, I hope he'll tell the world about it.


Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’.  I will resist commenting on the name, since Google did the right thing in publishing the report:  there will no longer be any ambiguity about what was being collected. 

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data.  One thing I like about the report is that it begins with a number of technical “descriptions and definitions”.  For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination.  Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.
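A minimal Python sketch may help make this concrete. It assumes a bare three-address 802.11 frame (no radiotap prefix, no QoS field) and reads the Protected Frame flag and the MAC addresses straight out of the header. The function names and the sample bytes are invented for illustration and have nothing to do with the Gstumbler code itself.

```python
import struct

HEADER_LEN = 24  # frame control + duration + 3 addresses + sequence control


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as the usual colon-separated hex string."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_header(frame: bytes) -> dict:
    """Parse the fixed 24-byte header of a three-address 802.11 frame."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than a basic 802.11 header")
    frame_control, _duration = struct.unpack_from("<HH", frame, 0)
    return {
        "type": (frame_control >> 2) & 0x3,         # 2 means a data frame
        "protected": bool(frame_control & 0x4000),  # frame body is encrypted
        "addresses": [format_mac(frame[4 + 6 * i:10 + 6 * i]) for i in range(3)],
        "body": frame[HEADER_LEN:],
    }


# Hypothetical encrypted data frame: the body is opaque, the addresses are not.
sample = (
    bytes.fromhex("0841")            # frame control: data frame, ToDS + Protected
    + bytes.fromhex("0000")          # duration
    + bytes.fromhex("aabbcc000001")  # address 1
    + bytes.fromhex("001122334455")  # address 2
    + bytes.fromhex("66778899aabb")  # address 3
    + bytes.fromhex("1000")          # sequence control
    + b"\x9f\x03\xd2\x41"            # stand-in for ciphertext
)
header = parse_header(sample)
print(header["protected"], header["addresses"])
```

Even with the Protected Frame flag set, every address in the header can be read by anyone within radio range.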

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames.  The parsing separates the information into discreet fields for easier analysis… All available MAC addresses contained in a frame are also parsed.  All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk.  I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encrypted networks.” 
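The rule described in paragraphs 19 to 22 can be restated as a few lines of Python. This is only my paraphrase of the report's findings, with invented names (`what_gets_written`, `header_fields`); it is not taken from the Gstumbler source.

```python
def what_gets_written(header_fields: dict, body: bytes, encrypted: bool) -> dict:
    """Return the parts of a captured frame that end up on disk."""
    record = {"header": header_fields}  # always kept, encrypted network or not
    if not encrypted:
        record["body"] = body           # payload kept only when unencrypted
    return record


fields = {"addresses": ["aa:bb:cc:00:00:01", "00:11:22:33:44:55"]}
from_encrypted = what_gets_written(fields, b"\x9f\x03", encrypted=True)
from_open = what_gets_written(fields, b"GET / HTTP/1.1", encrypted=False)
```

Turning on encryption changes only whether the body is kept. The device identifiers are written either way.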

All device identifiers were recorded

As a result, there is no longer any question.  The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the drive-by mapping cars, as were those of the wireless access points, and this regardless of the use of encryption. 

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”.  In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

Given some of the previous conversations about supposed “broadcasting”, it is also worth drawing attention to paragraph 26, which explains:

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode.  The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”


It is all Metcalfe’s fault

Christian Huitema, author of IPv6: The New Internet Protocol (2nd Edition) and one of the leading architects of IPv6, had this to tell us:

It is all Metcalfe’s fault. There is no real functional need to have the MAC addresses unique worldwide, but it certainly is very convenient. If they weren’t unique, we would have to add a protocol to detect address collisions and somehow resolve them. That’s hard enough for static attachments, but becomes really hairy when dealing with a high mobility environment, e.g. Wi-Fi enabled smart phones that connect to different base stations as we roam the corridors of the buildings. Making MAC addresses unique really simplified the design, but it did not actually spare the need for detecting duplicates. Simply, we treat that as an error, to be corrected by network management systems.

The initial design of IPv6 called for embedding the MAC addresses in the IPv6 address. A host IPv6 address would be built as the combination of a 64 bit network prefix and a MAC address expanded to 64 bits. Our Windows Networking team saw that as a serious issue, and we proposed an alternative design in which hosts pick randomized “host identifiers” that vary each time they connect to a new network. That’s what you get by default in Vista and in Windows 7, although managers can still force the old “standard compliant” behavior and request that the identifier be set to the MAC address. I believe that most other operating systems just build IPv6 addresses using the MAC address.
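To see the difference between the two designs, here is a small Python sketch. The first function follows the "modified EUI-64" expansion as I understand it from the IPv6 addressing architecture: split the 48-bit MAC in half, insert `ff:fe` in the middle and flip the universal/local bit. The second simply draws 64 random bits. That is an illustrative stand-in for a randomized host identifier, not the algorithm Windows actually uses, and the prefixes are documentation examples.

```python
import ipaddress
import secrets


def eui64_interface_id(mac: str) -> int:
    """Expand a 48-bit MAC address into a 64-bit interface identifier."""
    raw = bytearray.fromhex(mac.replace(":", ""))
    raw[0] ^= 0x02  # flip the universal/local bit
    return int.from_bytes(bytes(raw[:3]) + b"\xff\xfe" + bytes(raw[3:]), "big")


def random_interface_id() -> int:
    """Pick a fresh 64-bit identifier (illustrative stand-in only)."""
    return secrets.randbits(64)


def address_from(prefix: str, interface_id: int) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with a 64-bit interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


mac = "00:11:22:33:44:55"
home = address_from("2001:db8:1:2::/64", eui64_interface_id(mac))
work = address_from("2001:db8:aaaa:bbbb::/64", eui64_interface_id(mac))
print(home)  # 2001:db8:1:2:211:22ff:fe33:4455
print(work)  # 2001:db8:aaaa:bbbb:211:22ff:fe33:4455

private_home = address_from("2001:db8:1:2::/64", random_interface_id())
```

With the MAC-derived identifier, the low 64 bits are identical at home and at work, so the device is recognizable wherever it goes. With a randomized identifier they are not.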

The worldwide database of MAC addresses would be even more valuable if we had kept using MAC addresses in IPv6 addresses. In fact, it may be valuable enough as most smart phone stacks still do that. Web sites and other services see the incoming IPv6 address, extract the database, and, voila, precise identification of the caller identity, location, you name it. Picture Bill Joy’s smirk, “told you so.”
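The extraction step Christian alludes to is nearly trivial, as the following Python sketch suggests. It assumes the address was built with the modified EUI-64 expansion (`ff:fe` in the middle of the low 64 bits, universal/local bit flipped). The `location_db` dictionary is a purely hypothetical stand-in for a worldwide database keyed by MAC address.

```python
import ipaddress


def mac_from_ipv6(address: str):
    """Recover the embedded MAC address, or None if there is no EUI-64 marker."""
    interface_id = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    raw = bytearray(interface_id.to_bytes(8, "big"))
    if raw[3:5] != b"\xff\xfe":
        return None  # not MAC-derived (e.g. a randomized identifier)
    raw[0] ^= 0x02   # undo the universal/local bit flip
    return ":".join(f"{b:02x}" for b in raw[:3] + raw[5:])


# Hypothetical database entry linking a device to where it was observed.
location_db = {"00:11:22:33:44:55": "home address of the device owner"}

caller = mac_from_ipv6("2001:db8:1:2:211:22ff:fe33:4455")
print(caller, "->", location_db.get(caller))
```

Any web site that sees such an address could do this lookup with no cooperation from the user or the network.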

Your understanding of the Wi-Fi protocol is correct. Only the payload is encrypted, not the MAC header.  The 802.11 MAC header actually differs from the Ethernet MAC header and carries up to 4 MAC addresses: the immediate wireless sender, the intended wireless receiver, the original source and the final destination. The final destination is used for example when sending a packet from one mobile to another through a base station. Depending on the scenario, headers carry 2, 3 or 4 MAC addresses – addresses are not repeated, for example, when the original source is the same as the wireless sender, or when the intended destination is the same as the wireless destination.
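For data frames, which role each address field plays depends on the "To DS" and "From DS" flags in the frame control field. The Python sketch below encodes my reading of the usual table from the 802.11 standard. The function name and the role labels are my own, and it covers data frames only (control frames such as acknowledgements carry fewer addresses).

```python
def address_roles(to_ds: bool, from_ds: bool) -> list[str]:
    """Return the role of each address field in an 802.11 data frame header."""
    if to_ds and from_ds:
        # Relayed between access points: all four addresses are distinct roles.
        return ["receiver", "transmitter", "destination", "source"]
    if to_ds:
        # Station to access point: the wireless receiver is the access point.
        return ["receiver (BSSID)", "transmitter (source)", "destination"]
    if from_ds:
        # Access point to station: the wireless transmitter is the access point.
        return ["receiver (destination)", "transmitter (BSSID)", "source"]
    # Direct station-to-station traffic within one cell.
    return ["receiver (destination)", "transmitter (source)", "BSSID"]


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, address_roles(*flags))
```

Where two roles coincide, one field does double duty. That is why the fourth address appears only in the relayed case.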

The MAC header itself was not protected, at least not initially. This can lead to possible spoofing of control frames, e.g. disconnection requests. In 2009 the 802.11 working group defined 802.11w to add protection to management frames, but this is essentially an anti-spoofing standard. It may optionally encrypt some management data, but it cannot encrypt the wireless MAC addresses.

These are very important points.  The problem of moving between multiple base stations in the same network would make MAC encryption a non-starter unless we took a heavy dependence on communication between the base stations, introducing the reliability concerns that implies.  In other words, the problem is not quite as simple as Hal Berenson suggested here. 

Yet Christian has found an elegant and simple alternative.  I really take my hat off to him for having been visionary enough – and sufficiently tuned into identity issues – to generate, by default, a different IPv6 MAC address for each network a device connects to.   I remember Christian discussing the issues and telling me he saw this as a possibility, but I had no idea until now that he had succeeded in getting it out the door and onto millions of devices. 

This approach solves the linking problem I've been describing, because the MAC address snooped in your home would be different from the MAC address generated should you go to your workplace or attend a conference.   In essence, Christian has made the IPv6 MAC addresses properly unidirectional, in the sense of being contextually specific identifiers, and in this sense has brought IP into conformance with the Fourth Law of Identity.

Although this benefit only kicks in as the infrastructure evolves to IPv6, it establishes the fact that the end-state we will reach is one in which WiFi snooping won't provide the ability to link people across contexts as various commercial interests are currently attempting to do. 

It also gives me confidence that regulation preventing collection and linking of MAC addresses would be totally consistent with the direction technological evolution will take us in anyway.   This is really key, since we never want regulation to tell technologists what to do – only, in protecting the public, to tell us what not to do.

There is, however, a macabre side to Christian's comment. 

Implementations of IPv6 that always include a persistent and unchanging MAC address in their IPv6 address need to be fixed.  They make the problem of unique identification across contexts worse, not better, since the MAC address moves up the stack to the IP layer…   We need the people responsible for these implementations to understand the issues and provide privacy-friendly alternatives just as Christian did.  Looks like there is more work to be done…