Trip down memory lane

Joe Mansfield's comment that Bluetooth “doesn’t appear to be all that bad from a privacy leakage perspective” left me rummaging through memory lane – awakening memories that may help explain why I now believe that world-wide databases of MAC addresses constitute a central socio-technical problem of our time.

I was taken back to an unforgettable experience I had in 2005 while working on the Laws of Identity.  I had finished the Fourth Law and understood theoretically why technical systems should use “unidirectional identifiers” (meaning identifiers limited to a defined context) rather than “universal identifiers” (things like social security numbers) unless the goal was to be completely public.  But there is a difference between understanding something theoretically and right in the gut.

Rather than retell the story, here is what I wrote on my blog in “Just a few scanning machines” on Tuesday 6 September 2005:

Since I seem to be on the subject of Bluetooth again, I want to tell you about an experience I had recently that put a gnarly visceral edge on my opposition to technologies that serve as tracking beacons for us as private individuals.

I was having lunch in San Diego with Paul Trevithick, Stefan Brands and Mary Rundle. Everyone knows Paul for his work with Social Physics and the Berkman identity wiki; Stefan is a tremendously innovative privacy cryptographer; and Mary is pushing the envelope on cyber law with Berkman and Stanford.

Suddenly Mary recalled the closing plenary at the Computers, Freedom and Privacy “Panopticon Conference” in Seattle.

She referred off-handedly to “the presentation where they flashed a slide tracking your whereabouts throughout the conference using your Bluetooth phone.”

Essentially I was flabbergasted. I had missed the final plenary, and had no idea this had happened.

Name: Kim Cameron Mobile
MAC: 00:09:2D:02:9A:68

Room                  Time              Talk
Grand I (G1)          Wed 09:32-09:32   ????
Grand Crescent (gc)   Wed 09:35-09:35   Adware and Privacy: Finding a Common Ground
Grand I (G1)          Wed 09:37-09:37   ????
Grand Crescent (gc)   Wed 09:41-09:42   Adware and Privacy: Finding a Common Ground
Grand I (G1)          Wed 09:46-09:47   ????
Grand III (g3)        Wed 10:18-10:30   Intelligent Video Surveillance
Baker (ol)            Wed 10:33-10:42   Reforming E-mail and Digital Telephonic Privacy
Grand III (g3)        Wed 10:47-10:48   Intelligent Video Surveillance
Grand Crescent (gc)   Wed 11:25-11:26   Adware and Privacy: Finding a Common Ground
Grand III (g3)        Wed 11:46-12:22   Intelligent Video Surveillance
5th Avenue (5a)       Wed 12:33-12:55   ????
Grand III (g3)        Wed 13:08-14:34   Plenary: Government CPOs: Are they worth fighting for?

Of course, to some extent I'm a public figure when it comes to identity matters, and tracking my participation at a privacy conference is, I suspect, fair game. Or at any rate, it's good theatre, and drives home the message of the Fourth Law, which makes the point that private individuals must not be subjected – without their knowledge or against their will – to technologies that create tracking beacons.

Later Mary introduced me to Paul Holman from The Shmoo Group. He was the person who had put this presentation together, and given our mutual friends I don't doubt his motives. In fact, I look forward to meeting him in person.

He told me:

“I take it you missed our quick presentation, but essentially, we just put Bluetooth scanning machines in a few of the conference rooms and had them log the devices they saw. This was a pretty unsophisticated exercise, showing only devices in discoverable mode. To get them all would be a lot more work. You could do the same kind of thing just monitoring for cell phones or WiFi devices or whatever. We were trying to illustrate a crude version of what will be possible with RFIDs.”

The Bluetooth tracking was tied in to the conference session titles, and by clicking on a link you could see the information represented graphically – including my escape to a conference center window so I could take a phone call.
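To give a feel for how little work this takes, here is a minimal sketch in Python of how raw scanner logs could be collapsed into a room-by-room itinerary like the one on the slide. It is not the Shmoo Group's code. The `Visit` type, the `collapse_sightings` function, the five-minute gap and the sample sightings are all assumptions of mine, chosen only to resemble the first rows of the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Visit:
    mac: str
    room: str
    first_seen: datetime
    last_seen: datetime


def collapse_sightings(sightings, max_gap=timedelta(minutes=5)):
    """Merge raw (mac, room, time) sightings into per-room visits.

    A new visit starts when a device appears in a different room, or is
    not seen in the same room for longer than max_gap (an assumed value).
    """
    visits = []
    open_visit = {}  # mac -> index of that device's latest visit
    for mac, room, seen_at in sorted(sightings, key=lambda s: (s[2], s[0])):
        idx = open_visit.get(mac)
        if idx is not None:
            visit = visits[idx]
            if visit.room == room and seen_at - visit.last_seen <= max_gap:
                visit.last_seen = seen_at  # still the same visit
                continue
        visits.append(Visit(mac, room, seen_at, seen_at))
        open_visit[mac] = len(visits) - 1
    return visits


def at(hhmm):
    """Parse a wall-clock time; the date part is an arbitrary placeholder."""
    return datetime.strptime(hhmm, "%H:%M")


# Illustrative sightings only, shaped like the first rows of the slide.
MAC = "00:09:2D:02:9A:68"
sample = [
    (MAC, "G1", at("09:32")),
    (MAC, "gc", at("09:35")),
    (MAC, "G1", at("09:37")),
    (MAC, "gc", at("09:41")),
    (MAC, "gc", at("09:42")),
]
for v in collapse_sightings(sample):
    print(v.room, v.first_seen.strftime("%H:%M"), v.last_seen.strftime("%H:%M"))
```

Joining each visit against the session schedule for that room and time is one more lookup, which is presumably how the talk titles ended up beside my name.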

Anyway, I think I have had a foretaste of how people will feel when networks of billboards and posters start tracking their locations and behaviors. They won't like it one bit. They'll push back.

A foretaste indeed

One of my readers wrote to say I should turn my Bluetooth broadcast off, and I responded:

You’re right, and I have turned it off. Which bothers me. Because I like some of the convenience I used to enjoy.

So I write about this because I’d rather leave my Bluetooth phone enabled, interacting only with devices run by entities I’ve told it to cooperate with.

We have a lot of work to do to get things to this point. I see our work on identity as being directed to that end, at least in part.

We need to be able to easily express and select the relationships we want to participate in – and avoid – as cyberspace progressively penetrates the world of physical things.

The problems of Bluetooth all exist in current Wifi too. My portable computer broadcasts another tracking beacon. I’m not picking on Bluetooth versus other technologies. Incredibly, they all need to be fixed. They’re all misdesigned.

If anything has shocked me while working on the Laws of Identity, it has been the discovery of how naive we’ve been in the design of these systems to date – a product of our failure to understand the Fourth Law of Identity. The potential for abuse of these systems is colossal – enterprises like the UK’s Filter are just the most benign tip of an ugly iceberg.

For everyone’s sake I try to refrain from filling in what the underside of this iceberg might look like.

Google's Street View group, which has been assembling a massive central registry of WiFi MAC addresses, has definitely crawled out from under this iceberg, and the project is more sinister than any I imagined only a few years ago.

But so as not to leave everyone feeling completely depressed, all the dreams of Billboards that recognize you from your Bluetooth phone have now been abandoned by Bluetooth manufacturers, and the specification has been greatly improved in light of the criticism it received.  Let's hope that geo-location providers, and Google in particular, see the same light, and assure us they will no longer collect or store the MAC address of any device unless that collection is approved by the subscriber.

What does a MAC address tell you?

Joe Mansfield at Peccavi has published a nice, clear and abridged explanation of the issues I've been discussing over the last few weeks.  

But before doing that he makes an important and novel point about why regulation may be useful even if it can't “prevent all abuses”:

I’d discounted the payload snooping issue as a distraction because I’d believed (and still do) that it was almost certainly an unfortunate error. I’d then made the point that a legal barrier to a technical problem was insufficient to prevent the bad guys doing bad things but I used that as an excuse to ignore the problem – small scale abuses of this sort of thing are not good but systematic large scale abuses “benefit” from network scaling effects. You might not be able to prevent small scale\illegal abuse through legal means but just because you can’t does not mean that you can’t control large scale abuses this way. The benefits and dangers inherent in this data become exponentially worse as the scale of the database that contains it increases. Large scale means companies and companies react to regulation by being much more careful about what they do. If a technology that is already out there has major privacy issues the regulatory approach is the only way to keep a lid on the problem while the technologists argue about how to fix the bits. Even if we assume that the law was OK about companies creating Geo-location databases using WiFi SSID\MAC mapping, effective regulation would have made the additional mistake made by Google (assuming it was a mistake) much less likely.

Next he explains how WiFi works as a layered protocol in which MAC addresses are exposed despite encryption and SSID suppression:

Now the obvious question is should scanning for identifiers that are broadcast openly by all WiFi radio signals be acceptable and legal?

802.11 WiFi signals are pretty complex things – Wikipedia has a brief overview here for those who want to see the alphabet soup of standards involved. Despite the range of encoding\modulation schemes and the number of frequency bands and channels almost all 802.11 devices revert to a couple of basic communication modes. This makes it easy for devices to connect to each other, and it’s what makes public WiFi hotspots practical. However it also makes configuring a device to monitor WiFi traffic trivially easy – the hardware does all the heavy lifting and the standards don’t really do anything to stop it happening. An important feature of WiFi is that, even though the payload encryption standards can now be pretty robust, the data link layer is not protected from snooping. This means that the content (my Google searches, the video clip I’m streaming down from Youtube etc) can be pretty well kept away from prying eyes but, at what the Ethernet folks call layer 2, the logical structures called frames that carry your encrypted data transmit some control data in the open.

So even with WPA2’s thorough key management and AES encryption your WiFi traffic still contains quite a bit of chatter that isn’t hidden away. The really critical thing for me is that the layer 2 addresses, the Media Access Control (MAC) addresses, of the sender and receiver (generally your PC\Phone’s WiFi adaptor and your Access Point) for each frame are always visible. And remember that MAC addresses are globally unique identifiers by design. Individual WiFi networks are defined by another identifier, the Service Set Identifier or SSID – when you set up your home WiFi AP and call the network “MyWLAN” you are choosing an SSID. SSID’s are very important, you can’t connect to a wireless LAN without knowing the relevant SSID, but they are not secure even though they can be sort of hidden they are never protected and can always be seen by someone just watching your wireless traffic. Interestingly SSID’s are not globally unique – there’s generally no real issue so long as my chosen SSID doesn’t match that of another network that’s relatively close by.

So SSID’s are possibly visible but MAC addresses are definitely visible, and MAC addresses are unique. While driving along a street or sitting in a coffee shop, hotel lobby or conference room your WiFi adaptor will see dozens if not hundreds of WiFi packets all of which will contain globally unique MAC addresses. It is possible to hack some WiFi hardware to change the MAC address but that practice is rare. Your PC has a couple (one for the wired Ethernet adaptor which isn’t important here, and usually one for WiFi these days), your Wii\PS3\XBox-360 has one, so does your Nintendo DS, iPhone, PSP … you get the picture. Another feature of MAC addresses is that it is very easy to differentiate between the MAC address of a Linksys Access Point, an iPhone and a Nintendo DS – Network protocol analyzers have been doing that trick for decades.
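Joe's remark about telling device types apart rests on the structure of the address itself: the first three octets are the manufacturer prefix (the OUI), and two low-order bits of the first octet flag multicast and locally administered addresses. The following is a minimal sketch of that decoding. The function names and the `vendor_table` are hypothetical, and the single vendor entry is a made-up placeholder rather than real registry data.

```python
def parse_mac(mac: str) -> bytes:
    """Turn 'AA:BB:CC:DD:EE:FF' (or dash-separated) into 6 raw bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    return bytes(int(part, 16) for part in parts)


def describe_mac(mac: str, vendor_table: dict) -> dict:
    """Split a MAC address into its manufacturer prefix and flag bits."""
    raw = parse_mac(mac)
    oui = ":".join(f"{octet:02X}" for octet in raw[:3])
    return {
        "oui": oui,
        # Hypothetical lookup: a real tool would load the public registry.
        "vendor": vendor_table.get(oui),
        # Bit 0 of the first octet: group (multicast) address.
        "multicast": bool(raw[0] & 0x01),
        # Bit 1 of the first octet: locally administered, not factory-set.
        "locally_administered": bool(raw[0] & 0x02),
    }


# Placeholder table; the vendor name is invented for illustration.
vendor_table = {"00:09:2D": "Example Phone Maker (placeholder)"}

print(describe_mac("00:09:2D:02:9A:68", vendor_table))
print(describe_mac("02:00:00:00:00:01", vendor_table))
```

The second address has the locally administered bit set, which is how a changed or software-assigned address looks on the wire.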

So the systematic scanners out there (Google, Navizon, Skyhook and the rest) can drive around or recruit volunteers and gather location data and build databases of unique identifiers, device types, timestamps, signal strengths and possibly other data. The simplest (and most benign) use of that would be to pull out the ID’s of devices that are known to be fixed to one place (Access Points say) and use that for enabling Geo-location.
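The benign use Joe describes can be sketched in a few lines. This is an illustration under stated assumptions, not any provider's actual method: I assume each observation carries a MAC address, the scanner's GPS position and a signal strength, that an access point is placed where its signal was strongest, and that a client's position is the plain average of the known access points it can see. All names and coordinates are invented.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    mac: str
    lat: float
    lon: float
    signal_dbm: int  # closer to zero means a stronger signal


def build_location_db(observations):
    """Map each MAC to the scanner position where it was heard loudest."""
    best = {}
    for obs in observations:
        prev = best.get(obs.mac)
        if prev is None or obs.signal_dbm > prev.signal_dbm:
            best[obs.mac] = obs
    return {mac: (obs.lat, obs.lon) for mac, obs in best.items()}


def estimate_position(visible_macs, location_db):
    """Average the stored positions of the access points a client sees."""
    known = [location_db[mac] for mac in visible_macs if mac in location_db]
    if not known:
        return None
    lat = sum(p[0] for p in known) / len(known)
    lon = sum(p[1] for p in known) / len(known)
    return (lat, lon)


# Invented drive-by observations of two access points.
drive = [
    Observation("02:00:00:00:00:01", 10.0, 20.0, -80),
    Observation("02:00:00:00:00:01", 10.5, 20.5, -50),  # strongest reading
    Observation("02:00:00:00:00:02", 11.5, 21.5, -60),
]
db = build_location_db(drive)
print(estimate_position(["02:00:00:00:00:01", "02:00:00:00:00:02"], db))
```

Note that nothing in `build_location_db` distinguishes an access point from a phone or laptop; whatever MAC addresses go in are mapped to a place.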

Joe then looks at what it means to start collecting and analyzing the MAC addresses of mobile devices.

It’s not a big leap to also track the MAC addresses that are more mobile. Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it. Kim Cameron describes the start of this cascade effect in his most recent post, mapping the attendees at a conference to home addresses even when they’ve never consented to any such tracking is not going to be hard if you’ve gone to the trouble of scanning every street in every city in the country. With a minor bit of further analysis the same techniques could be used to get a good idea of the travel or shopping habits of almost everyone sitting in an airport departure lounge or the home addresses of everyone participating in a Stop The War protest.

And remember that even though you can only effectively use WiFi to send and receive data over a range of a few 10’s to maybe a 100m you can detect and read WiFi signals easily from 100’s to 1000’s of metres away without any special equipment.

The plans to blanket London with “Free WiFi” start to sound quite disturbing when you think about those possibilities.

To answer my own title question – MAC addresses can tell far more about you than you think and keeping databases of where and when they’ve been seen can be extremely dangerous in terms of privacy.
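The repeat-detection analysis Joe describes needs nothing sophisticated. As a rough sketch, assume the database holds (MAC, place, day) sightings, where "place" is some coarse location cell. The function name, the three-day threshold and the data below are all hypothetical.

```python
from collections import defaultdict


def most_frequent_place(sightings, min_days=3):
    """For each MAC, return the place where it was seen on the most days.

    sightings: iterable of (mac, place, day) tuples.
    A MAC is reported only if its top place has at least min_days
    distinct days, an assumed threshold for a 'habitual' location.
    """
    days_seen = defaultdict(lambda: defaultdict(set))
    for mac, place, day in sightings:
        days_seen[mac][place].add(day)

    habitual = {}
    for mac, places in days_seen.items():
        place, days = max(places.items(), key=lambda item: len(item[1]))
        if len(days) >= min_days:
            habitual[mac] = place
    return habitual


# Invented sightings: one phone seen repeatedly on one street and once
# at a conference, and another seen only once.
log = [
    ("02:00:00:00:00:0A", "elm-street", "mon"),
    ("02:00:00:00:00:0A", "elm-street", "tue"),
    ("02:00:00:00:00:0A", "elm-street", "thu"),
    ("02:00:00:00:00:0A", "conference-hall", "wed"),
    ("02:00:00:00:00:0B", "conference-hall", "wed"),
]
print(most_frequent_place(log))
```

Once a device has a habitual place, every other sighting of it, at a conference or a protest, can be tied back to that address.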

Finally, he compares WiFi to Bluetooth:

Bluetooth is a slightly different animal. It’s also a short range radio standard for data communications but it was developed from the ground up to replace wires and the folks building the standard got a lot of stuff right. It doesn’t appear to be all that bad from a privacy leakage perspective – when implemented correctly nothing is sent in clear text (the entire frame is encoded, not just the payload) and the frequency hopping RF behaviour makes it much harder to casually snoop on specific conversations. Bluetooth devices have a Bluetooth Device ID that is very like a MAC address (48 bits), with a manufacturer ID that enables broad classification of devices if the ID can be discovered but most Bluetooth devices keep that hidden most of the time by defaulting to a “not visible” mode even when Bluetooth is enabled. When actively communicating (paired) all data is encrypted so the device ID’s are not visible to a third party. Almost all modern Bluetooth devices only allow themselves to remain openly visible in this way for a short period of time before they revert to a safer non broadcasting mode. The main weakness is that when devices are set to “visible” the unique identifiers and other data can be scanned remotely and used in just the same way as scanned WiFi MAC addresses. That’s not to say that Bluetooth doesn’t have its share of security problems but they made an attempt to get some of the fundamentals right. It does also show that there is a practical way to approach the wireless privacy challenge which is good to see.

All in all a very nice explanation of the issues involved here.   The only thing I would add is that the early versions of Bluetooth had few of the privacy-respecting behaviors present in the recent specifications.  The consortium has really worked to clean up its act and we should all congratulate it.  This came about because privacy concerns came to be perceived as an adoption blocker. 

Does the non-content trump the content?

In my previous post I referred to an interesting Wired story in which former U.S. federal prosecutor Paul Ohm says Google “likely” breached a U.S. federal criminal statute by intercepting the metadata and address information on residential and business WiFi networks. The statute refers to a “pen register” – an electronic device that records all numbers dialed from a particular telephone line. Wikipedia tells us the term “has come to include any device or program that performs similar functions to an original pen register, including programs monitoring Internet communications.” The story continues:

“I think it’s likely they committed a criminal misdemeanor of the Pen Register and Trap and Traces Device Act,” said Ohm, a prosecutor from 2001 to 2005 in the Justice Department’s Computer Crime and Intellectual Property Section. “For every packet they intercepted, not only did they get the content, they also have your IP address and destination IP address that they intercepted. The e-mail message from you to somebody else, the ‘to’ and ‘from’ line is also intercepted.”

“This is a huge irony, that this might come down to the non-content they acquired,” said Ohm, a professor at the University of Colorado School of Law.

I understand how people unacquainted with the emerging role of identity in the Internet can see this as an irony – a kind of side-effect – whereas in reality Google's plan to establish a vast centralized database of device identifiers has much longer-term consequences than the misappropriation of content. Metadata is no less important than other data – and the “addresses” being referred to are really device identifiers clearly associated with individual users, much like the telephone numbers to which the statute applies. Given the similarity to issues that arose with pre-Internet communication, we should perhaps not be surprised that there may already be regulation in place that prevents “registering” of the identifiers.

The Wired article continues:

Google said it was a coding error that led it to sniff as much as 600 gigabytes of data across dozens of countries as it was snapping photos for its Street View project. The data likely included webpages users visited and pieces of e-mail, video and document files…

The pen register act described by Ohm, which he said is rarely prosecuted, is usually thought of in terms of preventing unauthorized monitoring of outbound and inbound telephone numbers.

Violations are a misdemeanor and cannot be prosecuted by private lawyers in civil court, Ohm said. He said the act requires that Google “knew, or should have known” of the activity in question.

Google denies any wrongdoing.

In fact, Google knew about the collection of MAC addresses, and has never said otherwise or stated that its collection of these addresses was accidental. It has also been careful never to state explicitly that its collection was limited to Wireless Access Points. The Gstumbler report makes it clear they were parsing and recording both the source and destination MAC addresses in all the WiFi frames they intercepted.

The Wired article explains:

As far as a criminal court goes, it is not considered wiretapping “to intercept or access an electronic communication made through an electronic communication system that is configured so that such electronic communication is readily accessible to the general public.”

It is not known how many non-password-protected Wi-Fi networks there are in the United States.

What makes this especially interesting is the fact that it is not possible to configure a WiFi network so that the MAC addresses are hidden. Use of passwords protects the communication content carried by the network, but does not protect the MAC addresses. Configuring the Wireless Access Point not to broadcast an SSID does not prevent eavesdropping on MAC addresses either. Yet we can hardly say the metadata is readily accessible to the general public, since it cannot be detected except by acquiring and using very specialized programs.

Wired draws the conclusion that “The U.S. courts have not clearly addressed the issue involved in the Google flap.”

Title 18 – Part II – Chapter 206 – § 3121

Former federal prosecutor Paul Ohm says Google “likely” breached a U.S. federal criminal statute in connection with its accidental Wi-Fi sniffing — but not for siphoning private data from internet surfers using unsecured networks.

Instead, Mr. Ohm  thinks Google might have breached the Pen Register and Trap and Traces Device Act for intercepting the metadata and address information alongside the content.

According to Wikipedia, a “pen register is an electronic device that records all numbers dialed from a particular telephone line. The term has come to include any device or program that performs similar functions to an original pen register, including programs monitoring Internet communications.”

I'll expand on the identity implications in my next post, but to prepare the discussion, here is the statute to which Mr. Ohm is referring:

Title 18 – Part II – Chapter 206 – § 3121

  (a) In General.— Except as provided in this section, no person may install or use a pen register or a trap and trace device without first obtaining a court order under section 3123 of this title or under the Foreign Intelligence Surveillance Act of 1978 (50 U.S.C. 1801 et seq.).
  (b) Exception.— The prohibition of subsection (a) does not apply with respect to the use of a pen register or a trap and trace device by a provider of electronic or wire communication service—
    (1) relating to the operation, maintenance, and testing of a wire or electronic communication service or to the protection of the rights or property of such provider, or to the protection of users of that service from abuse of service or unlawful use of service; or
    (2) to record the fact that a wire or electronic communication was initiated or completed in order to protect such provider, another provider furnishing service toward the completion of the wire communication, or a user of that service, from fraudulent, unlawful or abusive use of service; or
    (3) where the consent of the user of that service has been obtained.
  (c) Limitation.— A government agency authorized to install and use a pen register or trap and trace device under this chapter or under State law shall use technology reasonably available to it that restricts the recording or decoding of electronic or other impulses to the dialing, routing, addressing, and signaling information utilized in the processing and transmitting of wire or electronic communications so as not to include the contents of any wire or electronic communications.
  (d) Penalty.— Whoever knowingly violates subsection (a) shall be fined under this title or imprisoned not more than one year, or both.

Conor changes his mind

Conor Cahill has taken a look at the Gstumbler report.  His conclusion is:

Given this new information I would have to agree that Google has clearly stepped into the arena of doing something that could be detrimental to the user's privacy.

Conor explains that, “the information in the report is quite different than the information that had been published at the time I expressed my opinions on the events at hand.”

He argues:

  1. “We had been led to believe that Google had only captured data on open wireless networks (networks that broadcast their SSIDs and/or were unencrypted). The analysis of the software shows that to be incorrect — Google captured data on every network regardless of the state of openness. So no matter what the user did to try to protect their network, Google captured data that the underlying protocols required to be transmitted in the clear.
  2. “We had been led to believe that Google had only captured data from wireless access points (APs). Again the analysis shows that this was incorrect — Google captured data on any device for which it was able to capture the wireless traffic for (AP or user device). So portable devices that were currently transmitting as the Street View vehicle passed would have their data captured.”

Anyone who knows Conor knows he is a gentlemanly model of how people should behave towards each other in our industry.  I understand his position fully, and respect it.  He says:

[Kim] seems to have a particular fondness for the phrase “wrong,” “completely wrong,” and “wishful thinking” when referring to my comments on the topic.  In my defense, I will say that there was no “wishful thinking” going on in my mind. I was just examining the published information rather than jumping to conclusions — something that I will always advocate. In this case, after examining the published report, it does appear that those who jumped to conclusions happened to be closer to the mark, but I still think they were wrong to jump to those conclusions until the actual facts had been published.

I can't disagree that Google's public relations messages may well have been crafted to leave the impression that their wireless eavesdropping was only directed at network access points.  But if you read them extremely carefully you see they refrain from making any such claims. 

At any rate, Conor needs no defense and I accept his point.  People who took the view that Google couldn't possibly have been doing what I claimed were acting based on the messages the company conveyed.  Sadly, if people of Conor's undisputed technical sophistication are misled by this kind of public relations campaign, the crafting of the information might also be considered suspect.

[More of Conor's post here]

 

Latitude privacy policy doesn't fess up to what Google stores

Never one to mince words, Jackson Shaw asks, “To the Google privacy core – Is it rotten?”  He writes,

“I read Kim’s post and immediately decided to turn off Google’s Latitude service on my phone but, as Kim illustrates, it probably won’t make any difference…

“I took a few minutes to check out Google’s privacy policy around Latitude and found out this much:

“If you choose to ‘Hide your location’, you can hide from your Latitude friends all at once, so they won't be able to see your location. If you hide in Latitude, we don't store your location.

“I’m not worried about hiding in Latitude. I wish I could hide from Google!”

The funny thing here is that Google already stores our residential locations through association with our devices, as indicated by its Gstumbler report, contradicting the Latitude privacy policy.

Jackson then directs us to a Wired article that is tremendously germane to this discussion – partly because of what it says about the current legal environment in the US, and partly because it reflects the very real problem that, in general, neither technologists nor policy makers understand that tapping of device identifiers is as serious as theft of content. 

See: “Former Prosecutor: Google Wi-Fi Snafu ‘Likely’ Illegal” – I'll discuss it next.

Rethink things in light of Google's Gstumbler report

A number of technical people have given Google the benefit of the doubt in the Street View WiFi case and as a result published information that Google's new “Gstumbler” report shows is completely incorrect. It is important that people re-evaluate what they are saying in light of this report.

I'll pick on Conor's recent posting on our discussion as an example – it contains a number of statements and implies a number of things explicitly contradicted by Google's new report.  Once he reads the report and applies the logic he has put forward, logic will require Conor to change his conclusions.

Conor begins with a bunch of statements that are true:

  • MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
  • By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them).
  • When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.
  • SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on its use as a privacy invading value — a generic value such as “home” or “linksys” is much less likely to be a privacy issue than “ConorCahillsHomeAP”.

Wishful thinking and completely wrong

These are followed by a statement that is just plain wishful thinking. Conor continues:

  • Google purposely collected SSID and MAC Addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.

Google's report says Conor is wrong about this, explicitly saying in paragraph 26, “Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks”. Conor continues:

  • Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).

This is true, but it is important to indicate that this was not limited to access points.  Google's report says that it recorded the association between the MAC address and geographic location of all the active devices on the network.  When it did this, the MAC addresses became, according to Conor's own earlier definition, “worrisome data aggregations”.

  • There is no AP protocol defined means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs. 

This is true, but Google's report indicates this would not have mattered – it collected MACs regardless of whether SSIDs were broadcast.

  • I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.

Google's report indicates it did not.  It only used that status to decide whether or not to record the payload – and only recorded the payload of unencrypted frames…

I like Conor's logic that, “When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.”   I urge Conor to read the Gstumbler report.  Once he knows what was actually happening, I hope he'll tell the world about it.

 

Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’.  I will resist commenting on the name, since Google did the right thing in publishing the report:  there will no longer be any ambiguity about what was being collected. 

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data. One thing I like about the report is that it begins with a number of technical “descriptions and definitions”. For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination.  Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.
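To see why encryption does not help here, it is worth looking at where the addresses sit. The sketch below parses the basic 24-byte 802.11 MAC header: frame control, duration, three addresses and sequence control. It is a simplification that rests on assumptions: the input is a bare frame with no capture (radiotap) header and no trailing checksum, and only the common three-address layout is handled, so control frames, QoS fields and four-address frames are ignored. The sample frame and its addresses are invented.

```python
import struct


def format_mac(raw: bytes) -> str:
    return ":".join(f"{octet:02X}" for octet in raw)


def parse_80211_header(frame: bytes) -> dict:
    """Read the cleartext header fields of a three-address 802.11 frame."""
    if len(frame) < 24:
        raise ValueError("frame shorter than the basic 24-byte header")
    frame_control, _duration = struct.unpack_from("<HH", frame, 0)
    return {
        "type": (frame_control >> 2) & 0x3,      # 0 mgmt, 1 control, 2 data
        "subtype": (frame_control >> 4) & 0xF,
        "protected": bool(frame_control & 0x4000),  # body is encrypted
        "addr1": format_mac(frame[4:10]),        # receiver
        "addr2": format_mac(frame[10:16]),       # transmitter
        "addr3": format_mac(frame[16:22]),
        "body_length": len(frame) - 24,
    }


# Invented example: an encrypted data frame from a laptop to its AP.
access_point = bytes.fromhex("020000000001")
laptop = bytes.fromhex("020000000002")
destination = bytes.fromhex("020000000003")
sample_frame = (
    struct.pack("<HH", 0x4108, 0)   # data frame, To DS, Protected Frame
    + access_point + laptop + destination
    + struct.pack("<H", 0)          # sequence control
    + b"\x00" * 16                  # stand-in for the encrypted body
)
print(parse_80211_header(sample_frame))
```

Even with the Protected Frame bit set, all three addresses are read straight out of the header; only the bytes after offset 24 are opaque to the eavesdropper.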

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames. The parsing separates the information into discrete fields for easier analysis… All available MAC addresses contained in a frame are also parsed. All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk.  I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encrypted networks.”
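The rule quoted from paragraph 22 can be sketched in a few lines of Python. This illustrates the behaviour the report describes, not the actual software. The name `split_for_storage` is hypothetical, and the fixed 24-byte header length is a simplifying assumption (three address fields, no QoS or HT fields).

```python
import struct

HEADER_LEN = 24  # assumption: three-address header without QoS or HT fields


def split_for_storage(frame):
    """Return (header, body); body is None when the rule says to discard it."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than the assumed header length")
    (frame_control,) = struct.unpack_from("<H", frame, 0)
    is_data = ((frame_control >> 2) & 0x3) == 2
    is_protected = bool(frame_control & 0x4000)
    header, body = frame[:HEADER_LEN], frame[HEADER_LEN:]
    if is_data and is_protected:
        # Encrypted data frame: body dropped, header (with MAC addresses) kept.
        return header, None
    return header, body
```

Either way the header is kept, and the header is where the device identifiers are.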

All device identifiers were recorded

As a result, there is no longer any question. The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the drive-by mapping cars, as were those of the wireless access points, and this regardless of the use of encryption.

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”.  In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

Given some of the previous conversations about supposed “broadcasting”, it is also worth drawing attention to paragraph 26, which explains:

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode. The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”

 

It is all Metcalfe’s fault

Christian Huitema, author of IPv6: The New Internet Protocol (2nd Edition) and one of the leading architects of IPv6, had this to tell us:

It is all Metcalfe’s fault. There is no real functional need to have the MAC addresses unique worldwide, but it certainly is very convenient. If they weren’t unique, we would have to add a protocol to detect address collisions and somehow resolve them. That’s hard enough for static attachments, but becomes really hairy when dealing with a high mobility environment, e.g. Wi-Fi enabled smart phones that connect to different base stations as we roam the corridors of the buildings. Making MAC addresses unique really simplified the design, but it did not actually spare the need for detecting duplicates. Simply, we treat that as an error, to be corrected by network management systems.

The initial design of IPv6 called for embedding the MAC addresses in the IPv6 address. A host IPv6 address would be built as the combination of a 64 bit network prefix and a MAC address expanded to 64 bits. Our Windows Networking team saw that as a serious issue, and we proposed an alternative design in which hosts pick randomized “host identifiers” that vary each time they connect to a new network. That’s what you get by default in Vista and in Windows 7, although managers can still force the old “standard compliant” behavior and request that the identifier be set to the MAC address. I believe that most other operating systems just build IPv6 addresses using the MAC address.

The worldwide database of MAC addresses would be even more valuable if we had kept using MAC addresses in IPv6 addresses. In fact, it may be valuable enough as most smart phone stacks still do that. Web sites and other services see the incoming IPv6 address, extract the database, and, voila, precise identification of the caller identity, location, you name it. Picture Bill Joy’s smirk, “told you so.”

Your understanding of the Wi-Fi protocol is correct. Only the payload is encrypted, not the MAC header. The 802.11 MAC header actually differs from the Ethernet MAC header and carries up to 4 MAC addresses: the immediate wireless sender, the intended wireless receiver, the original source and the final destination. The final destination is used for example when sending a packet from one mobile to another through a base station. Depending on the scenario, headers carry 2, 3 or 4 MAC addresses – addresses are not repeated, for example, when the original source is the same as the wireless sender, or when the intended destination is the same as the wireless destination.

The MAC header itself was not protected, at least not initially. This can lead to possible spoofing of control frames, e.g. disconnection requests. In 2009 the IEEE defined 802.11w to add protection to management frames, but this is essentially an anti-spoofing standard. It may optionally encrypt some management data, but it cannot encrypt the wireless MAC addresses.

These are very important points.  The problem of moving between multiple base stations in the same network would make MAC encryption a non-starter unless we took a heavy dependence on communication between the base stations, introducing the reliability concerns that implies.  In other words, the problem is not quite as simple as Hal Berenson suggested here. 

Yet Christian has found an elegant and simple alternative. I really take my hat off to him for having been visionary enough – and sufficiently tuned into identity issues – to generate, by default, a different IPv6 host identifier for each network a device connects to. I remember Christian discussing the issues and telling me he saw this as a possibility, but I had no idea until now that he had succeeded in getting it out the door and onto millions of devices.

This approach solves the linking problem I've been describing at the IP layer, because the identifier seen in your home would be different from the identifier generated should you go to your workplace or attend a conference. In essence, Christian has made the IPv6 host identifiers properly unidirectional, in the sense of being contextually specific identifiers, and in this sense has brought IP into conformance with the Fourth Law of Identity.
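To illustrate the difference Christian describes, here is a small Python sketch that builds an IPv6 address both ways. The MAC-derived form follows the standard “modified EUI-64” expansion (insert ff:fe in the middle and flip the universal/local bit). The per-network form is only an assumption for illustration: hashing a device secret with the network prefix is one way to get an identifier that differs from network to network, and it is not a description of how Windows actually generates its identifiers. All function names are hypothetical.

```python
import hashlib
import ipaddress


def eui64_interface_id(mac):
    """Modified EUI-64: expand a 48-bit MAC address to a 64-bit identifier."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit and insert ff:fe between the two halves.
    return bytes([raw[0] ^ 0x02]) + raw[1:3] + b"\xff\xfe" + raw[3:]


def per_network_interface_id(secret, prefix):
    """Hypothetical scheme: an identifier that depends on the network prefix."""
    network = ipaddress.IPv6Network(prefix)
    digest = hashlib.sha256(secret + network.network_address.packed).digest()
    return digest[:8]


def build_address(prefix, interface_id):
    """Combine a /64 network prefix with a 64-bit host identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64 or len(interface_id) != 8:
        raise ValueError("expected a /64 prefix and an 8-byte identifier")
    value = int(network.network_address) | int.from_bytes(interface_id, "big")
    return ipaddress.IPv6Address(value)
```

With the MAC-derived form, the low 64 bits are identical at home (say `2001:db8:1:2::/64`) and at work (`2001:db8:a:b::/64`), so the two addresses can be linked. With a per-network identifier they differ.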

Although this benefit only kicks in as the infrastructure evolves to IPv6, it establishes the fact that the end-state we will reach is one in which WiFi snooping won't provide the ability to link people across contexts as various commercial interests are currently attempting to do.

It also gives me confidence that regulation preventing collection and linking of MAC addresses would be totally consistent with the direction technological evolution will take us in anyway. This is really key, since we never want regulation to tell technologists what to do – only, in protecting the public, to tell us what not to do.

There is, however, a macabre side to Christian's comment. 

Implementations of IPv6 that always include a persistent and unchanging MAC address in their IPv6 address need to be fixed. They make the problem of unique identification across contexts worse, not better, since the MAC address moves up the stack to the IP layer… We need the people responsible for these implementations to understand the issues and provide privacy-friendly alternatives just as Christian did. Looks like there is more work to be done…
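To see why this matters, consider how little work it takes for a web site to pull the MAC address back out of such an IPv6 address. The following Python sketch reverses the standard modified EUI-64 expansion. The function name `mac_from_ipv6` is hypothetical, and the ff:fe check is only a heuristic, since a randomized identifier could contain those bytes by chance.

```python
import ipaddress


def mac_from_ipv6(address):
    """Return the embedded MAC address, or None if the address does not look MAC-derived."""
    interface_id = ipaddress.IPv6Address(address).packed[8:]
    if interface_id[3:5] != b"\xff\xfe":
        return None  # no EUI-64 marker in the middle of the identifier
    # Undo the expansion: drop ff:fe and flip the universal/local bit back.
    raw = bytes([interface_id[0] ^ 0x02]) + interface_id[1:3] + interface_id[5:]
    return ":".join(f"{b:02x}" for b in raw)
```

Given `2001:db8:1:2:211:22ff:fe33:4455`, this returns `00:11:22:33:44:55`, whatever network prefix the device happens to be using.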

“We could all be wrong about the way 802.11 works…”

I received a comment from a reader who plays an important role in the network protection industry which reads:

“I was a bit surprised by you going on about Google getting the MAC addresses of devices in people's home. I asked a few other security folks, and none of us could figure out why you thought that Google had these addresses.

“Of course, we could all be wrong about the way that 802.11 works, but I would have thought that the only way that the Google Car could see anything other than the MAC address of the WAP would be if both:
– the car quickly impersonated the WAP by forging its SSID
– the computers in the house tried to re-attach to the device forging the SSID

“Is this the scenario you think happened? If so, where did you see this? If not, what am I misunderstanding about Wifi where just receiving signals without looking like a WAP allows me to see any MACs other than those of WAPs?

“I look forward to hearing more on this, even if my understanding of WiFi (and that of the folks I asked) is wrong.”

Unfortunately, the assumptions made by my reader, even though supported by other experts, are wrong. 

Few technologies are more ubiquitous or foundational than 802.11 wireless (WiFi).  The security experts in this domain understand perfectly its security characteristics relative to protection of the data payload.  But in the past the device identity aspects of the system have not been on the front burner.  No wonder.  I imagine that anyone worried about some information agency accumulating all the MAC addresses in the world and mapping them to the houses people live in would have been sent off to the looney bin a few years ago: “Sure, and pigs might fall from the sky and crush us too!  Now let's get this thing deployed!”

Of course I come at this from a different direction since I'm an “identity guy” and the identity of the devices is something I have had to understand and deal with.  But given the importance of the discussion I turned to two colleagues in other disciplines to verify that my own understanding remains correct despite the evolution of the standards.  One is Khaja Ahmed, an expert in network security; the other is Christian Huitema, an expert in all aspects of networking.

I'll share Christian's comments in a separate post.  Khaja responded:   

“Yes, the sender’s MAC address is in the clear. Of course the recipient’s (WiFi access point) MAC address has to be in the clear so it knows that the packet is intended for it. The client’s MAC address is needed so the WiFi access point knows which session key and state to use to process the frame. Just as the SA in IPsec cannot be identified without the IP address of the sender.

“One more point re the four fields you are talking about… There are 3 or 4 MAC addresses in each 802.11 frame depending on who is sending the packet to who on whose behalf.

“The sender and destination addresses are always there, so that’s two. The third address is typically the Base Station Identifier. In cases where the packets are being relayed by some other part of the infrastructure there may be addresses of some intermediate transmitter and receiver. That gives you the 4 addresses. The MAC address of the original sender / client is just one field.