Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’.  I will resist commenting on the name, since Google did the right thing in publishing the report:  there will no longer be any ambiguity about what was being collected. 

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data.  One thing I like about te report is that it has a begins with a a number of technical “descriptions and definitions”.  For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination.  Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames.  The parsing separates the information into discreet fields for easier analysis… All available MAC addresses contained in a frame are also parsed.  All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk.  I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encypted networks.” 

All device identifiers were recorded

As a result, there is no longer any question.  The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the driveby mapping cars, as were the wireless access points, and this regardless of the use of encryption. 

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”.  In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

It is also worth, given some of the previous conversations about supposed “broadcasting”, drawing attention to paragraph 26, which explains,

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode.  The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of netwrks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”

 

It is all Metcalfe’s fault

Christian Huitema, author of IPv6: The New Internet Protocol (2nd Edition) and one of the leading architects of IPV6, had this to tell us:

It is all Metcalfe’s fault. There is no real functional need to have the MAC addresses unique worldwide, but it certainly is very convenient. If they weren’t unique, we would have to add a protocol to detect address collisions and somehow resolve them. That’s hard enough for static attachments, but becomes really hairy when dealing with a high mobility environment, e.g. Wi-Fi enabled smart phones that connect to different base stations as we roam the corridors of the buildings. Making MAC addresses unique really simplified the design, but it did not actually spare the need for detecting duplicates. Simply, we treat that as an error, to be corrected by network management systems.

The initial design of IPv6 called for embedding the MAC addresses in the IPv6 address. A host IPv6 address would be built as the combination of a 64 bit network prefix and a MAC address expanded to 64 bits. Our Windows Networking team saw that as a serious issue, and we proposed an alternative design in which host pick randomized “host identifiers”, that vary each time they connect to a new network. That’s what you get by default in Vista and in Windows 7, although managers can still force the old “standard compliant” behavior and request that the identifier be set to the MAC address. I believe that most other operating systems just build IPv6 address using the MAC address.

The worldwide database of MAC addresses would be even more valuable if we had kept using MAC addresses in IPv6 addresses. In fact, it may be valuable enough as most smart phone stacks still do that. Web sites and other services see the incoming IPv6 address, extract the database, and, voila, precise identification of the caller identity, location, you name it. Picture Bill Joy’s smirk, “told you so.”

Your understanding of the Wi-Fi protocol is correct. Only the payload is encrypted, not the MAC header.  The 802.11 MAC header actually differs from the Ethernet MAC header and carries up to 4 MAC addresses: the immediate wireless sender, the intended wireless receiver, the original source and the final destination. The final destination is used for example when sending a packet from one mobile to another through a base station. Depending of scenarios, headers carry 2, 3 or 4 Mac addresses – addresses are not repeated, for example, when the original source is the same as the wireless sender, or when the intended destination is the same as the wireless destination.

The MAC header itself was not protected, at least not initially. This can lead to possible spoofing of control frames, e.g. disconnection requests. 802.11 in 2009 defined 802.11w to add protection to management frames, but this is essentially an anti-spoofing standard. It may optionally encrypt some management data, but it cannot encrypt the wireless MAC addresses.

These are very important points.  The problem of moving between multiple base stations in the same network would make MAC encryption a non-starter unless we took a heavy dependence on communication between the base stations, introducing the reliability concerns that implies.  In other words, the problem is not quite as simple as Hal Berenson suggested here. 

Yet Christian has found an elegant and simple alternative.  I really take my hat off to him  for having been visionary enough – and sufficiently tuned into identity issues – to generate, by default, a different IPV6 MAC address for each network a device connects to.   I remember Christian discussing the issues and telling me he saw this as a possibility but had no idea until now he had succeeded in getting it out the door and onto millions of devices. 

This approach solves the linking problem I've been describing, because the MAC address snooped in your home would be different from the MAC address generated should you go to your workplace or attend a conference.   In essence, Christian has made the IPV6 MAC addresses properly unidirectional, in the sense of being contextually specific identifiers, and in this sense has brought IP into conformance with the Fourth Law of Identity.

Although this benefit only kicks in as the infrastructure evolves to IPV6, it establishes the fact that the end-state we will reach is one in which WiFi snooping won't provide the ability to link people across contexts as various commercial interests are currently attempting to do. 

It also, in my view, gives me confidence that regulation preventing collection and linking of MAC addresses would be totally consistent with the direction technological evolution will take us in anyway.   This is really key, since we never want regulation to tell technologists what to do – only, in protecting the public, to tell us what not to do

There is, however, a macabre side to Christian's comment. 

Implementations of IPV6 that do always include a persistent and unchanging MAC address in their IPV6 address need to be fixed.  They make the problem of unique identification across contexts worse, not better, since the MAC address moves up the stack to the IP layer…   We need the people responsible for these implementations to understand the issues and provide privacy-friendly alternatives just as Christian did.  Looks like there is more work to be done… 

“We could all be wrong about the way 802.11 works…”

I received a comment from a reader who plays an important role in the network protection industry which reads:

“I was a bit surprised by you going on about Google getting the MAC addresses of devices in people's home. I asked a few other security folks, and none of us could figure out why you thought that Google had these addresses.

“Of course, we could all be wrong about the way that 802.11 works, but I would have thought that the only way that the Google Car could see anything other than the MAC address of the WAP would be if both:
– the car quickly impersonated the WAP by forging its SSID
– the computers in the house tried to re-attach to the device forging the SSID Is this the scenario you think happened? If so, where did you see this? If not, what am I am misunderstanding about Wifi where just receiving signals without looking like a WAP allows me to see any MACs other than those of WAPs?

“I look forward to hearing more on this, even if my understanding of WiFi (and that of the folks I asked) is wrong.”

Unfortunately, the assumptions made by my reader, even though supported by other experts, are wrong. 

Few technologies are more ubiquitous or foundational than 802.11 wireless (WiFi).  The security experts in this domain understand perfectly its security characteristics relative to protection of the data payload.  But in the past the device identity aspects of the system have not been on the front burner.  No wonder.  I imagine that anyone worried about some information agency accumulating all the MAC addresses in the world and mapping them to the houses people live in would have been sent off to the looney bin a few years ago: “Sure, and pigs might fall from the sky and crush us too!  Now let's get this thing deployed!”

Of course I come at this from a different direction since I'm an “identity guy” and the identity of the devices is something I have had to understand and deal with.  But given the importance of the discussion I turned to two colleagues in other disciplines to verify that my own understanding remains correct despite the evolution of the standards.  One is Khaja Ahmed, an expert in network security; the other is Christian Huitema, an expert in all aspects of networking.

I'll share Christian's comments in a separate post.  Khaja responded:   

“Yes, the senders MAC address is in the clear. Of course the recipients (WiFi access point) MAC address has to be in the clear so it knows that the packet is intended for it. The client’s MAC address is needed so the WiFi access point knows which session key and state to use to process the frame. Just as the SA in IPsec cannot be identified without the IP address of the sender.

“One more point re the four fields you are talking about… There are 3 or 4 MAC addresses in each 802.11 frame depending on who is sending the packet to who on whose behalf.

“The sender and destination addresses are always there, so that’s two. The third address is typically the Base Station Identifier. In cases where the packets are being relayed by some other part of the infrastructure there may be addresses of some intermediate transmitter and receiver. That gives you the 4 addresses. The MAC address of the original sender / client is just one field.