June 2010 – Page 2 – Kim Cameron's Identity Weblog

Title 18 – Part II – Chapter 206 – § 3121

Former federal prosecutor Paul Ohm says Google “likely” breached a U.S. federal criminal statute in connection with its accidental Wi-Fi sniffing — but not for siphoning private data from internet surfers using unsecured networks.

Instead, Mr. Ohm thinks Google might have breached the Pen Register and Trap and Traces Device Act for intercepting the metadata and address information alongside the content.

According to Wikipedia, a “pen register is an electronic device that records all numbers dialed from a particular telephone line. The term has come to include any device or program that performs similar functions to an original pen register, including programs monitoring Internet communications.”

I'll expand on the identity implications in my next post, but to prepare the discussion, here is the statute to which Mr. Ohm is referring:

Title 18 – Part II – Chapter 206 – § 3121

In General.— Except as provided in this section, no person may install or use a pen register or a trap and trace device without first obtaining a court order under section 3123 of this title or under the Foreign Intelligence Surveillance Act of 1978 (50 U.S.C. 1801 et seq.).
Exception.— The prohibition of subsection (a) does not apply with respect to the use of a pen register or a trap and trace device by a provider of electronic or wire communication service—
1. relating to the operation, maintenance, and testing of a wire or electronic communication service or to the protection of the rights or property of such provider, or to the protection of users of that service from abuse of service or unlawful use of service; or
2. to record the fact that a wire or electronic communication was initiated or completed in order to protect such provider, another provider furnishing service toward the completion of the wire communication, or a user of that service, from fraudulent, unlawful or abusive use of service; or
3. where the consent of the user of that service has been obtained.
Limitation.— A government agency authorized to install and use a pen register or trap and trace device under this chapter or under State law shall use technology reasonably available to it that restricts the recording or decoding of electronic or other impulses to the dialing, routing, addressing, and signaling information utilized in the processing and transmitting of wire or electronic communications so as not to include the contents of any wire or electronic communications.
Penalty.— Whoever knowingly violates subsection (a) shall be fined under this title or imprisoned not more than one year, or both.

Conor changes his mind

Conor Cahill has taken a look at the Gstumbler report. His conclusion is:

Given this new information I would have to agree that Google has clearly stepped into the arena of doing something that could be detrimental to the user's privacy.

Conor explains that, “the information in the report is quite different than the information that had been published at the time I expressed my opinions on the events at hand.”

He argues:

“We had been led to believe that Google had only captured data on open wireless networks (networks that broadcast their SSIDs and/or were unencrypted). The analysis of the software shows that to be incorrect — Google captured data on every network regardless of the state of openness. So no matter what the user did to try to protect their network, Google captured data that the underlying protocols required to be transmitted in the clear.
“We had been led to believe that Google had only captured data from wireless access points (APs). Again the analysis shows that this was incorrect — Google captured data on any device for which it was able to capture the wireless traffic for (AP or user device). So portable devices that were currently transmitting as the Street View vehicle passed would have their data captured.”

Anyone who knows Conor knows he is a gentlemanly model of how people should behave towards each other in our industry. I understand his position fully, and respect it. He says:

[Kim] seems to have a particular fondness for the phrase “wrong,” “completely wrong,” and “wishful thinking” when referring to my comments on the topic. In my defense, I will say that there was no “wishful thinking” going on in my mind. I was just examining the published information rather than jumping to conclusions — something that I will always advocate. In this case, after examining the published report, it does appear that those who jumped to conclusions happened to be closer to the mark, but I still think they were wrong to jump to those conclusions until the actual facts had been published.

I can't disagree that Google's public relations messages may well have been crafted to leave the impression that their wireless eavesdropping was only directed at network access points. But if you read them extremely carefully you see they refrain from making any such claims.

At any rate, Conor needs no defense and I accept his point. People who took the view that Google couldn't possibly have been doing what I claimed were acting based on the messages the company conveyed. Sadly, if people of Conor's undisputed technical sophistication are misled by this kind of public relations campaign, the crafting of the information might also be considered suspect.

[More of Conor's post here]

Identityblog Information Card registration working again

A while ago I added “just one more feature” and, not having a test department, broke the process for registering a new information card here at Identityblog. Predictably, I didn't see the problem because I've had my cards for, er, a while… Anyway, I got a whole pack of emails telling me about it – and I appreciate that – but didn't have the time to track it down. Thanks to all.

The problem was this:

People submitting a new card you would be sent an email asking them to click on a link to verify their email address. When they did so, rather than having their membership approved, they would receive a message that not a few readers qualified as “meaningless”. Others had saltier descriptions.

Anyway, I apologize – the problem was not with the Pamela Project's PHP code, but with a tweak I added when I changed the code to accept some experimental versions of claims selectors. I still don't have a test department, but as far as I can tell everything now works. Hint: login before leaving your comment.

I currently accept managed cards from federatedidentity.net without the need to validate. It's a slick experience. I'd be happy to accept other providers too – just send me the distinguished name used in your signing certificate.

Why location services have to be done right…

uTest describes itself as the world's largest marketplace for software testing services. Recently it held a Bug Battle to test the web and mobile applications of the leading “check-in” location services. A Bug Battle is a quarterly app testing competition, where “software professionals from around the world compete to find bugs and rank today's popular applications” (previous Bug Battles have focused on browsers, search engines, social networking sites, etc.

When evaluating location-based check-in services, testers were given ten days in May to report the most interesting and severe bugs, and to rank these applications based on

geo-location accuracy,
social media integration,
friend connectivity,
status recognition features and
ease-of use

uTest offered nearly $4,000 in prize money to those who submitted the best bugs for feedback.
The results of the battle, which rated Foursquare, Gowalla and Brightkite, are detailed here.

The report includes comments by people who clearly love the service. For example:

“The Gowalla app and web interface themselves are easy on the eyes, and venues get their own snazzy icon depicting what type of establishment it is. I feed my Gowalla check-ins to Facebook, and having an image that catches attention in a cluttered news feed matters. The user can see everyone who has checked in at a particular venue and how many times.”

But one clear outcome is that many testers reported serious bugs related to privacy and security – a category not present in the original list:

“The impact of check-in services on personal privacy and security took on a prominent role in this study. 80% of respondents responded “Yes” when asked if they were concerned about how location-based check-in services could impact their personal privacy and safety. Nearly half of respondents (49%) chose “privacy and security concerns” as the top reason they do not use check-in services more often.”

VentureBeat, which wrote about the report, concludes:

In addition to appreciating easy-to-use services and bemoaning the lack of Frappuccino deals, the testers seem to be concerned about the privacy and security implications of check-in services in general. 49% of testers said privacy and security concerns were the top reason they don’t use check-in services more often. This is something the check-in services need to address if they want to avoid privacy flames like the ones Facebook is constantly fighting.

These services can be built so as to respect and enhance privacy. Things like giant world databases linking our devices to our home locations don't help convince anyone that we are doing that.

Claims based identity tops IT security concerns

John Fontana has crossed the floor, moving from the critics box to the stage.

Many people will remember John's articles at Network World, where he served as one of the most talented and informed journalists in the world writing about network infrastructure for as long as I can remember.

While it is sad to lose him in his role as journalist, I really look forward to what he will do at his new digs: Ping.

I know his ability to explain identity and its issues and technology in words people can actually understand will benefit the whole identity ecosystem. Kudos to Andre Durand and Ping for having the wisdom to bring him over.

Meanwhile, there's exciting news in this post from the new John:

If I could say it better than my former Network World colleague Ellen Messmer I would, but I can’t so I’m just going to link to her story on Gartner’s survey that shows identity management projects rank first in the top five priorities for IT's security spending.
The results of the survey come from interviews with IT professionals at 308 companies.

But let me highlight two paragraphs from Ellen's story:

“Identity management appears to be taking the lead as a top priority as businesses look to deploy some of the more advanced federated identity technologies both within the enterprise for single sign-on and as a way to potentially extend identity-based access control into cloud-computing environments.”

And this one:

“But in terms of firewalls as a priority, [Gartner] notes that there's a movement to install next-generation firewalls.”

On that last point, check this link to From Firewall to IdentityWall.

[John is on Twitter where he also puts together a Tweet list]

Latitude privacy policy doesn't fess up to what Google stores

Never one to mince words, Jackson Shaw asks, “To the Google privacy core – Is it rotten?” He writes,

“I read Kim’s post and immediately decided to turn off Google’s Latitude service on my phone but, as Kim illustrates, it probably won’t make any difference…

“I took a few minutes to check out Google’s privacy policy around Latitude and found out this much:

“If you choose to ‘Hide your location’, you can hide from your Latitude friends all at once, so they won't be able to see your location. If you hide in Latitude, we don't store your location.

“I’m not worried about hiding in Latitude. I wish I could hide from Google!”

The funny thing here is that Google already stores our residential locations through association with our devices, as indicated by its Gstumbler report, contradicting the Latitude privacy policy.

Jackson then directs us to a Wired article that is tremendously germane to this discussion – partly because of what it says about the current legal environment in the US, and partly because it reflects the very real problem that, in general, neither technologists nor policy makers understand that tapping of device identifiers is as serious as theft of content.

See: “Former Prosecutor: Google Wi-Fi Snafu ‘Likely’ Illegal ” – I'll discuss it next.

Rethink things in light of Google's Gstumbler report

A number of technical people have given Google the benefit of the doubt in the Street View Wifi case and as a result published information that Google's new “Gstumbler” report shows is completely incorrect. It is important that people re-evaluate what they are saying in light of this report.

I'll pick on Conor's recent posting on our discussion as an example – it contains a number of statements and implies a number of things explicitly contradicted by Google's new report. Once he reads the report and applies the logic he has put forward, logic will require Conor to change his conclusions.

Conor begins with a bunch of statements that are true:

MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them.
When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.
SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on it's use as a privacy invading value — a generic value such as “home” or “linksys” is much less likely to be a privacy issue than “ConorCahillsHomeAP”.

Wishful thinking and completely wrong

These are followed by a statement that is just plain wishful thinking. Conor continues:

Google purposely collected SSID and MAC Addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.

Google's report says Conor is wrong about this, explicitly saying in paragraph 26, “Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks“. Conor continues:

Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).

This is true, but it is important to indicate that this was not limited to access points. Google's report says that it recorded the association between the MAC address and geographic location of all the active devices on the network. When it did this, the MAC addresses became, according to Conor's own earlier definition, “worrisome data aggregations”.

There is no AP protocol defined means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs.

This is true, but Google's report indicates this would not have mattered – it collected MACs regardless of whether SSIDs were broadcast.

I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.

Google's report indicates it did not. It only used that status to decide whether or not to record the payload – and only recorded the payload of unencrypted frames…

I like Conor's logic that, “When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.” I urge Conor to read the Gstumbler report. Once he knows what was actually happening, I hope he'll tell the world about it.

Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’. I will resist commenting on the name, since Google did the right thing in publishing the report: there will no longer be any ambiguity about what was being collected.

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data. One thing I like about te report is that it has a begins with a a number of technical “descriptions and definitions”. For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination. Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames. The parsing separates the information into discreet fields for easier analysis… All available MAC addresses contained in a frame are also parsed. All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk. I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encypted networks.”

All device identifiers were recorded

As a result, there is no longer any question. The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the driveby mapping cars, as were the wireless access points, and this regardless of the use of encryption.

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”. In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

It is also worth, given some of the previous conversations about supposed “broadcasting”, drawing attention to paragraph 26, which explains,

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode. The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of netwrks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”

It is all Metcalfe’s fault

Christian Huitema, author of IPv6: The New Internet Protocol (2nd Edition) and one of the leading architects of IPV6, had this to tell us:

It is all Metcalfe’s fault. There is no real functional need to have the MAC addresses unique worldwide, but it certainly is very convenient. If they weren’t unique, we would have to add a protocol to detect address collisions and somehow resolve them. That’s hard enough for static attachments, but becomes really hairy when dealing with a high mobility environment, e.g. Wi-Fi enabled smart phones that connect to different base stations as we roam the corridors of the buildings. Making MAC addresses unique really simplified the design, but it did not actually spare the need for detecting duplicates. Simply, we treat that as an error, to be corrected by network management systems.

The initial design of IPv6 called for embedding the MAC addresses in the IPv6 address. A host IPv6 address would be built as the combination of a 64 bit network prefix and a MAC address expanded to 64 bits. Our Windows Networking team saw that as a serious issue, and we proposed an alternative design in which host pick randomized “host identifiers”, that vary each time they connect to a new network. That’s what you get by default in Vista and in Windows 7, although managers can still force the old “standard compliant” behavior and request that the identifier be set to the MAC address. I believe that most other operating systems just build IPv6 address using the MAC address.

The worldwide database of MAC addresses would be even more valuable if we had kept using MAC addresses in IPv6 addresses. In fact, it may be valuable enough as most smart phone stacks still do that. Web sites and other services see the incoming IPv6 address, extract the database, and, voila, precise identification of the caller identity, location, you name it. Picture Bill Joy’s smirk, “told you so.”

Your understanding of the Wi-Fi protocol is correct. Only the payload is encrypted, not the MAC header. The 802.11 MAC header actually differs from the Ethernet MAC header and carries up to 4 MAC addresses: the immediate wireless sender, the intended wireless receiver, the original source and the final destination. The final destination is used for example when sending a packet from one mobile to another through a base station. Depending of scenarios, headers carry 2, 3 or 4 Mac addresses – addresses are not repeated, for example, when the original source is the same as the wireless sender, or when the intended destination is the same as the wireless destination.

The MAC header itself was not protected, at least not initially. This can lead to possible spoofing of control frames, e.g. disconnection requests. 802.11 in 2009 defined 802.11w to add protection to management frames, but this is essentially an anti-spoofing standard. It may optionally encrypt some management data, but it cannot encrypt the wireless MAC addresses.

These are very important points. The problem of moving between multiple base stations in the same network would make MAC encryption a non-starter unless we took a heavy dependence on communication between the base stations, introducing the reliability concerns that implies. In other words, the problem is not quite as simple as Hal Berenson suggested here.

Yet Christian has found an elegant and simple alternative. I really take my hat off to him for having been visionary enough – and sufficiently tuned into identity issues – to generate, by default, a different IPV6 MAC address for each network a device connects to. I remember Christian discussing the issues and telling me he saw this as a possibility but had no idea until now he had succeeded in getting it out the door and onto millions of devices.

This approach solves the linking problem I've been describing, because the MAC address snooped in your home would be different from the MAC address generated should you go to your workplace or attend a conference. In essence, Christian has made the IPV6 MAC addresses properly unidirectional, in the sense of being contextually specific identifiers, and in this sense has brought IP into conformance with the Fourth Law of Identity.

Although this benefit only kicks in as the infrastructure evolves to IPV6, it establishes the fact that the end-state we will reach is one in which WiFi snooping won't provide the ability to link people across contexts as various commercial interests are currently attempting to do.

It also, in my view, gives me confidence that regulation preventing collection and linking of MAC addresses would be totally consistent with the direction technological evolution will take us in anyway. This is really key, since we never want regulation to tell technologists what to do – only, in protecting the public, to tell us what not to do.

There is, however, a macabre side to Christian's comment.

Implementations of IPV6 that do always include a persistent and unchanging MAC address in their IPV6 address need to be fixed. They make the problem of unique identification across contexts worse, not better, since the MAC address moves up the stack to the IP layer… We need the people responsible for these implementations to understand the issues and provide privacy-friendly alternatives just as Christian did. Looks like there is more work to be done…

“We could all be wrong about the way 802.11 works…”

I received a comment from a reader who plays an important role in the network protection industry which reads:

“I was a bit surprised by you going on about Google getting the MAC addresses of devices in people's home. I asked a few other security folks, and none of us could figure out why you thought that Google had these addresses.

“Of course, we could all be wrong about the way that 802.11 works, but I would have thought that the only way that the Google Car could see anything other than the MAC address of the WAP would be if both:
– the car quickly impersonated the WAP by forging its SSID
– the computers in the house tried to re-attach to the device forging the SSID Is this the scenario you think happened? If so, where did you see this? If not, what am I am misunderstanding about Wifi where just receiving signals without looking like a WAP allows me to see any MACs other than those of WAPs?

“I look forward to hearing more on this, even if my understanding of WiFi (and that of the folks I asked) is wrong.”

Unfortunately, the assumptions made by my reader, even though supported by other experts, are wrong.

Few technologies are more ubiquitous or foundational than 802.11 wireless (WiFi). The security experts in this domain understand perfectly its security characteristics relative to protection of the data payload. But in the past the device identity aspects of the system have not been on the front burner. No wonder. I imagine that anyone worried about some information agency accumulating all the MAC addresses in the world and mapping them to the houses people live in would have been sent off to the looney bin a few years ago: “Sure, and pigs might fall from the sky and crush us too! Now let's get this thing deployed!”

Of course I come at this from a different direction since I'm an “identity guy” and the identity of the devices is something I have had to understand and deal with. But given the importance of the discussion I turned to two colleagues in other disciplines to verify that my own understanding remains correct despite the evolution of the standards. One is Khaja Ahmed, an expert in network security; the other is Christian Huitema, an expert in all aspects of networking.

I'll share Christian's comments in a separate post. Khaja responded:

“Yes, the senders MAC address is in the clear. Of course the recipients (WiFi access point) MAC address has to be in the clear so it knows that the packet is intended for it. The client’s MAC address is needed so the WiFi access point knows which session key and state to use to process the frame. Just as the SA in IPsec cannot be identified without the IP address of the sender.

“One more point re the four fields you are talking about… There are 3 or 4 MAC addresses in each 802.11 frame depending on who is sending the packet to who on whose behalf.

“The sender and destination addresses are always there, so that’s two. The third address is typically the Base Station Identifier. In cases where the packets are being relayed by some other part of the infrastructure there may be addresses of some intermediate transmitter and receiver. That gives you the 4 addresses. The MAC address of the original sender / client is just one field.