Privacy – Page 5 – Kim Cameron's Identity Weblog

Latitude privacy policy doesn't fess up to what Google stores

Never one to mince words, Jackson Shaw asks, “To the Google privacy core – Is it rotten?” He writes,

“I read Kim’s post and immediately decided to turn off Google’s Latitude service on my phone but, as Kim illustrates, it probably won’t make any difference…

“I took a few minutes to check out Google’s privacy policy around Latitude and found out this much:

“If you choose to ‘Hide your location’, you can hide from your Latitude friends all at once, so they won't be able to see your location. If you hide in Latitude, we don't store your location.

“I’m not worried about hiding in Latitude. I wish I could hide from Google!”

The funny thing here is that Google already stores our residential locations through association with our devices, as indicated by its Gstumbler report, contradicting the Latitude privacy policy.

Jackson then directs us to a Wired article that is tremendously germane to this discussion – partly because of what it says about the current legal environment in the US, and partly because it reflects the very real problem that, in general, neither technologists nor policy makers understand that tapping of device identifiers is as serious as theft of content.

See: “Former Prosecutor: Google Wi-Fi Snafu ‘Likely’ Illegal” – I'll discuss it next.

Rethink things in light of Google's Gstumbler report

A number of technical people have given Google the benefit of the doubt in the Street View Wifi case and as a result published information that Google's new “Gstumbler” report shows is completely incorrect. It is important that people re-evaluate what they are saying in light of this report.

I'll pick on Conor's recent posting on our discussion as an example – it contains a number of statements and implies a number of things explicitly contradicted by Google's new report. Once he reads the report and applies the logic he has put forward, logic will require Conor to change his conclusions.

Conor begins with a bunch of statements that are true:

MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them).
When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.
SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on its use as a privacy invading value – a generic value such as “home” or “linksys” is much less likely to be a privacy issue than “ConorCahillsHomeAP”.

Wishful thinking and completely wrong

These are followed by a statement that is just plain wishful thinking. Conor continues:

Google purposely collected SSID and MAC Addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.

Google's report says Conor is wrong about this, explicitly saying in paragraph 26, “Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks”. Conor continues:

Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).

This is true, but it is important to indicate that this was not limited to access points. Google's report says that it recorded the association between the MAC address and geographic location of all the active devices on the network. When it did this, the MAC addresses became, according to Conor's own earlier definition, “worrisome data aggregations”.

There is no AP protocol defined means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs.

This is true, but Google's report indicates this would not have mattered – it collected MACs regardless of whether SSIDs were broadcast.

I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.

Google's report indicates it did not. It only used that status to decide whether or not to record the payload – and only recorded the payload of unencrypted frames…

I like Conor's logic that, “When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.” I urge Conor to read the Gstumbler report. Once he knows what was actually happening, I hope he'll tell the world about it.

Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’. I will resist commenting on the name, since Google did the right thing in publishing the report: there will no longer be any ambiguity about what was being collected.

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data. One thing I like about the report is that it begins with a number of technical “descriptions and definitions”. For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination. Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames. The parsing separates the information into discreet fields for easier analysis… All available MAC addresses contained in a frame are also parsed. All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk. I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encrypted networks.”
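The recording behaviour described in paragraphs 19 to 22 can be summarised in a few lines. What follows is a minimal sketch, not Gstumbler's code: the function name `record_frame` and the dictionary it returns are hypothetical, and it assumes a data frame with the common 24-byte, three-address 802.11 header. The only protocol detail it relies on is the Protected Frame bit (0x40) in the second byte of the Frame Control field.

```python
# Illustrative sketch only; names are hypothetical, not taken from Gstumbler.
PROTECTED_FLAG = 0x40   # "Protected Frame" bit in the second Frame Control byte
HEADER_LEN = 24         # three-address 802.11 data frame header (assumption)


def record_frame(frame: bytes) -> dict:
    """Return what would be written to disk for one captured data frame."""
    header, body = frame[:HEADER_LEN], frame[HEADER_LEN:]
    encrypted = bool(frame[1] & PROTECTED_FLAG)
    return {
        "header": header,                      # always kept, encrypted or not
        "body": None if encrypted else body,   # body kept only when in the clear
    }


# Two made-up frames that differ only in the Protected Frame bit.
addresses = bytes(range(18))                   # placeholder for three MAC addresses
clear = bytes([0x08, 0x01, 0, 0]) + addresses + bytes(2) + b"hello"
protected = bytes([0x08, 0x41, 0, 0]) + addresses + bytes(2) + b"\x9a\x1f\x03"

print(record_frame(clear)["body"])             # b'hello'
print(record_frame(protected)["body"])         # None
```

In both cases the header, and with it every MAC address in the frame, ends up in the record.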

All device identifiers were recorded

As a result, there is no longer any question. The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the driveby mapping cars, as were the wireless access points, and this regardless of the use of encryption.

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”. In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

It is also worth, given some of the previous conversations about supposed “broadcasting”, drawing attention to paragraph 26, which explains,

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode. The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”

It is all Metcalfe’s fault

Christian Huitema, author of IPv6: The New Internet Protocol (2nd Edition) and one of the leading architects of IPv6, had this to tell us:

It is all Metcalfe’s fault. There is no real functional need to have the MAC addresses unique worldwide, but it certainly is very convenient. If they weren’t unique, we would have to add a protocol to detect address collisions and somehow resolve them. That’s hard enough for static attachments, but becomes really hairy when dealing with a high mobility environment, e.g. Wi-Fi enabled smart phones that connect to different base stations as we roam the corridors of the buildings. Making MAC addresses unique really simplified the design, but it did not actually spare the need for detecting duplicates. Simply, we treat that as an error, to be corrected by network management systems.

The initial design of IPv6 called for embedding the MAC addresses in the IPv6 address. A host IPv6 address would be built as the combination of a 64 bit network prefix and a MAC address expanded to 64 bits. Our Windows Networking team saw that as a serious issue, and we proposed an alternative design in which hosts pick randomized “host identifiers”, that vary each time they connect to a new network. That’s what you get by default in Vista and in Windows 7, although managers can still force the old “standard compliant” behavior and request that the identifier be set to the MAC address. I believe that most other operating systems just build IPv6 addresses using the MAC address.

The worldwide database of MAC addresses would be even more valuable if we had kept using MAC addresses in IPv6 addresses. In fact, it may be valuable enough as most smart phone stacks still do that. Web sites and other services see the incoming IPv6 address, extract the database, and, voila, precise identification of the caller identity, location, you name it. Picture Bill Joy’s smirk, “told you so.”

Your understanding of the Wi-Fi protocol is correct. Only the payload is encrypted, not the MAC header. The 802.11 MAC header actually differs from the Ethernet MAC header and carries up to 4 MAC addresses: the immediate wireless sender, the intended wireless receiver, the original source and the final destination. The final destination is used for example when sending a packet from one mobile to another through a base station. Depending on the scenario, headers carry 2, 3 or 4 MAC addresses – addresses are not repeated, for example, when the original source is the same as the wireless sender, or when the intended destination is the same as the wireless destination.

The MAC header itself was not protected, at least not initially. This can lead to possible spoofing of control frames, e.g. disconnection requests. 802.11 in 2009 defined 802.11w to add protection to management frames, but this is essentially an anti-spoofing standard. It may optionally encrypt some management data, but it cannot encrypt the wireless MAC addresses.

These are very important points. The problem of moving between multiple base stations in the same network would make MAC encryption a non-starter unless we took a heavy dependence on communication between the base stations, introducing the reliability concerns that implies. In other words, the problem is not quite as simple as Hal Berenson suggested here.

Yet Christian has found an elegant and simple alternative. I really take my hat off to him for having been visionary enough – and sufficiently tuned into identity issues – to generate, by default, a different IPV6 MAC address for each network a device connects to. I remember Christian discussing the issues and telling me he saw this as a possibility but had no idea until now he had succeeded in getting it out the door and onto millions of devices.

This approach solves the linking problem I've been describing, because the MAC address snooped in your home would be different from the MAC address generated should you go to your workplace or attend a conference. In essence, Christian has made the IPV6 MAC addresses properly unidirectional, in the sense of being contextually specific identifiers, and in this sense has brought IP into conformance with the Fourth Law of Identity.
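A small sketch may help show why a per-network identifier defeats linking at the IP layer. It is an illustration under stated assumptions, not the algorithm Windows uses: a hypothetical device secret is mixed with the network prefix through HMAC-SHA256, and the first 64 bits of the result become the host part of the address. The function name, the secret and the documentation prefixes under `2001:db8::/32` are all made up for the example.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical per-device secret; a real stack would generate and store its own.
DEVICE_SECRET = b"example-device-secret"


def per_network_address(prefix: str, secret: bytes = DEVICE_SECRET) -> ipaddress.IPv6Address:
    """Derive an address whose 64-bit host part depends on the network prefix."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch assumes a /64 prefix")
    digest = hmac.new(secret, network.network_address.packed, hashlib.sha256).digest()
    host_id = int.from_bytes(digest[:8], "big")    # first 64 bits of the digest
    return ipaddress.IPv6Address(int(network.network_address) | host_id)


home = per_network_address("2001:db8:1::/64")      # made-up home network
work = per_network_address("2001:db8:2::/64")      # made-up workplace network
print(home)
print(work)
```

The same device gets a stable address on each network, but the host parts seen at home and at work have nothing in common, so an observer cannot join the two sightings on that value.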

Although this benefit only kicks in as the infrastructure evolves to IPV6, it establishes the fact that the end-state we will reach is one in which WiFi snooping won't provide the ability to link people across contexts as various commercial interests are currently attempting to do.

It also, in my view, gives me confidence that regulation preventing collection and linking of MAC addresses would be totally consistent with the direction technological evolution will take us in anyway. This is really key, since we never want regulation to tell technologists what to do – only, in protecting the public, to tell us what not to do.

There is, however, a macabre side to Christian's comment.

Implementations of IPV6 that do always include a persistent and unchanging MAC address in their IPV6 address need to be fixed. They make the problem of unique identification across contexts worse, not better, since the MAC address moves up the stack to the IP layer… We need the people responsible for these implementations to understand the issues and provide privacy-friendly alternatives just as Christian did. Looks like there is more work to be done…
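To make the concern concrete, here is a minimal sketch of the behaviour Christian describes, in which the host part of an IPv6 address is built from the MAC address. It follows the modified EUI-64 convention: `ff:fe` is inserted in the middle of the 48-bit MAC and the universal/local bit is flipped. The function names, the documentation prefixes and the MAC address are made up for illustration.

```python
import ipaddress
from typing import Optional


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Build an IPv6 address from a /64 prefix and a 48-bit MAC (modified EUI-64)."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    octets[0] ^= 0x02                                  # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    network = ipaddress.IPv6Network(prefix)
    return ipaddress.IPv6Address(int(network.network_address) | int.from_bytes(iid, "big"))


def mac_from_address(address) -> Optional[str]:
    """Recover the MAC if the host part looks like a modified EUI-64 identifier."""
    iid = ipaddress.IPv6Address(address).packed[8:]
    if iid[3:5] != b"\xff\xfe":
        return None                                    # not MAC-derived, nothing to recover
    octets = bytearray(iid[:3] + iid[5:])
    octets[0] ^= 0x02                                  # undo the bit flip
    return ":".join(f"{b:02x}" for b in octets)


at_home = eui64_address("2001:db8:1::/64", "00:11:22:33:44:55")
at_work = eui64_address("2001:db8:2::/64", "00:11:22:33:44:55")
print(mac_from_address(at_home))                       # 00:11:22:33:44:55
print(mac_from_address(at_work))                       # 00:11:22:33:44:55
```

Any service that sees either address can recover the same MAC, which is exactly how the identifier follows the device from one context to another.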

“We could all be wrong about the way 802.11 works…”

I received a comment from a reader who plays an important role in the network protection industry which reads:

“I was a bit surprised by you going on about Google getting the MAC addresses of devices in people's home. I asked a few other security folks, and none of us could figure out why you thought that Google had these addresses.

“Of course, we could all be wrong about the way that 802.11 works, but I would have thought that the only way that the Google Car could see anything other than the MAC address of the WAP would be if both:
– the car quickly impersonated the WAP by forging its SSID
– the computers in the house tried to re-attach to the device forging the SSID

“Is this the scenario you think happened? If so, where did you see this? If not, what am I misunderstanding about Wifi where just receiving signals without looking like a WAP allows me to see any MACs other than those of WAPs?

“I look forward to hearing more on this, even if my understanding of WiFi (and that of the folks I asked) is wrong.”

Unfortunately, the assumptions made by my reader, even though supported by other experts, are wrong.

Few technologies are more ubiquitous or foundational than 802.11 wireless (WiFi). The security experts in this domain understand perfectly its security characteristics relative to protection of the data payload. But in the past the device identity aspects of the system have not been on the front burner. No wonder. I imagine that anyone worried about some information agency accumulating all the MAC addresses in the world and mapping them to the houses people live in would have been sent off to the looney bin a few years ago: “Sure, and pigs might fall from the sky and crush us too! Now let's get this thing deployed!”

Of course I come at this from a different direction since I'm an “identity guy” and the identity of the devices is something I have had to understand and deal with. But given the importance of the discussion I turned to two colleagues in other disciplines to verify that my own understanding remains correct despite the evolution of the standards. One is Khaja Ahmed, an expert in network security; the other is Christian Huitema, an expert in all aspects of networking.

I'll share Christian's comments in a separate post. Khaja responded:

“Yes, the sender's MAC address is in the clear. Of course the recipient's (WiFi access point) MAC address has to be in the clear so it knows that the packet is intended for it. The client’s MAC address is needed so the WiFi access point knows which session key and state to use to process the frame. Just as the SA in IPsec cannot be identified without the IP address of the sender.

“One more point re the four fields you are talking about… There are 3 or 4 MAC addresses in each 802.11 frame depending on who is sending the packet to who on whose behalf.

“The sender and destination addresses are always there, so that’s two. The third address is typically the Base Station Identifier. In cases where the packets are being relayed by some other part of the infrastructure there may be addresses of some intermediate transmitter and receiver. That gives you the 4 addresses. The MAC address of the original sender / client is just one field.”
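For readers who want to see where these addresses sit, here is a minimal sketch that names the MAC addresses in the header of an 802.11 data frame according to the To DS and From DS bits. It is a simplification that handles data frames only; the function names and the sample frame (built from made-up, locally administered MAC addresses) are hypothetical. Note that the sample frame has the Protected Frame bit set, and the addresses are readable all the same.

```python
from typing import Dict

TO_DS, FROM_DS = 0x01, 0x02          # flag bits in the second Frame Control byte


def fmt(mac: bytes) -> str:
    return ":".join(f"{b:02x}" for b in mac)


def data_frame_addresses(frame: bytes) -> Dict[str, str]:
    """Name the MAC addresses carried in the header of an 802.11 data frame."""
    if (frame[0] >> 2) & 0x03 != 2:
        raise ValueError("this sketch only handles data frames")
    flags = frame[1]
    a1, a2, a3 = frame[4:10], frame[10:16], frame[16:22]
    to_ds, from_ds = bool(flags & TO_DS), bool(flags & FROM_DS)
    if to_ds and from_ds:            # relayed between infrastructure devices
        roles = {"receiver": a1, "transmitter": a2,
                 "destination": a3, "source": frame[24:30]}
    elif to_ds:                      # client to access point
        roles = {"bssid": a1, "source": a2, "destination": a3}
    elif from_ds:                    # access point to client
        roles = {"destination": a1, "bssid": a2, "source": a3}
    else:                            # direct, no access point involved
        roles = {"destination": a1, "source": a2, "bssid": a3}
    return {role: fmt(mac) for role, mac in roles.items()}


# Made-up frame: a laptop sending through its access point, encryption on.
access_point = bytes.fromhex("020000000001")
laptop = bytes.fromhex("020000000002")
gateway = bytes.fromhex("020000000003")
sample = (bytes([0x08, 0x41]) + bytes(2) + access_point + laptop + gateway
          + bytes(2) + b"\x8c\x11\x5e")          # opaque encrypted body

print(data_frame_addresses(sample))
```

A passive listener needs nothing more than this to learn the laptop's MAC address; no impersonation of the access point is involved.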

More input and points of view

Dave Nikolejsin, CIO of British Columbia and a man who sees identity as the key to efficient government, writes:

“I agree with your comments and focus on the MAC layer data collection going on with Google. One observation I would have re all the “other” similar type activity would be that no others have Google’s resources and thus no others are doing a systematic sweep of the western world on such a data gathering mission. As we all know the value of data increases in an N-squared manner and the “N” once Google is done will be a big number.”

Dave goes on to compare wirelesstapping with Facebook's privacy problems and makes what I think is a very insightful comment:

At least (for all its warts) we actually willingly give our info to Facebook!

Heavy duty SOA architect Gunnar Peterson (an expert on Service-Oriented Security) condenses our discussion to date and comes out strongly in favor of the arguments I've been making with regards to the wirelesstapping of MAC addresses:

Google's Macondo Street View team cannot seem to get the right combination of top kill or cap to fit on its MAC spillage. Your MAC is not like a house number (which everyone knows and are used for many purposes), MAC address is scoped to one use. There's no harm in collecting MACs, the hell you say, there's a number of evil emergent cocktails that can come out of this. It's not so much the MAC itself, it's the association of the MAC and the geolocation and time – combining something unique like MAC with geolocation.

This looked like a rogue team (or as Google put it last week a “rogue software engineer”) until this shocking announcement that Google is patenting (emphasis added) – “The invention pertains to location approximation of devices, e.g., wireless access points and client devices in a wireless network.”

It seems pretty obvious that any number of permutations of problems will result by combining private client data and geolocation. Maybe Google books should scan a copy of J.C. Cannon's book “Privacy: What Developers and IT Professionals Should Know” and Stefan Brands' Primer on User Identification. In both works you see the risks of promiscuously mixing identification cocktails and the unexpected leakages that result. In addition, what benefit to the user who is being spied upon does all this spying create…?

[More here]

It's true that J.C. and Stefan both do a great job of helping explain the issues at play here, and I advise people who want to understand the issues better to check out their work.

Hal Berenson got back to me after my response to him, saying,

One thing I would suggest is that you write language attempting to ban what you want to ban and let the rest of us poke holes in it – meaning, show all the legitimate scenarios you would make illegal or accidental criminals you would create… 🙂

I have total sympathy with this concern of course. We want the minimum intervention necessary. But our society has come to realize there are many instances where consumers and citizens need be protected in various ways. In introducing these protections, lawmakers had to deal with exactly the same difficult concerns about balancing rights. The good news here: our legal system seems to handle this just fine. In fact, that's what it is about. So I will leave the crafting of the appropriate disincentives to professionals.

Ted Howard, who has broad experience including in the Games industry, commented,

“Regarding the issues of Google's collection of MAC addresses and wireless SSIDs:

“If I leave my blinds open so that I can get sun into my home, that means I have no problem with anyone walking past my house pausing to watch me in my home. When I talk with a friend while walking in the park, that means I cannot be bothered by a stranger walking alongside us listening to every word. In both cases, am I “publicly broadcasting” something or is the broadcast just a side effect of my activities? The analogy should be clear.

“Do a survey to see how many wireless router owners understand what MAC and SSID are. I suspect that very few (< 5%) people know. If they don't know what these are, then how can anyone claim that these people have been intentionally publicly broadcasting these with an understanding that the broadcast has become publicly-available knowledge? Government regulation exists to, among other reasons, protect the public when the public needs protection and would otherwise be unprotected. This seems to me like a good case for protecting the truly ignorant public.”

Journalist Mary Branscombe comments:

For me there are 3 big questions.

How much info did or could someone capture; and by the way Google is in the data capture and data mining/machine learning business and that data *has* been used because if they didn't know it was there they didn't know they had to exclude it.
How personally identifiable or anonymised that information is (I don't think my phone has a bunch of captured MAC addresses and if it does I don't think it has any pii about them but I may be wrong).
How much people care. What is public or private about my SSID or about my MAC address? (I honestly don't know how much you can find out about me from my MAC address but I'm assuming not much; if I'm wrong that's a data point! I know in 2007 Skyhook told me they were confident they had privacy cracked but they would and they're not the only people and they're the good guys…)

Location services is a *huge* business and people are oversharing location information for trivial rewards (see Foursquare). 8000 apps use Skyhook location data. It's Facebook with co-ordinates. I don't think we can not do this – but I think we can regulate and do it more safely.

In answer to Mary's question two, systems like the Street View Wifi system exist to map people's device MAC addresses to their residential address, as described in the Google patent. Her point about big business is a BIG POINT. But to me it just increases the urgency to give people the geo-location capabilities they want without creating a privacy Chernobyl that will explode down the line.

Jan and Susan Huffman write,

“Standing naked in my front yard is like broadcasting my MAC address. If I don’t want people to look at me naked in my front yard, I wear clothes. I don’t ask that the law punish those who might take a look through the bushes.”

My response: your MAC address is visible in your wireless packets no matter what you do. Turning on encryption doesn't help. So there are no clothes to put on. The analogy with clothes and bushes therefore just doesn't stand up. Furthermore, in most neighborhoods, if you spend your time trawling the neighborhood and peeking through bushes you end up with people giving you a pretty hard time… Maybe that's what's happening here.

MAC addresses will be used to reveal where you live

Conor has responded to my comments on why house numbers don't make a good metaphor for MAC addresses.

He writes that when I characterized house number as a “universal identifier”,

[Kim's argument] “confuses house address with house number. A house number is not able to be used as a universal identifier (I presume that there are many houses out there with the number 15, even in the same town, many times even on the same street in the same zip code (where the only difference is the N.W. and S.E. on the end of the street name)).

“Like SSIDs and MAC addresses, the house number is only usable as an identifier once you get to the neighborhood and very often only once you get to the street.”

I like Conor's distinction between house number and house address. It's true there are many houses with the number 15, thus the house number is a local identifier, and only becomes universal when combined with the street name, the city, and so on. I hadn't understood that this is what he was trying to say.

Then Conor continues:

“I will admit that there are some differences with the MAC address because of how basic Ethernet networking was designed. The MAC address is designed to be unique (though, those in networking know that this isn't always the case and in fact most devices let you override the mac address anytime you want). So this could be claimed to be some form of a universal identifier. However, it's not at all usable outside of the local neighborhood. There is no way for me to talk to a particular MAC address unless I am locally on the same network with that device.”

Conor is completely right here. In networking as we have known it, the MAC address is not usable outside the “local network neighborhood”. But that is exactly what this WiFi snooping is about to change. In fact this is very much the core of what I'm talking about.

MAC addresses will be used to reveal where you live

Once you have snooped people's MAC addresses, and put them into a database linking them to “where they live” (literally), you have dramatically changed the way network identifiers work.

In this new world, armed with such a database, if you see a MAC address somewhere – anywhere – you can look it up in your database – precisely because it is unique – and see where “it lives”. When I say, “where it lives”, I don't mean what network it belongs to. I mean where it is normally located in physical space – as a street address.
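The lookup itself is trivial, which is the point. Here is a minimal sketch with entirely fictional data: the table, the function names and the street addresses are all made up, and a real linking database would of course be far larger.

```python
from typing import Dict, Optional

# Entirely fictional entries standing in for a drive-by collection.
MAC_TO_STREET_ADDRESS: Dict[str, str] = {
    "02:00:00:00:00:02": "15 Example Street NW",
    "02:00:00:00:00:07": "15 Example Street SE",
}


def normalise(mac: str) -> str:
    """Bring a MAC address into the lower-case, colon-separated form used as key."""
    return mac.strip().lower().replace("-", ":")


def where_it_lives(observed_mac: str) -> Optional[str]:
    """Return the street address linked to a MAC address, if one was collected."""
    return MAC_TO_STREET_ADDRESS.get(normalise(observed_mac))


print(where_it_lives("02-00-00-00-00-02"))   # 15 Example Street NW
print(where_it_lives("02:00:00:00:00:99"))   # None
```

Because the key is globally unique, it does not matter where the MAC address was observed: the answer is the same in a park, a cafe or a conference hall.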

Is there some way to opt out of this? No – other than turning everything off. Unfortunately, given the way networks are designed, we have no choice but to reveal our MAC address when we use our Wireless. So anyone who is physically near us and has access to a linking database has access to where we live. I'll explore the implications of this going forward.

Conor concludes,

“I do believe that a more privacy enabled design of networking would have allowed for scenarios where MAC addresses were more dynamic and thus reducing the universal-ness and persistence of the MAC address itself…”

We both agree on this. And IPV6 has plenty of options that could make this possible. However, the current infrastructure is the one we live in, and one which is sorely in need of protections, mores and regulations. The fact that current technology allows the creation of Dr. No technology like that which Google StreetView WiFi has laid on the world doesn't mean that society should or will.

Google patent is a shocker

There are many who have assumed Google's WiFi snooping was “limited” to mapping of routers. However an article in Computerworld reporting on new developments in an Oregon class action lawsuit links to a patent application that speaks volumes about what is at stake here. The abstract begins (emphasis is mine):

“The invention pertains to location approximation of devices, e.g., wireless access points and client devices in a wireless network.”

By “client” the patent is referring to devices being used by you and your family. This interest in the family devices is exactly what I supposed – it is the natural conclusion you reach using the kind of thinking that drove the Street View WiFi initiative. The abstract continues,

“Location estimates may be obtained by observation/analysis of packets transmitted or received by the access point. For instance, data rate information associated with a packet is used to approximate the distance between a client device and the access point. This may be coupled with known positioning information to arrive at an approximate location for the access point. Confidence information and metrics about whether a device is an access point and the location of that device may also be determined…

“A location information database of access points may employ measurements from various devices over time. Such information may identify the location of client devices and provide location-based services to them.”

The system is actually doing measurements inside your house or business.

We will refer to these aspects of the plan when examining in further detail the potential harm the construction of massive MAC address databases can bring.

[Read the whole patent here]

What harm can possibly come from a MAC address?

If you are new to doing privacy threat analysis, I should explain that to do it, you need to be thoroughly pessimistic. A privacy threat analysis is in this sense no different from any other security threat analysis.

In our pessimistic frame of mind we then need to brainstorm from two different vantage points. The first is the vantage point of the party being attacked. How can people in various situations potentially be endangered by the new technology? The second is the vantage point of the attacker. How can people with different motivations misuse the technology? This is a long and complicated job, and should be done in great detail by those who propose a technology. The results should be published and vetted.

I haven't seen such publication or vetting by the proponents of world-wide WiFi packet collection and giant central databases of device identifiers. Perhaps the Street View team or someone else has such a study at hand – it would be great for them to share it.

In the meantime I'm just going to throw out a few simple initial ideas – things that are pretty obvious, by constructing a few scenarios.

SCENARIO 1 – Collecting MAC Addresses is Legal and Morally Acceptable

In this scenario it is legal and socially acceptable to drive up and down the streets recording people's MAC addresses and other network traffic.

It is also fine for anyone to use a geolocation service to build his own database of MAC addresses and street addresses.

How could a collector possibly get the software to do this? No problem. In this scenario, since the activity is legal and there is a demand, the software is freely available. In fact it is widely advertised on various Internet sites.

The collector builds his collection in the evenings, when people are at home with their WiFi enabled phones and computers. It doesn't take very long to assemble a really detailed map of all the devices used by the people who live in an entire neighborhood – perhaps a rich neighborhood.

Note that it would not matter whether people in the neighborhood have their WiFi encryption turned on or off – the drive by collector would be able to map their devices, since WiFi encryption does not hide the MAC address.

SCENARIO 2 – Collector is a sexual predator

In Scenario 1, anyone can be “a MAC collector”. In this scenario, the collector is a sexual predator.

When children pass him in the park, they have their phones and WiFi turned on and their MAC addresses are discernible by his laptop software. Normally the MAC addresses would be meaningless random numbers, but the collector has a complete database of what MAC addresses are associated with a given house address. It is therefore simple for the collection software on his laptop to automatically convert the WiFi packets emitted from the children's phones into the street addresses where the children live, showing the locations on a map.

There is thus no need for the collector to go up to the children and ask them where they live. And it won't matter that their parents have taught them never to reveal that to a stranger. Their devices will have revealed it for them.

I can easily understand that some people might have problems with this example simply because so many questionable things have been justified through reference to predators. That's not a bandwagon I'm trying to get on.

I chose the example not only because I think it's real and exposes a threat, but because it reveals two important things germane to a threat analysis:

The motivations people have to abuse the technical mechanisms we put in place are pretty much unlimited.
We need to be able to empathize with people who are vulnerable – like children – rather than taking a “people deserve what they get” attitude.

Finally, I hope it is obvious I am not arguing Google is doing anything remotely on a par with this example. I'm talking about something different: the matter of whether we want WiFi snooping to be something our society condones, and what some of the software might be that comes into being if we do.

Ben Adida releases me from the theatre

When I published “Misuse of network identifiers was done on purpose”, Ben Adida twittered that “Kim Cameron answers my latest post with some good points I need to think about…”. And he came through on that promise, even offering me a “Get out of theatre free” card:

“A few days ago, I wrote about Privacy Advocacy Theater and lamented how some folks, including EPIC and Kim Cameron, are attacking Google in a needlessly harsh way for what was an accidental collection of data. Kim Cameron responded, and he is right to point out that my argument, in the Google case, missed an important issue.

“Kim points out that two issues got confused in the flurry of press activity: the accidental collection of payload data, i.e. the URLs and web content you browsed on unsecured wifi at the moment the Google Street View car was driving by, and the intentional collection of device identifiers, i.e. the network hardware identifiers and network names of public wifi access points. Kim thinks the network identifiers are inherently more problematic than the payload, because they last for quite a bit of time, while payload data, collected for a few randomly chosen milliseconds, are quite ephemeral and unlikely to be problematic. [Just for the record, I didn't actually say “unlikely to be problematic” – Kim]

“Kim’s right on both points. Discussion of device identifiers, which I missed in my first post, is necessary, because the data collection, in this case, was intentional, and apparently was not disclosed, as documented in EPIC’s letter to the FCC. If Google is collecting public wifi data, they should at least disclose it. In their blog post on this topic, Google does not clarify that issue.

“So, Google, please tell us how long you’ve been collecting network identifiers, and how long you failed to disclose it. It may have been an oversight, but, given how much other data you’re collecting, it would really improve the public’s trust in you to be very precise here.”

Ben also says my initial post seems “to weave back and forth between both issues”. In fact I see payload and header as being two parts of the same WiFi packet. Google “accidentally” collected one part of the packet but collected the other part on purpose. I think it is really bizarre that a lot of technical people consider one part of the packet (emails and instant messages) to be private, and then for some irrational reason assume the other part of the same packet (the MAC address) is public. This makes no sense and as an architect it drives me nuts. Stealing one part of the WiFi packet is as bad as stealing another.

Ben also says,

“I agree that device privacy can be a big deal, especially when many people are walking around with RFIDs in their passports, pants, and with bluetooth headsets. But, in this particular case, is it a problem? If Google really only did collect the SSIDs of open, public networks that effectively invite anyone to connect to them and thus discover network name and device identifier, is that a violation of privacy, or of the Laws of Identity? I’m having trouble seeing the harm or the questionable act. Once again, these are public/open WiFi networks.”

Let me be clear: If Google or any other operator only collected the SSIDs of “open, public networks that invite anyone to connect to them” there would be zero problem from the point of view of the Laws of Identity. They would, in the terminology of Law Four, be collecting “universal identifiers”.

But when you drive down a street, the vast majority of networks you encounter are NOT public, and are NOT inviting just anyone to connect to them. The routers emit packets so the designated users of the network can connect to them, not so others can connect to them, hack them, map them or use them for commercial purposes. If one is to talk about intent, the intent is for private, unidirectional identifiers to be used within a constrained scope.

In other words, as much as I wish I didn't have to do so, I must strongly dispute Ben's assertion that “Once again, these are public/open WiFi networks” and insist that private identifiers are being misappropriated.

In matters of eavesdropping I subscribe to EPIC's argument that proving harm is not essential – it is the eavesdropping itself which is problematic. However, in my next post I'll talk about harm, and the problems of a vast world-wide system capable of inference based on use of device identifiers.