June 6, 2010 – Kim Cameron's Identity Weblog

There is a fundamental problem here

Joe Mansfield at Peccavi has done a very cogent post where, though he agrees with my concerns, he criticizes me for picking almost exclusively on Google when there are lots of others who have been doing the same thing. He's right – I have been too narrowly focused.

Let me be clear: I have great respect for Google and many of its accomplishments. I have a disagreement with a particular Google team.

I find the Google Street View team's abuse of identifiers especially worrisome because they have been collecting not only info about WiFi access points, but also the MAC addresses of people's personal devices (laptops and phones).

This bothers me because I see it as dangerous. It's like going over to visit a neighbor and finding out he's been building a nuclear reactor in his basement.

I'm not an expert on the geolocation industry and I have no knowledge of whether this kind of end-user-device-snooping is commonplace. If it is, then let me know. Everything I have said about Google applies equally to any similar practitioners.

But let's get to Peccavi, which makes the point better than I do:

I’ve been following Kim Cameron’s increasingly critical analysis of Google’s StreetView WiFi mapping data privacy debacle with some interest of late.

Some background might be in order for those interested in reading where he’s been coming from – start here and work forward. He’s been quite vocal and directed in his criticism and I have been surprised that his focus has been almost entirely on Google rather than on the underlying technical root cause. My initial view on the issue was that it was a stupid over-reaction to something that everyone has been doing for years, and that at least Google were being open about having logged too much data. I’m still of the opinion that the targeting of Google specifically is off base here, although I think Kim is right that there is a fundamental problem here.

Kim is probably the pre-eminent proponent and defender of strong authentication and privacy on the net at the moment. His Laws of Identity should be mandatory reading for anyone working with user data in any sort of context but especially for anyone working with online systems. He’s a hugely influential thought leader for doing the right thing and as a key technical leader within Microsoft he’s doing more than almost anyone else to lay the groundwork for a move away from our current reliance on insecure, privacy leaking methods of authentication. Let’s just say that I’m a fan.

For obvious reasons he has spotted the huge privacy problems associated with the practice of gathering WiFi SSID and MAC addresses and using them to create large scale geo-location databases. There are serious privacy issues here and despite my initial cynicism about this perhaps it’s a good thing that there has been a huge furore over what Google were doing.

Note that there were two issues in play here – the intentional data (the SSID’s, MAC addresses and geo-location info) and the unintentional data (actual user payloads). I’m only going to talk about the intentionally harvested data right now because that is the much trickier problem – few people would argue that having Google (or anyone) logging actual WiFi traffic from their homes is OK.

The problem that I see with Kim’s general position on this and the focus on Google’s activities alone is that he’s not seeing the wood for the trees. The problem of companies or individuals harvesting this data is minor compared to the problem that enables it. The technical standards that we all use to connect wirelessly with the endless array of devices that we all now have in our homes, use at work and carry on our person every day are promiscuous communicators of identifiers that can be easily and extensively misused. Even if Google are prevented by law from doing it, if the standards aren’t changed then someone else will…

I agree with almost every point made except, “The problem of companies or individuals harvesting this data is minor compared to the problem that enables it.” I would put it differently. I would say, “There are two problems. Both are bad.”

We're technologists so we immediately look to technology to prevent abuse. This is the right instinct for us to have. But society can use disincentives too. I've come to believe that technology must belong to society as a whole. And we need a combination of technical solutions and those society can impose.

I actually think I see at least some of the woods as well as the trees. That is what the Fourth Law is all about. Of course I want to change the underlying technology as fast as we can.

But I don't think that will happen unless there is a MUCH greater understanding of the issues, and I've been trying with this set of posts to get them onto the table.

[More Peccavi here.]

How to prevent wirelesstapping

Responding to “What harm can possibly come from a MAC address“, Hal Berenson writes:

“The real problem here is technological not legal. You could ban collecting SSIDs and MAC addresses and why would it matter? Your sexual predator scenario wouldn’t be prevented (as (s)he is already committing a far more heinous crime it just isn’t going to deter them). The real problem is that WIFI (a) still doesn’t encrypt properly and (b) nearly all public hotspots avoid encryption altogether. I’ll almost leave (b) alone because it is so obvious, yet despite that we have companies like AT&T pushing us (by eliminating unlimited data plans) to use hotspots rather than their (better) protected 3G access.

“Sure my iPad connects nicely via WIFI when I’m in the United Red Carpet Club, but it also leaves much of my communications easily intercepted (3G may be vulnerable, but it does take some expertise and special equipment to set up my own cell). But what the *&#$#&*^$ is going on with encrypted WIFI not encrypting the MAC addresses? If something needs to be exposed it should be a locally unique address, not a globally unique one! I seem to recall that when I first looked at cryptography in the early 70s I read articles about how traffic analysis on encrypted data was nearly as useful as being able to decrypt the data itself. There were all kinds of examples of tracking troop movements, launch orders, etc. using traffic analysis. It is almost 40 years later and we still haven’t learned our lesson.”

I assume Hal is using “*&#$#&*^$” as a form of encryption. Anyway, I totally agree with the technical points being made. Wireless networks used the static MAC concept they inherited from wired systems in order to facilitate interoperability with them. Designers didn't think the fact that the MAC addresses would be visible to eavesdroppers would be very important – the payload was all they cared about. As I said in the Fourth Law of Identity:

Bluetooth and other wireless technologies have not so far conformed to the fourth law. They use public beacons for private entities.

I'd love to figure out how we would get agreement on “fixing” the wireless infrastructure. But one thing is for sure: it is really hard and would take a while! I don't think, in the meantime, we should simply allow our private space to be invaded. Just because technology allows theft of the identifiers doesn't mean society should.

Similarly, in reference to the predator scenario, the fact that laws don't prevent crime has never meant there shouldn't be laws. Regulation of “wirelesstapping” would make the emergence of this new kind of crime less likely.

MAC addresses will be used to reveal where you live

Conor has responded to my comments on why house numbers don't make a good metaphor for MAC addresses.

He writes that when I characterized house number as a “universal identifier”,

[Kim's argument] “confuses house address with house number. A house number is not able to be used as a universal identifier (I presume that there are many houses out there with the number 15, even in the same town, many times even on the same street in the same zip code (where the only difference is the N.W. and S.E. on the end of the street name).

“Like SSIDs and MAC addresses, the house number is only usable as an identifier once you get to the neighborhood and very often only once you get to the street.”

I like Conor's distinction between house number and house address. It's true there are many houses with the number 15, thus the house number is a local identifier, and only becomes universal when combined with the street name, the city, and so on. I hadn't understood that this is what he was trying to say.

Then Conor continues:

“I will admit that there are some differences with the MAC address because of how basic Ethernet networking was designed. The MAC address is designed to be unique (though, those in networking know that this isn't always the case and in fact most devices let you override the mac address anytime you want). So this could be claimed to be some form of a universal identifier. However, it's not at all usable outside of the local neighborhood. There is no way for me to talk to a particular MAC address unless I am locally on the same network with that device.”

Conor is completely right here. In networking as we have known it, the MAC address is not usable outside the “local network neighborhood”. But that is exactly what this WiFi snooping is about to change. In fact this is very much the core of what I'm talking about.
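Conor's two observations, that a MAC address is designed to be unique and that most devices let you override it, are both visible in the address format itself. Here is a minimal sketch, assuming the standard IEEE 802 layout in which the first three octets of a factory-assigned address identify the manufacturer and two low-order bits of the first octet flag "locally administered" and "multicast" addresses. The function names and the sample addresses are made up for illustration.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' (or the '-' separated form) into 6 raw bytes."""
    octets = mac.replace("-", ":").split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return bytes(int(o, 16) for o in octets)


def describe_mac(mac: str) -> dict:
    """Report the fields that say how 'universal' an address claims to be."""
    raw = parse_mac(mac)
    first = raw[0]
    return {
        # Manufacturer prefix; only meaningful for factory-assigned addresses.
        "oui": raw[:3].hex(":"),
        # Bit 0x02 set: the address was assigned locally (an override),
        # not burned in by the manufacturer.
        "locally_administered": bool(first & 0x02),
        # Bit 0x01 set: group (multicast/broadcast) address, not one device.
        "multicast": bool(first & 0x01),
    }


# Hypothetical addresses, not taken from any real device.
factory_style = describe_mac("00:1A:2B:3C:4D:5E")
overridden_style = describe_mac("02:00:00:00:00:01")
```

A factory-assigned address is the kind that stays the same wherever the device goes, which is what makes it useful as a key in a linking database.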

MAC addresses will be used to reveal where you live

Once you have snooped people's MAC addresses, and put them into a database linking them to “where they live” (literally), you have dramatically changed the way network identifiers work.

In this new world, armed with such a database, if you see a MAC address somewhere – anywhere – you can look it up in your database – precisely because it is unique – and see where “it lives”. When I say, “where it lives”, I don't mean what network it belongs to. I mean where it is normally located in physical space – as a street address.
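The difference between Conor's house number and a MAC address can be made concrete with a toy comparison. This is only a sketch over invented records (the addresses, streets and helper names are all hypothetical); it shows why a key that is unique everywhere resolves to a single place, while a key that is only unique locally does not.

```python
from collections import defaultdict

# Invented records of the form (mac_address, house_number, street).
RECORDS = [
    ("00:11:22:33:44:55", "15", "Elm Street N.W."),
    ("00:11:22:33:44:66", "15", "Elm Street S.E."),
    ("00:11:22:33:44:77", "15", "Oak Avenue"),
]


def index_by(records, key_fn):
    """Group records under whatever key key_fn extracts from each one."""
    index = defaultdict(list)
    for record in records:
        index[key_fn(record)].append(record)
    return index


def candidates(index, key):
    """Return every record that shares the given key (possibly none)."""
    return index.get(key, [])


by_house_number = index_by(RECORDS, lambda r: r[1])
by_mac = index_by(RECORDS, lambda r: r[0])

# A local identifier is ambiguous outside its neighborhood...
ambiguous = candidates(by_house_number, "15")
# ...while a globally unique one points at exactly one record.
unique = candidates(by_mac, "00:11:22:33:44:55")
```

In this toy data the house number 15 matches three records, while each MAC address matches exactly one.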

Is there some way to opt out of this? No – other than turning everything off. Unfortunately, given the way networks are designed, we have no choice but to reveal our MAC address when we use wireless. So anyone who is physically near us and has access to a linking database has access to where we live. I'll explore the implications of this going forward.

Conor concludes,

“I do believe that a more privacy enabled design of networking would have allowed for scenarios where MAC addresses were more dynamic and thus reducing the universal-ness and persistence of the MAC address itself…”

We both agree on this. And IPv6 has plenty of options that could make this possible. However, the current infrastructure is the one we live in, and one which is sorely in need of protections, mores and regulations. The fact that current technology allows the creation of Dr. No technology like that which Google StreetView WiFi has laid on the world doesn't mean that society should or will.
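To give a feel for what a “more dynamic” MAC address could look like, here is a minimal sketch of one possible approach: a device picks a fresh random address, marked as locally administered, instead of always presenting its factory-assigned one. This is an illustration under my own assumptions, not a description of any particular standard or of the IPv6 options mentioned above, and the function name is made up.

```python
import secrets


def random_local_mac() -> str:
    """Return a random unicast MAC address flagged as locally administered."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] |= 0x02  # set the "locally administered" bit
    octets[0] &= 0xFE  # clear the multicast bit so the address is unicast
    return ":".join(f"{b:02x}" for b in octets)


# Two calls give two (almost certainly different) addresses, so an
# observer cannot use the value as a persistent key for one device.
first_session = random_local_mac()
second_session = random_local_mac()
```

An address generated this way carries no manufacturer prefix and does not persist, so a database entry recorded for it in one place says little about the device seen somewhere else.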

Google patent is a shocker

There are many who have assumed Google's WiFi snooping was “limited” to mapping of routers. However, an article in Computerworld reporting on new developments in an Oregon class action lawsuit links to a patent application that speaks volumes about what is at stake here. The abstract begins (emphasis is mine):

“The invention pertains to location approximation of devices, e.g., wireless access points and client devices in a wireless network.”

By “client” the patent is referring to devices being used by you and your family. This interest in the family devices is exactly what I supposed – it is the natural conclusion you reach using the kind of thinking that drove the Street View WiFi initiative. The abstract continues,

“Location estimates may be obtained by observation/analysis of packets transmitted or received by the access point. For instance, data rate information associated with a packet is used to approximate the distance between a client device and the access point. This may be coupled with known positioning information to arrive at an approximate location for the access point. Confidence information and metrics about whether a device is an access point and the location of that device may also be determined…

“A location information database of access points may employ measurements from various devices over time. Such information may identify the location of client devices and provide location-based services to them.”

The system is actually doing measurements inside your house or business.

We will refer to these aspects of the plan when examining in further detail the potential harm the construction of massive MAC address databases can bring.

[Read the whole patent here]

What harm can possibly come from a MAC address?

If you are new to doing privacy threat analysis, I should explain that to do it, you need to be thoroughly pessimistic. A privacy threat analysis is in this sense no different from any other security threat analysis.

In our pessimistic frame of mind we then need to brainstorm from two different vantage points. The first is the vantage point of the party being attacked. How can people in various situations potentially be endangered by the new technology? The second is the vantage point of the attacker. How can people with different motivations misuse the technology? This is a long and complicated job, and should be done in great detail by those who propose a technology. The results should be published and vetted.

I haven't seen such publication or vetting by the proponents of world-wide WiFi packet collection and giant central databases of device identifiers. Perhaps the Street View team or someone else has such a study at hand – it would be great for them to share it.

In the meantime I'm just going to throw out a few simple initial ideas – things that are pretty obvious – by constructing a few scenarios.

SCENARIO 1 – Collecting MAC Addresses is Legal and Morally Acceptable

In this scenario it is legal and socially acceptable to drive up and down the streets recording people's MAC addresses and other network traffic.

It is also fine for anyone to use a geolocation service to build his own database of MAC addresses and street addresses.

How could a collector possibly get the software to do this? No problem. In this scenario, since the activity is legal and there is a demand, the software is freely available. In fact it is widely advertised on various Internet sites.

The collector builds his collection in the evenings, when people are at home with their WiFi enabled phones and computers. It doesn't take very long to assemble a really detailed map of all the devices used by the people who live in an entire neighborhood – perhaps a rich neighborhood.

Note that it would not matter whether people in the neighborhood have their WiFi encryption turned on or off – the drive-by collector would be able to map their devices, since WiFi encryption does not hide the MAC address.
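The reason encryption does not help is structural: in an 802.11 data frame the encryption covers the frame body, while the header that carries the addresses travels in the clear. Here is a rough sketch using a synthetic frame built in memory (nothing is captured from a network). It assumes the basic 24-byte header of a non-QoS data frame; the addresses, constants and function names are invented for illustration.

```python
import os
import struct

# frame control (2) + duration (2) + three addresses (18) + sequence control (2)
HEADER_FORMAT = "<HH6s6s6sH"
HEADER_LEN = struct.calcsize(HEADER_FORMAT)  # 24 bytes

TYPE_DATA = 0x0008      # frame type "data"
TO_DS = 0x0100          # frame is headed toward the access point
PROTECTED = 0x4000      # frame body is encrypted


def format_mac(addr: bytes) -> str:
    return ":".join(f"{b:02x}" for b in addr)


def build_demo_frame() -> bytes:
    """Build a made-up 'encrypted' data frame: clear header, opaque body."""
    access_point = bytes.fromhex("001122334455")  # hypothetical
    client = bytes.fromhex("02aabbccddee")        # hypothetical
    header = struct.pack(
        HEADER_FORMAT,
        TYPE_DATA | TO_DS | PROTECTED, 0,
        access_point, client, access_point, 0,
    )
    # Random bytes stand in for ciphertext; we never try to read them.
    return header + os.urandom(32)


def read_header(frame: bytes):
    """Return (is_protected, addr1, addr2, addr3) without touching the body."""
    if len(frame) < HEADER_LEN:
        raise ValueError("frame shorter than the basic 802.11 header")
    control, _duration, a1, a2, a3, _seq = struct.unpack(
        HEADER_FORMAT, frame[:HEADER_LEN]
    )
    return bool(control & PROTECTED), format_mac(a1), format_mac(a2), format_mac(a3)


is_protected, addr1, addr2, addr3 = read_header(build_demo_frame())
```

Even though the frame is flagged as protected and its body is unreadable, all three address fields come out intact.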

SCENARIO 2 – Collector is a sexual predator

In Scenario 1, anyone can be “a MAC collector”. In this scenario, the collector is a sexual predator.

When children pass him in the park, they have their phones and WiFi turned on and their MAC addresses are discernible by his laptop software. Normally the MAC addresses would be meaningless random numbers, but the collector has a complete database of what MAC addresses are associated with a given house address. It is therefore simple for the collection software on his laptop to automatically convert the WiFi packets emitted from the children's phones into the street addresses where the children live, showing the locations on a map.

There is thus no need for the collector to go up to the children and ask them where they live. And it won't matter that their parents have taught them never to reveal that to a stranger. Their devices will have revealed it for them.

I can easily understand that some people might have problems with this example simply because so many questionable things have been justified through reference to predators. That's not a bandwagon I'm trying to get on.

I chose the example not only because I think it's real and exposes a threat, but because it reveals two important things germane to a threat analysis:

The motivations people have to abuse the technical mechanisms we put in place are pretty much unlimited.
We need to be able to empathize with people who are vulnerable – like children – rather than taking a “people deserve what they get” attitude.

Finally, I hope it is obvious I am not arguing Google is doing anything remotely on a par with this example. I'm talking about something different: the matter of whether we want WiFi snooping to be something our society condones, and what some of the software might be that comes into being if we do.