The core of the matter at hand

We've explored many of the basic issues of WiFi snooping.  I would now like to go directly to the core of the matter: why do large centralized databases of MAC addresses linked to our street addresses have really serious consequences for people's privacy?  I'd like to approach this through an example:

Consider the case of someone attending a conference at which people are using laptops and phones over a wireless network.  We picture the devices within range of a given attendee in Figure 1:

The green dot represents the WiFi access point through which conference attendees gain access to the Internet.  For now, let's assume this is a permanent WiFi network.  Let's therefore assume its MAC address and location are present within the linking database that also contains our residential MAC to street address mapping.

Now suppose one or more people at the conference have opted into a geo-location service that makes use of the database.  And let's assume that the way this service works is to listen for nearby MAC addresses (all the little circles in the figure) and submit them to the geo-location system for analysis.

The geo-location system will learn that the opted-in user (let's call him Red) is near the given WiFi point, and thus will know Red is at a given location.  If the geo-location system is also capable of searching the web (as one would expect Google's could), it will also be able to infer that Red is in a given hotel, and that the hotel is hosting a conference C on the date in question. 

If Red stays in the same location for some time, and is surrounded by a number of other people who also remain there (discernible because their MAC addresses continue to be nearby), the smart service will be able to infer that Red is attending conference C being held in the hotel. 
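The inference that people who stay near Red are probably at the same event takes nothing more than counting. The following is a minimal sketch. It assumes the service receives a series of scans from Red's device, where each scan is the set of MAC addresses heard in one time window. The function name, the input format and the threshold are hypothetical choices for illustration, not a description of any real service. Nothing in the code asks whether the owners of the other devices opted in.

```python
from collections import Counter


def persistent_neighbors(scans, min_fraction=0.8):
    """Return the MAC addresses heard in at least `min_fraction` of the scans.

    `scans` is a list of sets; each set holds the MAC addresses that one
    device heard during one scan window (a hypothetical input format).
    """
    if not scans:
        return set()
    # Count each MAC at most once per scan window.
    counts = Counter(mac for scan in scans for mac in set(scan))
    threshold = min_fraction * len(scans)
    return {mac for mac, seen in counts.items() if seen >= threshold}


# Four scan windows from Red's device (invented, locally administered MACs).
scans = [
    {"02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:03"},
    {"02:00:00:00:00:01", "02:00:00:00:00:02"},
    {"02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:04"},
    {"02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:03"},
]
print(sorted(persistent_neighbors(scans, min_fraction=0.75)))
# ['02:00:00:00:00:01', '02:00:00:00:00:02']
```

Devices that merely pass by drop out. Those that keep showing up are flagged as likely co-attendees, whoever owns them.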

So far, there's nothing wrong with this, since Red has opted in to the geo-location service, and presumably been told that's how it works.

However, note that the geo-location system also learns about the MAC addresses of all the attendees within range who have NOT opted into the system (Green).  And if they remain within range over time, it can also deduce that they too are present at conference C.  Further, it can look up their MAC addresses in the database to discover their street addresses.  This in turn can be used to make many inferences about who the attendees at the conference are, since a lot of information is keyed to their street addresses.  That can itself become further profile information.

Opting out doesn't help

The problem here is this:  The geo-location system is perfectly capable of tracking your location and associating it with your home street address whether you opt in or not.  Home address is a key to many aspects of your identity.  Presto – you have linked many aspects of your identity to your location, and this becomes intellectual property that the geo-location service can sell and benefit from in a myriad of ways.

Is this the way any particular geo-location service would actually work?  I have no idea.  But that's not the point.  The point is that this is the capability one enables by building the giant central database of laptop and phone MAC addresses linked to street addresses.

Commercial interest will naturally tend towards maximal use of these capabilities and the information at hand. 

That is why we need to fully understand the implications of wirelesstapping on a massive scale and figure out if and where we want to draw the line.  How does the collection of MAC addresses using WiFi trucks relate to the regulations involving data collection, proportionality and consent?  Are there limits on the usage of this data? 

One thing is for sure.  Breaking the Fourth Law, and turning a unidirectional identifier into a universal identifier, is like the story of the Sorcerer's Apprentice.  All the brooms have started dancing.  I wonder if Mickey will get out of this one?

 

More input and points of view

Dave Nikolejsin, CIO of British Columbia and a man who sees identity as the key to efficient government, writes:

“I agree with your comments and focus on the MAC layer data collection going on with Google. One observation I would have re all the “other” similar type activity would be that no others have Google’s resources and thus no others are doing systematic sweep of the western world on such a data gathering mission. As we all know the value of data increases in an N-squared manner and the “N” once Google is done will be a big number.”

Dave goes on to compare wirelesstapping with Facebook's privacy problems and makes what I think is a very insightful comment:

At least (for all its warts) we actually willingly give our info to Facebook!

Heavy-duty SOA architect Gunnar Peterson (an expert on Service-Oriented Security) condenses our discussion to date and comes out strongly in favor of the arguments I've been making with regard to the wirelesstapping of MAC addresses: 

Google's Macondo Street View team cannot seem to get the right combination of top kill or cap to fit on its MAC spillage. Your MAC is not like a house number (which everyone knows and are used for many purposes), MAC address is scoped to one use. There's no harm in collecting MACs, the hell you say, there's a number of evil emergent cocktails that can come out of this. It's not so much the MAC itself, it's the association of the MAC and the geolocation and time – combining something unique like MAC with geolocation.

This looked like a rogue team (or as Google put it last week a “rogue software engineer“) until this shocking announcement that Google is patenting (emphasis added) – “The invention pertains to location approximation of devices, e.g., wireless access points and client devices in a wireless network.”

It seems pretty obvious that any number of permutations of problems will result by combining private client data and geolocation. Maybe Google Books should scan a copy of J.C. Cannon's book “Privacy: What Developers and IT Professionals Should Know” and Stefan Brands' Primer on User Identification.  In both works you see the risks of promiscuously mixing identification cocktails and the unexpected leakages that result. In addition, what benefit to the user who is being spied upon does all this spying create…?


It's true that J.C. and Stefan both do a great job of helping explain the issues at play here, and I advise people who want to understand the issues better to check out their work.

Hal Berenson got back to me after my response to him, saying,

One thing I would suggest is that you write language attempting to ban what you want to ban and let the rest of us poke holes in it  – meaning, show all the legitimate scenarios you would make illegal or accidental criminals you would create… 🙂

I have total sympathy with this concern of course.  We want the minimum intervention necessary.  But our society has come to realize there are many instances where consumers and citizens need to be protected in various ways.  In introducing these protections, lawmakers had to deal with exactly the same difficult concerns about balancing rights.  The good news here:  our legal system seems to handle this just fine.  In fact, that's what it is about.   So I will leave the crafting of the appropriate disincentives to professionals.

Ted Howard, who has broad experience including in the games industry, commented,

Regarding the issues of Google's collection of MAC addresses and wireless SSIDs:

“If I leave my blinds open so that I can get sun into my home, that means I have no problem with anyone walking past my house pausing to watch me in my home. When I talk with a friend while walking in the park, that means I cannot be bothered by a stranger walking alongside us listening to every word. In both cases, am I “publicly broadcasting” something or is the broadcast just a side effect of my activities? The analogy should be clear.

 

“Do a survey to see how many wireless router owners understand what MAC and SSID are. I suspect that very few (< 5%) people know. If they don't know what these are, then how can anyone claim that these people have been intentionally publicly broadcasting these with an understanding that the broadcast has become publicly-available knowledge? Government regulation exists to, among other reasons, protect the public when the public needs protection and would otherwise be unprotected. This seems to me like a good case for protecting the truly ignorant public.

Journalist Mary Branscombe comments:

For me there are 3 big questions.

  1. How much info did or could someone capture; and by the way Google is in the data capture and data mining/machine learning business and that data *has* been used because if they didn't know it was there they didn't know they had to exclude it.
  2. How personally identifiable or anonymised that information is (I don't think my phone has a bunch of captured MAC addresses and if it does I don't think it has any pii about them but I may be wrong).
  3. How much people care. What is public or private about my SSID or about my MAC address? (What I honestly don't know is how much you can find out about me from my MAC address but I'm assuming not much; if I'm wrong that's a data point! I know in 2007 Skyhook told me they were confident they had privacy cracked but they would and they're not the only people and they're the good guys…)

Location services is a *huge* business and people are oversharing location information for trivial rewards (see Foursquare). 8000 apps use Skyhook location data. It's Facebook with co-ordinates. I don't think we can not do this – but I think we can regulate and do it more safely.

In answer to Mary's question two, systems like the Street View WiFi system exist to map people's device MAC addresses to their residential address, as described in the Google patent.  Her point about big business is a BIG POINT.  But to me it just increases the urgency to give people the geo-location capabilities they want without creating a privacy Chernobyl that will explode down the line.

Jan and Susan Huffman write,

“Standing naked in my front yard is like broadcasting my MAC address.  If I don’t want people to look at me naked in my front yard, I wear clothes. I don’t ask that the law punish those who might take a look through the bushes.

My response:  your MAC address is visible in your wireless packets no matter what you do.  Turning on encryption doesn't help.  So there are no clothes to put on.  The analogy with clothes and bushes therefore just doesn't stand up.  Furthermore, in most neighborhoods, if you spend your time trawling the neighborhood and peeking through bushes you end up with people giving you a pretty hard time…  Maybe that's what's happening here.

 

There is a fundamental problem here

Joe Mansfield at Peccavi has done a very cogent post where, though he agrees with my concerns, he criticizes me for picking almost exclusively on Google when there are lots of others who have been doing the same thing.  He's right – I have been too narrowly focused. 

Let me be clear:  I have great respect for Google and many of its accomplishments.   I have a disagreement with a particular Google team.

I find the Google Street View team's abuse of identifiers especially worrisome because they have not only been collecting info about WiFi access points, but the MAC addresses of people's personal devices (laptops and phones).  

This bothers me because I see it as dangerous.  It's like going over to visit a neighbor and finding out he's been building a nuclear reactor in his basement. 

 I'm not an expert on the geolocation industry and I have no knowledge of whether this kind of end-user-device-snooping is commonplace.  If it is, then let me know.  Everything I have said about Google applies equally to any similar practitioners. 

But let's get to Peccavi, which makes the point better than I do:

I’ve been following Kim Cameron’s increasingly critical analysis of Google’s StreetView WiFi mapping data privacy debacle with some interest of late.

Some background might be in order for those interested in reading where he’s been coming from – start here and work forward. He’s been quite vocal and directed in his criticism and I have been surprised that his focus has been almost entirely on Google rather than on the underlying technical root cause. My initial view on the issue was that it was a stupid over-reaction to something that everyone has been doing for years, and that at least Google were being open about having logged too much data. I’m still of the opinion that the targeting of Google specifically is off base here, although I think Kim is right that there is a fundamental problem here.

Kim is probably the pre-eminent proponent and defender of strong authentication and privacy on the net at the moment. His Laws of Identity should be mandatory reading for anyone working with user data in any sort of context but especially for anyone working with online systems. He’s a hugely influential thought leader for doing the right thing and as a key technical leader within Microsoft he’s doing more than almost anyone else to lay the groundwork for a move away from our current reliance on insecure, privacy leaking methods of authentication. Let’s just say that I’m a fan.

For obvious reasons he has spotted the huge privacy problems associated with the practice of gathering WiFi SSID and MAC addresses and using them to create large scale geo-location databases. There are serious privacy issues here and despite my initial cynicism about this perhaps it’s a good thing that there has been a huge furore over what Google were doing.

Note that there were two issues in play here – the intentional data (the SSID’s, MAC addresses and geo-location info) and the unintentional data (actual user payloads). I’m only going to talk about the intentionally harvested data right now because that is the much trickier problem – few people would argue that having Google (or anyone) logging actual WiFi traffic from their homes is OK.

The problem that I see with Kim’s general position on this and the focus on Google’s activities alone is that he’s not seeing the wood for the trees. The problem of companies or individuals harvesting this data is minor compared to the problem that enables it. The technical standards that we all use to connect wirelessly with the endless array of devices that we all now have in our homes, use at work and carry on our person every day are promiscuous communicators of identifiers that can be easily and extensively misused. Even if Google are prevented by law from doing it, if the standards aren’t changed then someone else will…

I agree with almost every point made except, “The problem of companies or individuals harvesting this data is minor compared to the problem that enables it.”  I would put it differently.  I would say, “There are two problems.  Both are bad.”

We're technologists, so we immediately look to technology to prevent abuse.  This is the right instinct for us to have.  But society can use disincentives too.  I've come to believe that technology must belong to society as a whole.  And we need a combination of technical solutions and those society can impose.

I actually think I see at least some of the woods as well as the trees.  That is what the Fourth Law is all about.  Of course I want to change the underlying technology as fast as we can. 

But I don't think that will happen unless there is a MUCH greater understanding of the issues, and I've been trying with this set of posts to get them onto the table.    


 

How to prevent wirelesstapping

Responding to “What harm can possibly come from a MAC address“, Hal Berenson writes:

“The real problem here is technological not legal. You could ban collecting SSIDs and MAC addresses and why would it matter? Your sexual predator scenario wouldn’t be prevented (as (s)he is already committing a far more heinous crime it just isn’t going to deter them). The real problem is that WIFI (a) still doesn’t encrypt properly and (b) nearly all public hotspots avoid encryption altogether. I’ll almost leave (b) alone because it is so obvious, yet despite that we have companies like AT&T pushing us (by eliminating unlimited data plans) to use hotspots rather than their (better) protected 3G access.

“Sure my iPad connects nicely via WIFI when I’m in the United Red Carpet Club, but it also leaves much of my communications easily intercepted (3G may be vulnerable, but it does take some expertise and special equipment to set up my own cell). But what the *&#$#&*^$ is going on with encrypted WIFI not encrypting the MAC addresses? If something needs to be exposed it should be a locally unique address, not a globally unique one! I seem to recall that when I first looked at cryptography in the early 70s I read articles about how traffic analysis on encrypted data was nearly as useful as being able to decrypt the data itself. There were all kinds of examples of tracking troop movements, launch orders, etc. using traffic analysis. It is almost 40 years later and we still haven’t learned our lesson.”

I assume Hal is using “*&#$#&*^$” as a form of encryption.  Anyway, I totally agree with the technical points being made.  Wireless networks used the static MAC concept they inherited from wired systems in order to facilitate interoperability with them.  Designers didn't think it would matter much that MAC addresses were visible to eavesdroppers; the payload was all they cared about.   As I said in the Fourth Law of Identity:

Bluetooth and other wireless technologies have not so far conformed to the fourth law. They use public beacons for private entities.
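Hal's complaint can be made concrete. In 802.11, encryption protects the frame body, but the MAC header that carries the addresses is sent in the clear, so every receiver can tell whether a frame is meant for it. The following is a minimal sketch. It assumes a data frame has already been captured as raw bytes (how is out of scope) and that the header has the basic 24-byte layout with three addresses. The function names are hypothetical. The sketch shows that the sending device's address is readable even when the Protected Frame flag says the body is encrypted.

```python
def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_80211_header(frame: bytes) -> dict:
    """Read the cleartext fields of a 24-byte 802.11 data/management header.

    Assumes a frame with three address fields (no Address 4, no QoS or
    HT Control fields); control frames such as ACKs are shorter.
    """
    if len(frame) < 24:
        raise ValueError("frame too short for a 24-byte 802.11 header")
    fc0, fc1 = frame[0], frame[1]  # Frame Control field
    return {
        "type": (fc0 >> 2) & 0b11,       # 0 = management, 1 = control, 2 = data
        "subtype": (fc0 >> 4) & 0b1111,
        "protected": bool(fc1 & 0x40),   # Protected Frame flag: body is encrypted
        "addr1": mac_str(frame[4:10]),   # receiver address
        "addr2": mac_str(frame[10:16]),  # transmitter address
        "addr3": mac_str(frame[16:22]),
    }


# A made-up encrypted data frame sent by a client (To DS set) to its access point.
frame = (
    bytes([0x08, 0x41])                  # Frame Control: data frame, To DS + Protected
    + bytes(2)                           # Duration
    + bytes.fromhex("0200000000aa")      # Address 1: access point (BSSID)
    + bytes.fromhex("0200000000bb")      # Address 2: the client device
    + bytes.fromhex("0200000000cc")      # Address 3: final destination
    + bytes(2)                           # Sequence Control
    + bytes.fromhex("8f3a91c4d2")        # stand-in for the encrypted body
)
header = parse_80211_header(frame)
print(header["protected"], header["addr2"])  # True 02:00:00:00:00:bb
```

The parser never touches the encrypted body. The identifier of the person's device sits in front of it, available to anyone within radio range.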

I'd love to figure out how we would get agreement on “fixing” the wireless infrastructure.  But one thing is for sure:  it is really hard and would take a while!  I don't think, in the meantime, we should simply allow our private space to be invaded.  Just because technology allows theft of the identifiers doesn't mean society should.

Similarly, in reference to the predator scenario, the fact that laws don't prevent crime has never meant there shouldn't be laws.  Regulation of “wirelesstapping” would make the emergence of this new kind of crime less likely.

 

MAC addresses will be used to reveal where you live

Conor has responded to my comments on why house numbers don't make a good metaphor for MAC addresses.

He writes that when I characterized house number as a “universal identifier”,

[Kim's argument] “confuses house address with house number. A house number is not able to be used as a universal identifier (I presume that there are many houses out there with the number 15, even in the same town, many times even on the same street in the same zip code (where the only difference is the N.W. and S.E. on the end of the street name).

“Like SSIDs and MAC addresses, the house number is only usable as an identifier once you get to the neighborhood and very often only once you get to the street.”

I like Conor's distinction between house number and house address.  It's true there are many houses with the number 15, thus the house number is a local identifier, and only becomes universal when combined with the street name, the city, and so on.  I hadn't understood that this is what he was trying to say.

Then Conor continues:

“I will admit that there are some differences with the MAC address because of how basic Ethernet networking was designed. The MAC address is designed to be unique (though, those in networking know that this isn't always the case and in fact most devices let you override the mac address anytime you want). So this could be claimed to be some form of a universal identifier. However, it's not at all usable outside of the local neighborhood. There is no way for me to talk to a particular MAC address unless I am locally on the same network with that device.”

Conor is completely right here.  In networking as we have known it, the MAC address is not usable outside the “local network neighborhood”.  But that is exactly what this WiFi snooping is about to change.  In fact this is very much the core of what I'm talking about.

MAC addresses will be used to reveal where you live

Once you have snooped peoples’ MAC addresses, and put them into a database linking them to “where they live” (literally),  you have dramatically changed the way network identifiers work.

In this new world, armed with such a database, if you see a MAC address somewhere – anywhere – you can look it up in your database – precisely because it is unique – and see where “it lives”.   When I say, “where it lives”, I don't mean what network it belongs to.  I mean where it is normally located in physical space – as a street address.  
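The lookup itself is trivial, and that is the point. The following is a minimal sketch of what "seeing where it lives" amounts to once a linking database exists. The database here is a hypothetical in-memory dictionary with invented placeholder entries, and the function names are for illustration only. The lookup needs nothing about the network the MAC address belongs to. It relies only on the identifier being unique and persistent.

```python
# Hypothetical linking database: device MAC address -> street address.
# The entries are invented placeholders, not real data.
LINKING_DB = {
    "02:00:00:00:00:01": "15 Example Street",
    "02:00:00:00:00:02": "22 Sample Avenue",
}

HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac: str) -> str:
    """Canonicalize a MAC address to lowercase, colon-separated form."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def where_it_lives(observed_mac: str, db=LINKING_DB):
    """Return the street address linked to a MAC seen anywhere, or None."""
    return db.get(normalize_mac(observed_mac))


print(where_it_lives("02-00-00-00-00-01"))  # 15 Example Street
print(where_it_lives("02:00:00:00:00:99"))  # None
```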

Is there some way to opt out of this?  No – other than turning everything off.  Unfortunately, given the way networks are designed, we have no choice but to reveal our MAC address when we use wireless.  So anyone who is physically near us and has access to a linking database has access to where we live. I'll explore the implications of this going forward.

Conor concludes,

“I do believe that a more privacy enabled design of networking would have allowed for scenarios where MAC addresses were more dynamic and thus reducing the universal-ness and persistence of the MAC address itself…”

We both agree on this.  And IPv6 has plenty of options that could make this possible.  However, the current infrastructure is the one we live in, and one which is sorely in need of protections, mores and regulations.  The fact that current technology allows the creation of Dr. No technology like that which Google StreetView WiFi has laid on the world doesn't mean that society should, or will, accept it.
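IEEE 802 addressing already leaves room for the "more dynamic" MACs Conor describes. Two bits in the first octet matter here. The U/L bit (0x02) marks an address as locally administered rather than globally assigned. The I/G bit (0x01) marks a group (multicast) address. The following is a minimal sketch, assuming a device simply draws a fresh random, locally administered unicast address whenever it wants to stop being recognizable. The function names are hypothetical. Deciding when to rotate, and keeping network association working across a change, are the hard parts and are not addressed here.

```python
import secrets


def first_octet(mac: str) -> int:
    return int(mac.split(":")[0], 16)


def is_locally_administered(mac: str) -> bool:
    """U/L bit (0x02 of the first octet): 1 = locally administered."""
    return bool(first_octet(mac) & 0x02)


def is_group_address(mac: str) -> bool:
    """I/G bit (0x01 of the first octet): 1 = multicast/broadcast."""
    return bool(first_octet(mac) & 0x01)


def random_local_mac() -> str:
    """Draw a fresh random, locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] | 0x02) & 0xFE  # set U/L bit, clear I/G bit
    return ":".join(f"{b:02x}" for b in octets)


mac = random_local_mac()  # different on every call
print(mac, is_locally_administered(mac), is_group_address(mac))
```

An address drawn this way is still unique enough to work on the local segment. Because it is not persistent, a linking database built from it would quickly go stale.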

 

 

Google patent is a shocker

There are many who have assumed Google's WiFi snooping was “limited” to mapping of routers.  However an article in Computerworld reporting on new developments in an Oregon class action lawsuit links to a patent application that speaks volumes about what is at stake here.  The abstract begins (emphasis is mine):

“The invention pertains to location approximation of devices, e.g., wireless access points and client devices in a wireless network. “

By “client” the patent is referring to devices being used by you and your family.  This interest in the family devices is exactly what I supposed – it is the natural conclusion you reach using the kind of thinking that drove the Street View WiFi initiative.  The abstract continues,

“Location estimates may be obtained by observation/analysis of packets transmitted or received by the access point. For instance, data rate information associated with a packet is used to approximate the distance between a client device and the access point. This may be coupled with known positioning information to arrive at an approximate location for the access point. Confidence information and metrics about whether a device is an access point and the location of that device may also be determined…

“A location information database of access points may employ measurements from various devices over time. Such information may identify the location of client devices and provide location-based services to them. “
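The abstract does not say how the individual estimates are combined. One common and simple technique for this kind of problem is a weighted centroid. Each observation of a device is taken at a known position, and observations believed to be closer (here standing in for a higher data rate) get more weight. The following is a swapped-in illustration of that general technique, not the method claimed in the patent. The `Observation` type, the local x/y grid in metres and the weights are all assumptions. The same arithmetic works in the other direction: once access points have estimated positions, a client device heard near them can be placed too.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    x: float       # metres east on a local grid (assumed coordinate system)
    y: float       # metres north on the same grid
    weight: float  # hypothetical: larger means "probably closer"


def weighted_centroid(observations):
    """Estimate a device's position as the weight-averaged observation point."""
    total = sum(o.weight for o in observations)
    if total <= 0:
        raise ValueError("need at least one observation with positive weight")
    x = sum(o.x * o.weight for o in observations) / total
    y = sum(o.y * o.weight for o in observations) / total
    return x, y


ap_estimate = weighted_centroid([
    Observation(x=0.0, y=0.0, weight=1.0),
    Observation(x=0.0, y=10.0, weight=3.0),
])
print(ap_estimate)  # (0.0, 7.5)
```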

The system is actually doing measurements inside your house or business.

We will refer to these aspects of the plan when examining in further detail the potential harm the construction of massive MAC address databases can bring.


What harm can possibly come from a MAC address?

If you are new to doing privacy threat analysis, I should explain that to do it, you need to be thoroughly pessimistic.  A privacy threat analysis is in this sense no different from any other security threat analysis.  

In our pessimistic frame of mind we then need to brainstorm from two different vantage points.  The first is the vantage point of the party being attacked.  How can people in various situations potentially be endangered by the new technology?  The second is the vantage point of the attacker.  How can people with different motivations misuse the technology?  This is a long and complicated job, and should be done in great detail by those who propose a technology.  The results should be published and vetted.

I haven't seen such publication or vetting by the proponents of world-wide WiFi packet collection and giant central databases of device identifiers.  Perhaps the Street View team or someone else has such a study at hand – it would be great for them to share it.

In the meantime I'm just going to throw out a few simple initial ideas, things that are pretty obvious, by constructing a few scenarios.

SCENARIO 1 – Collecting MAC Addresses is Legal and Morally Acceptable

In this scenario it is legal and socially acceptable to drive up and down the streets recording people's MAC addresses and other network traffic.   

It is also fine for anyone to use a geolocation service to build his own database of MAC addresses and street addresses. 

How could a collector possibly get the software to do this?  No problem.  In this scenario, since the activity is legal and there is a demand, the software is freely available.  In fact it is widely advertised on various Internet sites.

The collector builds his collection in the evenings, when people are at home with their WiFi-enabled phones and computers.  It doesn't take very long to assemble a really detailed map of all the devices used by the people who live in an entire neighborhood – perhaps a rich neighborhood.

Note that it would not matter whether people in the neighborhood have their WiFi encryption turned on or off – the drive by collector would be able to map their devices, since WiFi encryption does not hide the MAC address.

SCENARIO 2 – Collector is a sexual predator

In Scenario 1, anyone can be “a MAC collector”.  In this scenario, the collector is a sexual predator.

When children pass him in the park, they have their phones and WiFi turned on and their MAC addresses are discernible by his laptop software.  Normally the MAC addresses would be meaningless random numbers, but the collector has a complete database of what MAC addresses are associated with a given house address.  It is therefore simple for the collection software on his laptop to automatically convert the WiFi packets emitted from the children's phones into the street addresses where the children live, showing the locations on a map.

There is thus no need for the collector to go up to the children and ask them where they live.  And it won't matter that their parents have taught them never to reveal that to a stranger.  Their devices will have revealed it for them.

I can easily understand that some people might have problems with this example simply because so many questionable things have been justified through reference to predators.  That's not a bandwagon I'm trying to get on. 

I chose the example not only because I think it's real and exposes a threat, but because it reveals two important things germane to a threat analysis:

  • The motivations people have to abuse the technical mechanisms we put in place are pretty much unlimited. 
  • We need to be able to empathize with people who are vulnerable – like children – rather than taking a “people deserve what they get” attitude.   

Finally, I hope it is obvious that I am not arguing Google is doing anything remotely on a par with this example.  I'm talking about something different: whether we want WiFi snooping to be something our society condones, and what kinds of software might come into being if we do.

 

Are SSIDs and MAC addresses like house numbers?

Architect Conor Cahill writes:

Kim's assertion that Google was wrong to do so is based upon two primary factors:

  • Google intended to capture the SSID and MAC address of the access points
  • SSIDs and MAC addresses are persistent identifiers

And it seems that this has at least gotten Ben re-thinking his assertion that this was all about privacy theater and even him giving Kim a get-out-of-jail-free card.

While I agree that Kim's asserted facts are true, I disagree with his conclusion.

  • I don't believe Google did anything wrong in collecting SSIDs and MAC addresses (capturing data, perhaps). The SSIDs were configured to *broadcast* (to make something known widely). However, SSIDs and MAC addresses are local identifiers more like house numbers. They identify entities within the local wireless network and are generally not re-transmitted beyond that wireless network.
  • I don't believe that what they did had an impact on the user's privacy. As I pointed out above, it's like capturing house numbers and associating them with a location. That, in itself, has little to do with the user's privacy unless something else associates the location with the user…

Let's think about this.  Are SSIDs and MAC addresses like house numbers?

Your house number is used – by anyone in the world who wants to find it – to get to your house.  Your house was given a number for that purpose.  The people who live in the houses like this.  They actually run out and buy little house number things, and nail them up on the side of their houses, to advertise clearly what number they are.

So let's see:

  1. Are SSIDs and MAC addresses used by anyone in the world to get through to your network?  No.  A DNS name would be used for that.  In residential neighborhoods, you employ an SSID for only one reason – to make it easier to get wireless working for members of your family and their visitors.  Your intent is for the wireless access point's MAC address to be used only by your family's devices, and the MACs of their devices only by the other devices in the house.
  2. Were SSIDs and MAC addresses invented to allow anyone in the world to find the devices in your house?   No, nothing like that.  The MAC is used only within the confines of the local network segment.
  3. Do people consciously try to advertise their SSIDs and MAC addresses to the world by running to the store, buying them, and nailing them to their metaphorical porches?  Nope again.  Zero analogy.

So what is similar?  Nothing. 

That's because house addresses are what, in Law Four of the Laws of Identity, were called “universal identifiers”, while SSIDs and MAC addresses are what were called “unidirectional identifiers” – meaning that they were intended to be constrained to use in a single context. 

Keeping “unidirectional identifiers” private to their context is essential for privacy.  And let me be clear: I'm not referring only to the privacy of individuals, but also that of enterprises, governments and organizations.  Protecting unidirectional identifiers is essential for building a secure and trustworthy Internet.

 

Ben Adida releases me from the theatre

When I published “Misuse of network identifiers was done on purpose”, Ben Adida twittered that “Kim Cameron answers my latest post with some good points I need to think about…”.  And he came through on that promise, even offering me a “Get out of theatre free” card:

“A few days ago, I wrote about Privacy Advocacy Theater and lamented how some folks, including EPIC and Kim Cameron, are attacking Google in a needlessly harsh way for what was an accidental collection of data.  Kim Cameron responded, and he is right to point out that my argument, in the Google case, missed an important issue.

“Kim points out that two issues got confused in the flurry of press activity: the accidental collection of payload data, i.e. the URLs and web content you browsed on unsecured wifi at the moment the Google Street View car was driving by, and the intentional collection of device identifiers, i.e. the network hardware identifiers and network names of public wifi access points.  Kim thinks the network identifiers are inherently more problematic than the payload, because they last for quite a bit of time, while payload data, collected for a few randomly chosen milliseconds, are quite ephemeral and unlikely to be problematic.  [Just for the record, I didn't actually say “unlikely to be problematic” – Kim]

“Kim’s right on both points. Discussion of device identifiers, which I missed in my first post, is necessary, because the data collection, in this case, was intentional, and apparently was not disclosed, as documented in EPIC’s letter to the FCC. If Google is collecting public wifi data, they should at least disclose it. In their blog post on this topic, Google does not clarify that issue.

“So, Google, please tell us how long you’ve been collecting network identifiers, and how long you failed to disclose it. It may have been an oversight, but, given how much other data you’re collecting, it would really improve the public’s trust in you to be very precise here.”

Ben also says my initial post seems “to weave back and forth between both issues”.  In fact I see payload and header as two parts of the same WiFi packet.  Google “accidentally” collected one part of the packet but collected the other part on purpose.  I think it is really bizarre that a lot of technical people consider one part of the packet (emails and instant messages) to be private, and then for some irrational reason assume the other part of the same packet (the MAC address) is public.  This makes no sense, and as an architect it drives me nuts.  Stealing one part of the WiFi packet is as bad as stealing another.
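To make the point about the two halves of a packet concrete, here is a minimal Python sketch that splits an 802.11 data frame into its header addresses and its body. It assumes a plain (non-QoS) data frame with the standard 24-byte, three-address header and no trailing checksum (FCS). The function name split_80211_data_frame is hypothetical and not taken from any library. The sketch also shows why the header matters: when the Protected Frame flag is set, encryption covers only the body. The MAC addresses in the header are still sent in the clear, so anyone in radio range receives both halves.

```python
def _mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def split_80211_data_frame(frame: bytes):
    """Split a minimal 802.11 data frame into (header fields, body).

    Assumes a non-QoS data frame with the 24-byte three-address header
    and no FCS. Hypothetical helper for illustration only.
    """
    if len(frame) < 24:
        raise ValueError("frame too short for a 24-byte 802.11 header")
    fc0, fc1 = frame[0], frame[1]            # Frame Control field
    if ((fc0 >> 2) & 0b11) != 2:             # type bits: 2 means data
        raise ValueError("not a data frame")
    if fc0 & 0x80:                           # QoS subtypes add a QoS Control field
        raise ValueError("QoS data frames are not handled in this sketch")
    if (fc1 & 0x03) == 0x03:                 # ToDS and FromDS both set: 4 addresses
        raise ValueError("four-address frames are not handled in this sketch")
    header = {
        # Roles depend on ToDS/FromDS; for a station-to-AP frame (ToDS=1)
        # these are the BSSID, the source and the destination.
        "addr1": _mac(frame[4:10]),
        "addr2": _mac(frame[10:16]),
        "addr3": _mac(frame[16:22]),
        # Protected Frame flag: only the body below is encrypted.
        "protected": bool(fc1 & 0x40),
    }
    body = frame[24:]                        # payload (ciphertext if protected)
    return header, body


if __name__ == "__main__":
    frame = (bytes([0x08, 0x41])             # data frame, ToDS=1, protected
             + b"\x00\x00"                   # Duration/ID
             + bytes.fromhex("020000000001") # addr1
             + bytes.fromhex("020000000002") # addr2
             + bytes.fromhex("020000000003") # addr3
             + b"\x00\x00"                   # Sequence Control
             + b"ciphertext")                # stand-in for an encrypted body
    header, body = split_80211_data_frame(frame)
    print(header["addr2"], header["protected"], body)
    # prints: 02:00:00:00:00:02 True b'ciphertext'
```

Even with the body encrypted, the sender's hardware address comes out of the header unchanged. That half of the packet is there for the taking whenever the other half is.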

Ben also says,

“I agree that device privacy can be a big deal, especially when many people are walking around with RFIDs in their passports, pants, and with bluetooth headsets. But, in this particular case, is it a problem? If Google really only did collect the SSIDs of open, public networks that effectively invite anyone to connect to them and thus discover network name and device identifier, is that a violation of privacy, or of the Laws of Identity? I’m having trouble seeing the harm or the questionable act. Once again, these are public/open WiFi networks.”

Let me be clear:  If Google or any other operator only collected the SSIDs of “open, public networks that invite anyone to connect to them” there would be zero problem from the point of view of the Laws of Identity.  They would, in the terminology of Law Four, be collecting “universal identifiers”. 

But when you drive down a street, the vast majority of networks you encounter are NOT public, and are NOT inviting just anyone to connect to them.  The routers emit packets so the designated users of the network can connect to them, not so others can connect to them, hack them, map them or use them for commercial purposes.  If one is to talk about intent, the intent is for private, unidirectional identifiers to be used within a constrained scope.

In other words, as much as I wish I didn't have to do so, I must strongly dispute Ben's assertion that “Once again, these are public/open WiFi networks” and insist that private identifiers are being misappropriated.

In matters of eavesdropping I subscribe to EPIC's argument that proving harm is not essential – it is the eavesdropping itself which is problematic.  However, in my next post I'll talk about harm, and the problems of a vast world-wide system capable of inference based on use of device identifiers.

  

“I just did it because Skyhook did it”

I received a helpful and informed comment by Michael Hanson at Mozilla Labs on the Street View MAC Address issue:

I just wanted to chip in and say that the practice of wardriving to create a SSID/MAC geolocation database is hardly unique to Google.

The practice was invented by Skyhook Wireless, formerly Quarterscope. The iPhone, pre-GPS, integrated the technology to power the Maps application. There was some discussion of how this technology would work back in 2008, but it didn't really break out beyond the community of tech developers. I'm not sure what the connection between Google and Skyhook is today, but I do know that Android can use the Skyhook database.

Your employer recently signed a deal with Navizon, a company that employs crowdsourcing to construct a database of WiFi endpoints.

Anyway – I don't mean to necessarily weigh in on the question of the legality or ethics of this approach, as I'm not quite sure how I feel about it yet myself. The alternative to a decentralized anonymous geolocation system is one based on a) GPS, which requires the generosity of a space-going sovereign to maintain the satellites and has trouble in dense urban areas, or b) the cell towers, which are inefficient and are used to collect our phones’ locations. There's a recent paper by Constandache et al. at Duke that addresses the question of whether it can be done with just inertial reckoning… but it's a tricky problem.

Thanks for the post.

The scale of the “wardriving” [can you believe the name?] boggles my mind, and the fact that this has gone on for so long without attracting public attention is a little incredible.  But in spite of the scale, I don't think the argument that it's OK to do something because other people have already done it will hold much water with regulators or the thinking public.  In fact it all sounds a bit like a teenager trying to avoid detention because he was “just doing what Johnny did.”

As Michael says, one can argue that there are benefits to drive-by device identity theft.  In fact, one can argue that there would be benefits to appropriating and reselling all kinds of private information and property.  But in most cases we hold ourselves back, and find other, socially acceptable ways of achieving the same benefits.  We should do the same here.

Are these databases decentralized and anonymous?

As hard as I try, I don't see how one can say the databases are decentralized and anonymous.  For starters, they are highly centralized, allowing monetized lookup of any MAC address in the world.  Secondly, they are not anonymous – the databases contain the identity information of our personal devices as well as their exact locations in molecular space.   It is strange to me that personal information can just be “declared to be public” by those who will benefit from that in their businesses.

Do these databases protect our privacy in some way? 

No – they erode it more than before.  Why?

Location information has long been available to our telephone operators, since they use cell-tower triangulation.  This conforms to the Law of Justifiable Parties – they need to know where we are (though not to remember it) to provide us with our phone service. 

But now yet another party has insinuated itself into the mobile location equation: the MAC database operator – be it Google, Skyhook or Navizon. 

If you carry a cell phone that uses one of these databases (and maybe you already do), your phone queries the database for the locations of the MAC addresses it detects.  This means that in addition to your phone company, a database company is constantly being informed about your exact location.  From what Michael says, it seems the cell phone vendor might additionally get in the middle of this location reporting.  These are all parties who have no business being part of the location transaction unless you specifically opt to include them.
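The query flow just described can be sketched in a few lines. This is a hypothetical illustration in Python, not any vendor's actual protocol or API. MacLocationService, locate() and query_log are invented names, and the unweighted centroid is a deliberately naive stand-in for whatever estimation real services perform. The sketch is meant to show a structural fact: to get an answer back, the phone must hand the operator its whole list of observed MAC addresses, and nothing in the exchange prevents the operator from keeping that list.

```python
from typing import Dict, Iterable, List, Optional, Tuple

LatLon = Tuple[float, float]


class MacLocationService:
    """Hypothetical MAC-address geolocation service (illustration only)."""

    def __init__(self, ap_locations: Dict[str, LatLon]) -> None:
        # MAC address -> (latitude, longitude), e.g. as gathered by wardriving
        self.ap_locations = {m.lower(): loc for m, loc in ap_locations.items()}
        # Every scan any phone has ever submitted
        self.query_log: List[List[str]] = []

    def locate(self, observed_macs: Iterable[str]) -> Optional[LatLon]:
        macs = [m.lower() for m in observed_macs]
        # The operator receives, and here retains, the entire scan,
        # not just the location it sends back.
        self.query_log.append(macs)
        known = [self.ap_locations[m] for m in macs if m in self.ap_locations]
        if not known:
            return None
        # Naive unweighted centroid of the known access points; ignores
        # signal strength and the +/-180 degree longitude wrap.
        lat = sum(p[0] for p in known) / len(known)
        lon = sum(p[1] for p in known) / len(known)
        return (lat, lon)


if __name__ == "__main__":
    service = MacLocationService({
        "02:00:00:00:00:01": (47.0, -122.0),
        "02:00:00:00:00:02": (48.0, -123.0),
    })
    scan = ["02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:99"]
    print(service.locate(scan))   # prints: (47.5, -122.5)
    print(service.query_log[-1])  # the unmapped 02:00:00:00:00:99 is in the log too
```

The unmapped address in the scan contributes nothing to the answer, yet it still ends up in the operator's log.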

Exactly which MAC addresses does your phone collect and submit to the database for location analysis?  It might well be all the MAC addresses detected in its vicinity, including those of other phones and devices…  You would then be revealing not only your own location information, but that of your friends, colleagues, and even complete strangers who happen to be passing by – even if they have their location features turned off.

Having broken into our home device-space to take our network identifiers without our consent, these database operators are thus able to turn themselves into intelligence services that know not only the locations of people who have opted into their system, but of people who have opted out.  I predict that this situation will not be allowed to stand.
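A hypothetical sketch shows how an operator could turn the scans of opted-in users into location histories for everyone else nearby. SightingLog, handle_query() and track() are invented names. The sketch assumes the worst case raised above: the operator logs every MAC in each scan that is not a mapped access point against the location it has just computed. We don't know what any particular operator actually retains. The point is only that nothing technical stands in the way.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Optional, Tuple

LatLon = Tuple[float, float]
Sighting = Tuple[int, LatLon]  # (timestamp, estimated location)


class SightingLog:
    """Hypothetical server side of a MAC geolocation service that records
    every device it hears about, not only the device that asked."""

    def __init__(self, ap_locations: Dict[str, LatLon]) -> None:
        self.ap_locations = {m.lower(): loc for m, loc in ap_locations.items()}
        self.sightings: DefaultDict[str, List[Sighting]] = defaultdict(list)

    def handle_query(self, timestamp: int, observed: Iterable[str]) -> Optional[LatLon]:
        """Answer an opted-in phone's query, and log the bystanders."""
        macs = [m.lower() for m in observed]
        known = [self.ap_locations[m] for m in macs if m in self.ap_locations]
        if not known:
            return None
        here = (sum(p[0] for p in known) / len(known),
                sum(p[1] for p in known) / len(known))
        # Any MAC that is not a mapped access point (another phone, a laptop)
        # is now pinned to `here`, whether or not its owner ever opted in.
        for m in macs:
            if m not in self.ap_locations:
                self.sightings[m].append((timestamp, here))
        return here

    def track(self, mac: str) -> List[Sighting]:
        """Everything the operator has recorded about where `mac` has been."""
        return list(self.sightings.get(mac.lower(), []))


if __name__ == "__main__":
    log = SightingLog({
        "02:00:00:00:00:01": (47.0, -122.0),
        "02:00:00:00:00:02": (48.0, -123.0),
    })
    bystander = "02:00:00:00:00:bb"  # a device whose owner never opted in
    for t in (1, 2):                 # two scans from one opted-in phone
        log.handle_query(t, ["02:00:00:00:00:01", "02:00:00:00:00:02", bystander])
    print(log.track(bystander))
    # prints: [(1, (47.5, -122.5)), (2, (47.5, -122.5))]
```

Two scans from a single opted-in phone are enough to give a bystander's laptop, which never asked for anything, a timestamped trail.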

Are there any controls on this: on what WiFi sniffing outfits can do with their information, on how they relate it to other information collected on us, and on who they sell it to?

I don't know anything about Navizon or the way it uses crowdsourcing, but I am no happier with the idea that crowds are – probably without their knowledge – eavesdropping on my network to the benefit of some technology outfit.  Do people know how they are being used to scavenge private network identifiers – and potentially even the device identifiers of their friends and colleagues?

Sadly, it seems we might now have a competitive environment in which all the cell phone makers will want to employ these databases.  The question for me is one of whether, as these issues come to the attention of the general public and its representatives, a technology breaking two Laws of Identity will actually survive without major reworking.  My prediction is that it will not. 

Reaping private identifiers is a mistake that, uncorrected, will haunt us as we move into the age of the smart home and the smart grid.  Sooner or later society will nix it as acceptable behavior.  Technologists will save themselves a lot of trouble if we make our mobile location systems conform with reasonable expectations of privacy and security starting now.