Update to iTunes comes with privacy fibs

A few days ago I reported that from now on, to get into the iPhone App store you must allow Apple to share your phone or tablet device fingerprints and detailed, dynamic location information with anyone it pleases.  No chance to vet the purposes for which your location data is being used.  No way to know who it is going to. 

As incredible as it sounds in 2010, no user control.  Not even  transparency.  Just one thing is for sure.  If privacy isn't dead, Apple is now amongst those trying to bury it alive.

Then today, just when I thought Apple had gone as far as it could go in this particular direction, a new version of iTunes wanted to install itself on my laptop.  What do you know?  It had a new privacy policy too… 

The new iTunes policy was snappier than the iPhone policy – it came to the point – sort of – in the 5th paragraph rather than the 37th page!

5. iTunes Store and other Services.  This software enables access to Apple's iTunes Store which offers downloads of music for sale and other services (collectively and individually, “Services”). Use of the Services requires Internet access and use of certain Services requires you to accept additional terms of service which will be presented to you before you can use such Services.

By using this software in connection with an iTunes Store account, you agree to the latest iTunes Store Terms of Service, which you may access and review from the home page of the iTunes Store.

I shuddered.  Mind bend!  A level of indirection in a privacy policy! 

Imagine:  “Our privacy policy is that you need to read another privacy policy.”  This makes it much more likely that people will figure out what they're getting into, don't you think?  Besides, it is a really novel application of the proposition that all problems of computer science can be solved through a level of indirection!  Bravo!

But then – the coup de grace.  The privacy policy to which Apple redirects you is… are you ready… the same one we came across a few days ago at the App Store!  So once again you need to get to the equivalent of page 37 of 45 to read:

Collection and Use of Non-Personal Information

We also collect non-personal information – data in a form that does not permit direct association with any specific individual. We may collect, use, transfer, and disclose non-personal information for any purpose. The following are some examples of non-personal information that we collect and how we may use it:

  • We may collect information such as occupation, language, zip code, area code, unique device identifier, location, and the time zone where an Apple product is used so that we can better understand customer behavior and improve our products, services, and advertising.

The mind bogggggles.  What does downloading a song have to do with giving away your location???

Some may remember my surprise that the Lords of The iPhone would call its unique device identifier – and its location – “non-personal data”.  Non-personal implies there is no strong relationship to the person who is using it.  I wrote:

The irony here is a bit fantastic.  I was, after all, using an “iPhone”.  I assume Apple’s lawyers are aware there is an “I” in the word “iPhone”.  We’re not talking here about a piece of shared communal property that might be picked up by anyone in the village.  An iPhone is carried around by its owner.  If a link is established between the owner’s natural identity and the device (as Google’s databases have done), its “unique device identifier” becomes a digital fingerprint for the person using it. 

Anybody who thinks about identity understands that a “personal device” is associated with (even an extension of) the person who uses it.  But most people – including technical people – don't give these matters the slightest thought.  

A parade of tech companies have figured out how to use people’s ignorance about digital identity to get away with practices letting them track what we do from morning to night in the physical world.  But of course, they never track people, they only track their personal devices!  Those unruly devices really have a mind of their own – you definitely need central databases to keep tabs on where they're going.

I was therefore really happy to read some of  Google CEO Eric Schmidt’s recent speech to the American Society of News Editors.  Talking about mobility he made a number of statements that begin to explain the ABCs of what mobile devices are about:

Google is making the Android phone, we have the Kindle, of course, and we have the iPad. Each of these form factors with the tablet represent in many ways your future….: they’re personal. They’re personal in a really fundamental way. They know who you are. So imagine that the next version of a news reader will not only know who you are, but it’ll know what you’ve read…and it’ll be more interactive. And it’ll have more video. And it’ll be more real-time. Because of this principle of “now.”

It is good to see Eric sharing the actual truth about personal devices with a group of key influencers.  This stands in stark contrast to the silly fibs about phones and laptops being non-personal that are being handed down in the iTunes Store, the iPhone App Store, and in the “Refresher FAQ” Fantasyland Google created in response to its Street View WiFi shenanigans. 

As the personal phone evolves it will become increasingly obvious  that groups within some of our best tech companies have built businesses based on consciously crafted privacy fibs.  I'm amazed at the short-sightedness involved:  folks, we're talking about a “BP moment”.  History teaches us that “There is no vice that doth so cover a man with shame as to be found false and perfidious.” [Francis Bacon]  And statements that your personal device doesn't identify you and that location is not personal information are precisely “false and perfidious.”

 

What Could Google Do With the Data It's Collected?

Niraj Chokshi has published a piece in The Atlantic where he grapples admirably with the issues related to Google's collection and use of device fingerprints (technically called MAC Addresses).  It is important and encouraging to have journalists like Niraj taking the time to explore these complex issues.  

But I have to say that such an exploration is really hard right now. 

Whether on purpose or by accident, the Google PR machine is still handing out contradictory messages.  In particular, the description in Google's Refresher FAQ titled “How does this location database work?” is currently completely different from (read: the opposite of) what its public relations people are telling journalists like Niraj.  I think reestablishing credibility around location services requires the messages to be made consistent so they can be verified by data protection authorities.

Here are some excerpts from the piece – annotated with some comments by me.  [Read the whole article here.] 

The Wi-Fi data Google collected in over 30 countries could be more revealing than initially thought…

Google's CEO Eric Schmidt has said the information was hardly useful and that the company had done nothing with it. The search giant has also been ordered (or sought) to destroy the data. According to their own blog post, Google logged three things from wireless networks within range of their vans: snippets of unencrypted data; the names of available wireless networks; and a unique identifier associated with devices like wireless routers. Google blamed the collection on a rogue bit of code that was never removed after it had been inserted by an engineer during testing.

[The statement about rogue code is an example of the PR ambiguity Niraj and other journalists must deal with.  Google blogs don't actually blame the collection of unique identifiers on rogue code, although they seem crafted to leave people with that impression.  Spokesmen only blame rogue code for the collection of unencrypted data content (e.g. email messages). – Kim]

Each of the three types of data Google recorded has its uses, but it's that last one, the unique identifier, that could be valuable to a company of Google's scale. That ID is known as the media access control (MAC) address and it is included — unencrypted, by design — in any transfer, blogger Joe Mansfield explains.

Google says it only downloaded unencrypted data packets, which could contain information about the sites users visited. Those packets also include the MAC address of both the sending and receiving devices — the laptop and router, for example.

[Another contradiction: Google PR says it “only” collected unencrypted data packets, but Google's GStumbler report  says its cars did collect and record the MAC addresses from encrypted data frames as well. – Kim]

A company as large as Google could develop profiles of individuals based on their mobile device MAC addresses, argues Mansfield:

Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it.

Now, to be fair, we don't know whether Google actually scrubbed the packets it collected for MAC addresses and the company's statements indicate they did not. [Yet the GStumbler report says ALL MAC addresses were recorded – Kim].  The search giant even said it “cannot identify an individual from the location data Google collects via its Street View cars.”  Add a step, however, and Google could deduce an individual from the location data, argues Avi Bar-Zeev, an employee of Microsoft, a Google competitor.

[Google] could (opposite of cannot) yield your identity if you've used Google's services or otherwise revealed it to them in association with your IP address (which would be the public IP of your router in most cases, visible to web servers during routine queries like HTTP GET). If Google remembered that connection (and why not, if they remember your search history?), they now have your likely home address and identity at the same time. Whether they actually do this or not is unclear to me, since they say they can't do A but surely they could do B if they wanted to.

Theoretically, Google could use the MAC address for a mobile device — an iPod, a laptop, etc. — to build profiles of an individual's activity. (It's unclear whether they did and Google has indicated that they have not.) But there's also value in the MAC addresses of wireless routers.

Once a router has been associated with a real-world location, it becomes useful as a reference point. The Boston company Skyhook Wireless, for example, has long maintained a database of MAC addresses, collected in a (slightly) less-intrusive way. Skyhook is the primary wireless positioning system used by Apple's iPhone and iPod Touch. (See a map of their U.S. coverage here.) When your iPod Touch wants to retrieve the current location, it shares the MAC addresses of nearby routers with Skyhook which pings its database to figure out where you are.

Google Latitude, which lets users share their current location, has at least 3 million active users and works in a similar way. When a user decides to share his location with any Google service on a non-GPS device, he sends all visible MAC addresses in the vicinity to the search giant, according to the company's own description of how its location services works.

[Update: Google's own “refresher FAQ” states that a user of its geo-location services, such as Latitude, sends all MAC addresses “currently visible to the device” to Google, but a spokesman said the service only collects the MAC addresses of routers. That FAQ statement is the basis of the following argument.]

This is disturbing, argues blogger Kim Cameron (also a Microsoft employee), because it could mean the company is getting not only router addresses, but also the MAC addresses of devices such as laptops and iPods. If you are sitting next to a Google Latitude user who shares his location, Google could know the address and location of your device even though you didn't opt in. That could then be compared with all other logged instances of your MAC address to develop a profile of where the device is and has been.

Google denies using the information it collected and, if the company is telling the truth, then only data from unencrypted networks was intercepted anyway, so you have less to worry about if your home wireless network is password-protected. (It's still not totally clear whether only router MAC addresses were collected. Google said it collected the information for devices “like a WiFi router.”) Whether it did or did not collect or use this information isn't clear, but Google, like many of its competitors, has a strong incentive to get this kind of location data.

[Again, and I really do feel for Niraj, the PR leaves the impression that if you have passwords and encryption turned on you have nothing to worry about, but Google's GStumbler report says that passwords and encryption did not prevent the collection of the MAC addresses of phones and laptops from homes and businesses. – Kim]

I really tuned in to these contradictory messages when a reader first alerted me to Niraj's article.   It looked like this:

My comments earned their strike-throughs when a Google spokesman assured the Atlantic “the Service only collects the MAC addresses of routers.”  I pointed out that my statement was actually based on Google's own FAQ, and it was their FAQ (“How does this location database work?”) – rather than my comments – that deserved to be corrected.  After verifying that this was true, Niraj agreed to remove the strikethrough.

How can anyone be expected to get this story right given the contradictions in what Google says it has done?

In light of this, I would like to see Google issue a revision to its “Refresher FAQ” that currently reads:

The “list of MAC addresses which are currently visible to the device” would include the addresses of nearby phones and laptops.  Since Google PR has assured Niraj that “the service only collects the MAC addresses of routers”, the right thing to do would be to correct the FAQ so it reads:

  • “The user’s device sends a request to the Google location server with the list of MAC addresses found in Beacon Frames announcing a Network Access Point SSID and excluding the addresses of end user devices like WiFi enabled phones and laptops.”

This would at least reassure us that Google has not delivered software with the ability to track non-subscribers and this could be verified by data protection authorities.  We could then limit our concerns to what we need to do to ensure that no such software is ever deployed in the future.
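To make the proposed wording concrete, here is a minimal sketch of the filtering it describes.  It is an illustration only, not Google's code: the `ObservedFrame` record and the `macs_for_location_request` function are hypothetical names, and the sketch assumes the client can see the 802.11 frame type and subtype of what it overhears (type 0, subtype 8 being a Beacon Frame).

```python
from dataclasses import dataclass

MANAGEMENT = 0  # 802.11 frame type: management
BEACON = 8      # management subtype an access point uses to announce its SSID


@dataclass(frozen=True)
class ObservedFrame:
    """Hypothetical record of one overheard WiFi frame."""
    frame_type: int
    subtype: int
    source_mac: str


def macs_for_location_request(frames):
    """Keep only MAC addresses seen as the source of Beacon Frames.

    Addresses seen only in data frames or probe requests (typically
    phones and laptops) are never included in the request.
    """
    beacons = {
        f.source_mac
        for f in frames
        if f.frame_type == MANAGEMENT and f.subtype == BEACON
    }
    return sorted(beacons)


observed = [
    ObservedFrame(0, 8, "00:11:22:33:44:55"),  # beacon from an access point
    ObservedFrame(2, 0, "66:77:88:99:aa:bb"),  # data frame from a laptop
    ObservedFrame(0, 4, "cc:dd:ee:ff:00:11"),  # probe request from a phone
]
print(macs_for_location_request(observed))  # ['00:11:22:33:44:55']
```

The point of the sketch is that the exclusion happens on the device, before anything is sent, which is the kind of property a data protection authority could check.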

 

Apple giving out your iPhone fingerprints and location

I went to the Apple App store a few days ago to download a new iPhone application.  I expected that this would be as straightforward as it had been in the past: choose a title, click on pay, and presto – a new application becomes available.

No such luck.  Apple had changed its privacy policy, and I was taken to the screen at right.  To proceed I had to “read and accept the new Terms and Conditions”.  I pressed OK and up came page 1 of a new 45 page “privacy” policy.

I would assume “normal people” would say “uncle” and “click approve” around page 3.  But in light of what is happening in the industry around location services I kept reading the tiny, unsearchable, unzoomable print.

And there – on page 37 – you come to “the news”.  Apple's new “privacy” policy reveals that if you use Apple products Apple can disclose your device fingerprints and location to whomever it chooses and for whatever purpose:

Collection and Use of Non-Personal Information

We also collect non-personal information – data in a form that does not permit direct association with any specific individual. We may collect, use, transfer, and disclose non-personal information for any purpose. The following are some examples of non-personal information that we collect and how we may use it:

  • We may collect information such as occupation, language, zip code, area code, unique device identifier, location, and the time zone where an Apple product is used so that we can better understand customer behavior and improve our products, services, and advertising.

No “direct association with any specific individual…”

Maintaining that a personal device fingerprint has “no direct association with any specific individual” is unbelievably specious in 2010 – and even more ludicrous than it used to be now that Google and others have collected the information to build giant centralized databases linking phone MAC addresses to house addresses.  And – big surprise – my iPhone, at least, came bundled with Google's location service.

The irony here is a bit fantastic.  I was, after all, using an “iPhone”.  I assume Apple's lawyers are aware there is an “I” in the word “iPhone”.  We're not talking here about a piece of shared communal property that might be picked up by anyone in the village.  An iPhone is carried around by its owner.  If a link is established between the owner's natural identity and the device (as Google's databases have done), its “unique device identifier” becomes a digital fingerprint for the person using it. 

Apple's statements constitute more disappointing doubletalk that is suspiciously well-aligned with the statements in Google's now-infamous WiFi FAQ.  Checking with the “Wayback machine” (which is of course not guaranteed to be accurate or up to date) the last change recorded in Apple's privacy policy seems to have been made in April 2008.  It contained no reference to device identifiers or location services. 

 

Conor changes his mind

Conor Cahill has taken a look at the Gstumbler report.  His conclusion is:

Given this new information I would have to agree that Google has clearly stepped into the arena of doing something that could be detrimental to the user's privacy.

Conor explains that, “the information in the report is quite different than the information that had been published at the time I expressed my opinions on the events at hand.”

He argues:

  1. “We had been led to believe that Google had only captured data on open wireless networks (networks that broadcast their SSIDs and/or were unencrypted). The analysis of the software shows that to be incorrect — Google captured data on every network regardless of the state of openness. So no matter what the user did to try to protect their network, Google captured data that the underlying protocols required to be transmitted in the clear.
  2. “We had been led to believe that Google had only captured data from wireless access points (APs). Again the analysis shows that this was incorrect — Google captured data on any device for which it was able to capture the wireless traffic for (AP or user device). So portable devices that were currently transmitting as the Street View vehicle passed would have their data captured.”

Anyone who knows Conor knows he is a gentlemanly model of how people should behave towards each other in our industry.  I understand his position fully, and respect it.  He says:

[Kim] seems to have a particular fondness for the phrase “wrong,” “completely wrong,” and “wishful thinking” when referring to my comments on the topic.  In my defense, I will say that there was no “wishful thinking” going on in my mind. I was just examining the published information rather than jumping to conclusions — something that I will always advocate. In this case, after examining the published report, it does appear that those who jumped to conclusions happened to be closer to the mark, but I still think they were wrong to jump to those conclusions until the actual facts had been published.

I can't disagree that Google's public relations messages may well have been crafted to leave the impression that their wireless eavesdropping was only directed at network access points.  But if you read them extremely carefully you see they refrain from making any such claims. 

At any rate, Conor needs no defense and I accept his point.  People who took the view that Google couldn't possibly have been doing what I claimed were acting based on the messages the company conveyed.  Sadly, if people of Conor's undisputed technical sophistication are misled by this kind of public relations campaign, the crafting of the information might also be considered suspect.

[More of Conor's post here]

 

Latitude privacy policy doesn't fess up to what Google stores

Never one to mince words, Jackson Shaw asks, “To the Google privacy core – Is it rotten?”  He writes,

“I read Kim’s post and immediately decided to turn off Google’s Latitude service on my phone but, as Kim illustrates, it probably won’t make any difference…

“I took a few minutes to check out Google’s privacy policy around Latitude and found out this much:

“If you choose to ‘Hide your location’, you can hide from your Latitude friends all at once, so they won't be able to see your location. If you hide in Latitude, we don't store your location.

“I’m not worried about hiding in Latitude. I wish I could hide from Google!”

The funny thing here is that Google already stores our residential locations through association with our devices, as indicated by its Gstumbler report, contradicting the Latitude privacy policy.

Jackson then directs us to a Wired article that is tremendously germane to this discussion – partly because of what it says about the current legal environment in the US, and partly because it reflects the very real problem that, in general, neither technologists nor policy makers understand that tapping of device identifiers is as serious as theft of content. 

See:  “Former Prosecutor: Google Wi-Fi Snafu ‘Likely’ Illegal” – I'll discuss it next.  

The core of the matter at hand

We've explored many of the basic issues of WiFi snooping.  I would now like to go directly to the core of the matter: why do large centralized databases of MAC addresses linked to our street addresses have really serious consequences for people’s privacy?  I'd like to approach this through an example:

Consider the case of someone attending a conference at which people are using laptops and phones over a wireless network.  We picture the devices within range of a given attendee in Figure 1:

The green dot represents the WiFi access point through which conference attendees gain access to the Internet.  For now, let's assume this is a permanent WiFi network.  Let's therefore assume its MAC address and location are present within the linking database that also contains our residential MAC to street address mapping.

Now suppose one or more people at the conference have opted into a geo-location service that makes use of the database.  And let's assume that the way this service works is to listen for nearby MAC addresses (all the little circles in the figure) and submit them to the geo-location system for analysis.

The geo-location system will learn that the opted-in user (let's call him Red) is near the given WiFi point, and thus will know Red is at a given location.  If the geo-location system is also capable of searching the web (as one would expect that Google's could), it will also be able to infer that Red is in a given hotel, and that the hotel is hosting a conference C on the date in question. 

If Red stays in the same location for some time, and is surrounded by a number of other people who are in the same location (discernible because their MAC addresses continue to be nearby), the smart service will be able to infer that Red is attending conference C being held in the hotel. 

So far, there's nothing wrong with this, since Red has opted in to the geo-location service, and presumably been told that's how it works.

However, note that the geo-location system also learns about the MAC addresses of all the attendees within range who have NOT opted into the system (Green).  And if they remain within range over time, it can also deduce that they too are present at conference C.  Further, it can look up their MAC addresses in the database to discover their street addresses.  This in turn can be used to make many inferences about who the attendees at the conference are, since a lot of information is keyed to their street addresses.  That can itself become further profile information.
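The inference described above can be sketched in a few lines.  This is a hypothetical illustration of the capability, not a description of how any real service works: the two lookup tables, the MAC strings and the `infer_bystanders` function are all invented for the example.

```python
from collections import Counter

# Hypothetical linking database of the kind discussed above.
ACCESS_POINTS = {"ap-hotel": "Hotel ballroom, conference C"}
HOME_ADDRESSES = {"mac-green-1": "12 Elm St", "mac-green-2": "34 Oak Ave"}


def infer_bystanders(scans, min_sightings=2):
    """Infer venue and home address for devices that never opted in.

    `scans` is a list of sets of MAC addresses reported over time by a
    single opted-in device (Red).  A device seen in at least
    `min_sightings` scans is assumed to be staying at the same venue.
    """
    seen = Counter(mac for scan in scans for mac in scan)
    venues = [ACCESS_POINTS[m] for m in seen if m in ACCESS_POINTS]
    if not venues:
        return {}
    venue = venues[0]
    return {
        mac: (venue, HOME_ADDRESSES.get(mac))
        for mac, count in seen.items()
        if count >= min_sightings and mac not in ACCESS_POINTS
    }


# Three successive scans submitted by Red's device.
red_scans = [
    {"ap-hotel", "mac-green-1", "mac-green-2"},
    {"ap-hotel", "mac-green-1"},
    {"ap-hotel", "mac-green-1", "mac-passerby"},
]
print(infer_bystanders(red_scans))
```

Nothing in this sketch requires the Green devices to take part: their addresses arrive in Red's scans, and the join against the street-address table does the rest.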

Opting out doesn't help

The problem here is this:  The geo-location system is perfectly capable of tracking your location and associating it with your home street address whether you opt in or not.  Home address is a key to many aspects of your identity.  Presto – you have linked many aspects of your identity to your location, and this becomes intellectual property that the geo-location service can sell and benefit from in a myriad of ways.

Is this the way any particular geo-location service would actually work?  I have no idea.  But that's not the point.  The point is that this is the capability one enables by building the giant central database of laptop and phone MAC addresses linked to street addresses.

Commercial interest will naturally tend towards maximal use of these capabilities and the information at hand. 

That is why we need to fully understand the implications of wirelesstapping on a massive scale and figure out if and where we want to draw the line.  How does the collection of MAC addresses using WiFi trucks relate to the regulations involving data collection, proportionality and consent?  Are there limits on the usage of this data? 

One thing is for sure.  Breaking the Fourth Law, and turning a unidirectional identifier into a universal identifier is like the story of the Sorcerer's Apprentice.  All the brooms have started dancing.  I wonder if Mickey will get out of this one?

 

“I just did it because Skyhook did it”

I received a helpful and informed comment by Michael Hanson at Mozilla Labs on the Street View MAC Address issue:

I just wanted to chip in and say that the practice of wardriving to create a SSID/MAC geolocation database is hardly unique to Google.

The practice was invented by Skyhook Wireless, formerly Quarterscope. The iPhone, pre-GPS, integrated the technology to power the Maps application. There was some discussion of how this technology would work back in 2008, but it didn't really break out beyond the community of tech developers. I'm not sure what the connection between Google and Skyhook is today, but I do know that Android can use the Skyhook database.

Your employer recently signed a deal with Navizon, a company that employs crowdsourcing to construct a database of WiFi endpoints.

Anyway – I don't mean to necessarily weigh in on the question of the legality or ethics of this approach, as I'm not quite sure how I feel about it yet myself. The alternative to a decentralized anonymous geolocation system is one based on a) GPS, which requires the generosity of a space-going sovereign to maintain the satellites and has trouble in dense urban areas, or b) the cell towers, which are inefficient and are used to collect our phones’ locations. There's a recent paper by Constandache (et al) at Duke that addresses the question of whether it can be done with just inertial reckoning… but it's a tricky problem.

Thanks for the post.

The scale of the “wardriving” [can you believe the name?] boggles my mind, and the fact that this has gone on for so long without attracting public attention is a little incredible.  But in spite of the scale, I don't think the argument that it's OK to do something because other people have already done it will hold much water with regulators or the thinking public.  In fact, it all sounds a bit like a teenager trying to avoid his detention because he was “just doing what Johnny did.”

As Michael says, one can argue that there are benefits to drive-by device identity theft.  In fact, one can argue that there would be benefits to appropriating and reselling all kinds of private information and property.  But in most cases we hold ourselves back, and find other, socially acceptable ways of achieving the same benefits.  We should do the same here.

Are these databases decentralized and anonymous?

As hard as I try, I don't see how one can say the databases are decentralized and anonymous.  For starters, they are highly centralized, allowing monetized lookup of any MAC address in the world.  Secondly, they are not anonymous – the databases contain the identity information of our personal devices as well as their exact locations in molecular space.   It is strange to me that personal information can just be “declared to be public” by those who will benefit from that in their businesses.

Do these databases protect our privacy in some way? 

No – they erode it more than before.  Why?

Location information has long been available to our telephone operators, since they use cell-tower triangulation.  This conforms to the Law of Justifiable Parties – they need to know where we are (though not to remember it) to provide us with our phone service. 

But now yet another party has insinuated itself into the mobile location equation: the MAC database operator – be it Google, Skyhook or Navizon. 

If you carry a cell phone that uses one of these databases – and maybe you already do – your phone queries the database for the locations of MAC addresses it detects.  This means that in addition to your phone company, a database company is constantly being informed about your exact location.   From what Michael says it seems the cell phone vendor might additionally get in the middle of this location reporting – all parties who have no business being part of the location transaction unless you specifically opt to include them.

Exactly what MAC addresses does your phone collect and submit to the database for location analysis?  Clearly, it might be all the MAC addresses detected in its vicinity, including those of other phones and devices…  You would then be revealing not only your own location information, but that of your friends, colleagues, and even of complete strangers who happen to be passing by – even if they have their location features turned off.

Having broken into our home device-space to take our network identifiers without our consent, these database operators are thus able to turn themselves into intelligence services that know not only the locations of people who have opted into their system, but of people who have opted out.  I predict that this situation will not be allowed to stand.

Are there any controls on this, on what WiFi sniffing outfits can do with their information, and on how they relate it to other information collected on us, on who they sell it to?

I don't know anything about Navizon or the way it uses crowdsourcing, but I am no happier with the idea that crowds are – probably without their knowledge – eavesdropping on my network to the benefit of some technology outfit.  Do people know how they are being used to scavenge private network identifiers – and potentially even the device identifiers of their friends and colleagues?

Sadly, it seems we might now have a competitive environment in which all the cell phone makers will want to employ these databases.  The question for me is one of whether, as these issues come to the attention of the general public and its representatives, a technology breaking two Laws of Identity will actually survive without major reworking.  My prediction is that it will not. 

Reaping private identifiers is a mistake that, uncorrected,  will haunt us as we move into the age of the smart home and the smart grid.  Sooner or later society will nix it as acceptable behavior.  Technologists will save a lot of trouble if we make our mobile location systems conform with reasonable expectations of privacy and security starting now.

 

Misuse of network identifiers was done on purpose

Ben Adida has a list of achievements as long as my arm – many of which are related to privacy and security.  His latest post concerns what he calls, “privacy advocacy theater… a problem that my friends and colleagues are guilty of, and I’m sure I’m guilty of it at times, too.  Privacy Advocacy Theater is the act of extreme criticism for an accidental data breach rather than a systemic privacy design flaw. Example: if you’re up in arms over the Google Street View privacy “fiasco” of the last few days, you’re guilty of Privacy Advocacy Theater.”

Ben then proceeds to take me to task for this piece:

I also have to be harsh with people I respect deeply, like Kim Cameron who says that Google broke two of his very nicely crafted Laws of Identity. Come on, Kim, this was accidental data collection by code that the Google Street View folks didn’t even realize was running. (I’m giving them the benefit of the doubt. If they are lying, that’s a different problem, but no one’s claiming they’re lying, as far as I know.) The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data. If anyone is breaking the Laws of Identity, it’s the WiFi access points that don’t actively nudge users towards encrypting their WiFi network.

But let's hold on a minute.  My argument wasn't about the payload data that was collected accidentally.  It was about the device identification data that was collected on purpose.  As Google's Alan Eustace put it:

We said that while Google did collect publicly broadcast SSID information (the WiFi network name) and MAC addresses (the unique number given to a device like a WiFi router) using Street View cars, we did not collect payload data (information sent over the network). But it’s now clear that we have been mistakenly collecting samples of payload data…

Device identifiers were collected on purpose

SSIDs and MAC addresses are the identifiers of your devices.  They are transmitted as part of the WiFi traffic just like the payload data is.  And they are not “publicly broadcast” any more than the payload data is.

Yet Google consciously decided to abscond with, tabulate and monetize the identities of our personal, business and home devices.  The identifiers are persistent and last for the lifetime of the devices.  Their collection, cataloging and use is, in my view, more dangerous than the payload data that was collected. Why? The payload data, though deeply personal, is transient and represents a single instant.  The identifiers are persistent, and the Street View WiFi plan was to use them for years.  

Let's be clear:  Identity has as much to do with devices, software, services and organizations as with individuals.  And equally important, identity is about the relationships between these things.  In fact identity can only be adequately expressed through the relationships (some call it context).

When Google says, “MAC addresses are a simple hardware ID assigned by the manufacturer” and “We cannot identify an individual” using those “simple hardware IDs”,  it sounds like the devices found in your home and briefcase and pocket have nothing to do with you as a flesh and blood person.  Give me a break!  It reminds me of an old skit by “Beyond the Fringe” where a police inspector points out that “Once you have identified the criminal's face, the criminal's body is likely to be close by…”  Our identities and the identities of our devices are related, and understanding this relationship is essential to getting identity and privacy right.

One great thing about blogging is you find out when you haven't been clear enough.  I hope I'm making progress in expressing the real issues here:  the collection of device identifiers was purposeful, and this represents precisely the kind of “systemic privacy design flaw” to which Ben refers.  

It bothers me that this disturbing systemic privacy design flaw – for which there has been no apology – is being obscured through the widely publicized apology for a completely separate and apparently accidental sin.  

In contemporary networks, the hardware ID of the device is NOT intended to be a “universal identifier”.  It is intended to be a “unidirectional identifier” (see The Fourth Law) employed purely to map between a physical machine and a transient, local logical address.  Many people who read this blog understand why networking works this way.  In Street View WiFi, Google was consciously misusing this unidirectional identifier as a universal identifier, and misappropriating it by insinuating itself, as eavesdropper, into our network conversations.

Ben says, “The Laws of Identity apply predominantly to the systems that individuals choose to use to manage their data.”  But I hope he rethinks this in the context of what identity really is, its use in devices and systems, and the fact that human, device and service identities are tied together in what one day should be a trustworthy system.  I also hope to see Google apologize for its misuse of our device identities, and assure us they will not be used in any of their systems.

Finally, despite Ben's need to rethink this matter, I do love his blog, and strongly agree with his comments on Opera Mini, discussed in the same piece.

 

EPIC on Google WiFi eavesdropping

Readers have drawn our attention to a recent letter from EPIC's Marc Rotenberg to FCC Chairman Julius Genachowski.

In the detailed letter, Marc Rotenberg specifically calls attention to the mapping of private device identifiers, saying, “We understand that Google also downloaded and recorded a unique device ID, the MAC address, for wireless access devices as well as the SSID assigned by users.”

He argues:

The capture of Wi-Fi data in this manner by Google Street View could easily constitute a violation of Title III of the Omnibus Crime Control and Safe Streets Act of 1968, also known as the Wiretap Act, as amended by the Electronic Communications Privacy Act (ECPA) of 1986 to include electronic communications. Courts most often define “interception” under ECPA as “acquisitions contemporaneous with transmission.” The Wiretap Act provides for civil liability and criminal penalties against any person who “intentionally intercepts, endeavors to intercept, or procures any other person to intercept or endeavor to intercept any… electronic communication [except as provided in the statute].”

The Wiretap Act imposes identical liability on any person who “intentionally discloses … to any other person the contents of any… electronic communication, knowing or having reason to know that the information was obtained through the interception of a[n] … electronic communication in violation of this subsection,” or “intentionally uses … the contents of any… electronic communication, knowing or having reason to know that the information was obtained through the interception of a[n]… electronic communication in violation of this subsection.”

Full text (including many footnotes elided in the quote above) is available in pdf and Word format.  See also The Hill's technology blog.

The Laws of Identity smack Google

Alan Eustace, Google's Senior VP of Engineering & Research, blogged recently about Google's collection of Wi-Fi data using its Street View cars:

The engineering team at Google works hard to earn your trust—and we are acutely aware that we failed badly here. We are profoundly sorry for this error and are determined to learn all the lessons we can from our mistake.  

I think the idea of learning all the lessons he can from Google's mistake is a really good one, and I accept that Alan really is sorry.  But what constituted the mistake?

Last month Google was good enough to provide us with a “refresher FAQ” that dealt with the subject in a particularly specious way, even though it was remarkable in its condescension:

“What do you mean when you talk about WiFi network information?
“WiFi networks broadcast information that identifies the network and how that network operates. That includes SSID data (i.e. the network name) and MAC address (a unique number given to a device like a WiFi router).

“Networks also send information to other computers that are using the network, called payload data, but Google does not collect or store payload data.*

“But doesn’t this information identify people?
“MAC addresses are a simple hardware ID assigned by the manufacturer. And SSIDs are often just the name of the router manufacturer or ISP with numbers and letters added, though some people do also personalize them.

“However, we do not collect any information about householders, we cannot identify an individual from the location data Google collects via its Street View cars.

“Is it, as the German DPA states, illegal to collect WiFi network information?
“We do not believe it is illegal–this is all publicly broadcast information which is accessible to anyone with a WiFi-enabled device…

Let's start with the last point. Is information that can be collected using a WiFi device actually being “broadcast”?  Or is it being transmitted for a specific purpose and private use?  If everything is deemed to be “broadcast” simply by virtue of being a signal that can be received, then surely payload data – people's surfing behavior, emails and chat – is also being “broadcast”.  Once the notion of “broadcast” is accepted, the FAQ implies there can be no possible objection to collecting it.

But Alan's recent post says, “it’s now clear that we have been mistakenly collecting samples of payload data from open (i.e. non-password-protected) WiFi networks.”  He adds, “We want to delete this data as soon as possible…”  What is the mistake?  Does Alan mean Google has now accepted that WiFi information is not by definition being “broadcast” for its use?  Or does Alan see the mistake as being the fact they created a PR disaster?  I think “learning everything we can” means learning that the initial premises of the Street View WiFi system were wrong (and the behavior perhaps even illegal) because the system collected WiFi information that was intended to be used for private purposes and not intended to include Google.  

The FAQ claims – and this is disturbing – that the information collected about network identifiers “doesn't identify people”.  The fact is that it identifies devices that are closely associated with people – including their personal computers and phones.  MAC addresses are persistent, remaining constant over the lifetime of the device.  They are identifiers that are extremely reliable in establishing identity by virtue of being in peoples’ pockets or briefcases.
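To make the point about persistence concrete, here is a minimal sketch of how a fixed identifier turns unrelated observations into a trail.  It is not a description of any real location database: the `SIGHTINGS` data and the `link_sightings` helper are hypothetical, and the addresses are taken from the range reserved for documentation.

```python
from collections import defaultdict

# Hypothetical observations: (MAC address, where it was heard).
# The addresses come from the range reserved for documentation.
SIGHTINGS = [
    ("00:00:5E:00:53:01", "home"),
    ("00:00:5e:00:53:02", "cafe"),
    ("00:00:5e:00:53:01", "office"),
]

def link_sightings(sightings):
    """Group observed places by MAC address, preserving the order seen."""
    trail = defaultdict(list)
    for mac, place in sightings:
        # Normalize case so the same address always lands in the same bucket.
        trail[mac.lower()].append(place)
    return dict(trail)

trails = link_sightings(SIGHTINGS)
print(trails["00:00:5e:00:53:01"])  # ['home', 'office']
```

Nothing in the sketch needs a name or an account.  Because the identifier never changes, simply grouping by it is enough to connect a device, and so very likely its owner, to every place it has been heard.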

As a result, Google breaks two Laws of Identity in one go with their Street View boondoggle:

Google breaks Law 3, the Law of Justifiable Parties:

Digital identity systems must limit disclosure of identifying information to parties having a necessary and justifiable place in a given identity relationship

Google is not part of the transactions between my network devices and is not justified in intervening or recording the details of their use and relationship. 

Google also breaks Law 4, Directed Identity:

A universal identity metasystem must support both “omnidirectional” identifiers for use by public entities and “unidirectional” identifiers for private entities, thus facilitating discovery while preventing unnecessary release of correlation handles.

My network devices are private entities intended for use in the contexts for which I authorize them.  My home network is a part of my home, and Google (or any other company) has not been invited to employ that network for its own purposes.  The identifiers in use there are contextually specific, not public, and not intended to be shared across all contexts.  They are more private than the IP addresses used in TCP/IP, since they are not shared across end-points in different networks.  The same applies to SSIDs.
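For readers who want to see what a “unidirectional” identifier can look like in code, here is a minimal sketch.  It is an illustration under stated assumptions, not something taken from the Laws of Identity or from any WiFi standard: the `directed_identifier` function, the device secret and the context names are all hypothetical.  The idea is simply that a device derives a different identifier for each context, so that nothing observed in one context serves as a correlation handle in another.

```python
import hashlib
import hmac

def directed_identifier(device_secret: bytes, context: str) -> str:
    """Derive an identifier that is stable within a context but differs across contexts."""
    # HMAC-SHA256 keyed with a secret that never leaves the device;
    # the context name (for example a network name) is the message.
    digest = hmac.new(device_secret, context.encode("utf-8"), hashlib.sha256).digest()
    # Truncate to 6 bytes purely for readability in this sketch.
    return digest[:6].hex()

# Hypothetical secret and context names, for illustration only.
secret = b"example-device-secret"
home_id = directed_identifier(secret, "home-network")
cafe_id = directed_identifier(secret, "coffee-shop")

print(home_id == directed_identifier(secret, "home-network"))  # True
print(home_id == cafe_id)
```

Within the home network the identifier is stable, so the network keeps working.  An eavesdropper who records it there learns nothing that helps recognize the same device at the coffee shop.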

One can stand in the street, point a directional microphone at a window and record the conversations inside.  This doesn't make them public or give anyone the right to use the conversations for commercial purposes.  The same applies to recording the information we exchange using digital media – including our identifiers, SSIDs and MAC addresses.  It is particularly disingenuous to argue that because information is not encrypted it doesn't belong to anyone and there are no rights associated with it.  If lack of encryption meant information is fair game, a lot of Google's own intellectual property would be up for grabs.

Google's justification for collecting MAC addresses was that if a stranger walked down your street, the MAC addresses of your computers and routers could be used to provide his systems (or Google's?) with information on where he was.  The idea that Google would, without our consent, employ our home networks for its own commercial purposes betrays a problem of ethics and a lack of control.  Let's hope this is what Alan means when he says,

“Given the concerns raised, we have decided that it’s best to stop our Street View cars collecting WiFi network data entirely.”

I know there are many people inside Google who will recognize that these problems represent more than a “mistake” – there is clearly the need for a much deeper understanding of identity and privacy within the engineering and business staff.   I hope this will be the outcome.  The Laws of Identity are a harsh teacher, and it's sad to see the Street View technology sullied by privacy catastrophes.

Meanwhile, there is one more lesson for the rest of us.  We tend to be cavalier in pooh-poohing the idea that commercial interests would actually abuse our networks and digital privacy in fundamental ways.  This episode demonstrates how naive that is.  We need to strengthen the networking infrastructure, and protect it from misuse by commercial interests as well as criminals.  We need clear legislation that serves as a disincentive to commercial interests contemplating privacy-invasive use of technology.  And on a technical note, we need to fix the problems of static MAC addresses precisely because they are strong personal identifiers that ultimately will be used to target individuals physically as criminals begin to understand their possible uses.
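One direction such a fix could take is for a device to present a randomly generated address instead of the one assigned by its manufacturer.  The sketch below is only an illustration of the idea, and the `random_private_mac` helper is hypothetical.  It relies on one fact about the address format: in the first octet, bit 0x02 marks an address as locally administered rather than manufacturer-assigned, and bit 0x01 marks it as multicast.

```python
import secrets

def random_private_mac() -> str:
    """Return a random unicast, locally administered MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally administered bit (0x02) and clear the multicast bit (0x01),
    # so the address cannot collide with a manufacturer-assigned one.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

print(random_private_mac())
```

A device that picks a fresh address like this for each network it joins, or at regular intervals, no longer exposes a single identifier that lasts for its lifetime.  How and when to rotate without breaking the local network is a design question the sketch does not attempt to answer.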