{"id":1139,"date":"2010-06-28T15:55:47","date_gmt":"2010-06-28T23:55:47","guid":{"rendered":"\/?p=1139"},"modified":"2010-06-28T18:34:06","modified_gmt":"2010-06-29T02:34:06","slug":"what-could-google-do-with-the-data-its-collected","status":"publish","type":"post","link":"https:\/\/www.identityblog.com\/?p=1139","title":{"rendered":"What Could Google Do With the Data It&#39;s Collected?"},"content":{"rendered":"<p><a href=\"http:\/\/www.theatlantic.com\/niraj-chokshi\/\">Niraj Chokshi<\/a> has published\u00a0<a href=\"http:\/\/www.theatlantic.com\/science\/archive\/2010\/06\/what-could-google-do-with-the-data-its-collected\/58396\/\">a piece <\/a>in The Atlantic where he grapples admirably with the issues related to Google&#39;s collection and use of device fingerprints (technically called MAC Addresses).\u00a0\u00a0It is\u00a0important and encouraging to have\u00a0journalists like\u00a0Niraj\u00a0taking the time to\u00a0explore\u00a0these\u00a0complex issues.\u00a0\u00a0<\/p>\n<p>But I have to say that such an exploration\u00a0is<strong> really hard<\/strong> right now.\u00a0<\/p>\n<p>Whether on purpose or by accident, the Google PR machine is still handing out contradictory messages.\u00a0 In particular,\u00a0the description\u00a0in Google&#39;s <a href=\"http:\/\/googlepolicyeurope.blogspot.com\/2010\/04\/data-collected-by-google-cars.html\">Refresher FAQ<\/a>\u00a0titled &#8220;How does this location database work?&#8221;\u00a0is currently completely\u00a0different from (read: the opposite of)\u00a0what\u00a0its public relations\u00a0people are telling journalists like Niraj.\u00a0 I think reestablishing credibility around location services requires the messages to be made consistent\u00a0so they can\u00a0be verified by data protection authorities.<\/p>\n<p>Here are some excerpts from the piece\u00a0&#8211;\u00a0annotated with\u00a0some comments by me.\u00a0 [Read the whole article<a 
href=\"http:\/\/www.theatlantic.com\/science\/archive\/2010\/06\/what-could-google-do-with-the-data-its-collected\/58396\/\"> here<\/a>.]\u00a0<\/p>\n<p style=\"PADDING-LEFT: 30px\">The Wi-Fi data Google collected in over 30 countries could be more revealing than initially thought&#8230;<\/p>\n<p style=\"PADDING-LEFT: 30px\">Google&#39;s CEO Eric Schmidt has said the information was <a href=\"http:\/\/news.bbc.co.uk\/2\/hi\/technology\/10122339.stm\"><span style=\"color: #00598c;\">hardly useful<\/span><\/a> and that the company had done nothing with it. The search giant has also been <a href=\"http:\/\/www.guardian.co.uk\/technology\/2010\/may\/18\/google-destroy-wi-fi-networks\"><span style=\"color: #00598c;\">ordered<\/span><\/a> (or <a href=\"http:\/\/www.informationweek.com\/news\/mobility\/business\/showArticle.jhtml?articleID=225400209\"><span style=\"color: #00598c;\">sought<\/span><\/a>) to destroy the data. According to their <a href=\"http:\/\/googleblog.blogspot.com\/2010\/05\/wifi-data-collection-update.html\"><span style=\"color: #00598c;\">own blog post<\/span><\/a>, Google logged three things from wireless networks within range of their vans: snippets of unencrypted data; the names of available wireless networks; and a unique identifier associated with devices like wireless routers. Google blamed the collection on a rogue bit of code that was never removed after it had been inserted by an engineer during testing.<\/p>\n<p style=\"PADDING-LEFT: 30px\"><em>[The statement about rogue code is\u00a0an example of the PR ambiguity Niraj and other journalists must deal with.\u00a0 Google\u00a0blogs don&#39;t\u00a0actually\u00a0blame the collection of\u00a0unique identifiers\u00a0on rogue code,\u00a0although\u00a0they seem crafted to leave people with\u00a0that impression.\u00a0 Spokesmen\u00a0only blame rogue code for the collection of unencrypted data content (e.g. email messages.) 
&#8211; Kim]<\/em><\/p>\n<p style=\"PADDING-LEFT: 30px\">Each of the three types of data Google recorded has its uses, but it&#39;s that last one, the unique identifier, that could be valuable to a company of Google&#39;s scale. That ID is known as the\u00a0media access control (MAC) address and it is included &#8212; unencrypted, by design &#8212; in any transfer, blogger <a href=\"http:\/\/helvick.blogspot.com\/2010\/06\/so-how-much-does-mac-address-tell-you.html\"><span style=\"color: #00598c;\">Joe Mansfield explains<\/span><\/a>.<\/p>\n<p style=\"PADDING-LEFT: 30px\">Google says it only downloaded unencrypted data packets, which could contain information about the sites users visited. Those packets also include the MAC address of both the sending and receiving devices &#8212; the laptop and router, for example.<\/p>\n<p style=\"PADDING-LEFT: 30px\">[Another <em>contradiction: Google\u00a0PR says it &#8220;only&#8221; collected unencrypted data packets,\u00a0but\u00a0Google&#39;s GStumbler report\u00a0 says\u00a0its cars\u00a0did collect and record the MAC addresses from encrypted<\/em> <em>data frames<\/em> <em>as well. &#8211; Kim<\/em>]<\/p>\n<p style=\"PADDING-LEFT: 30px\">A company as large as Google could develop profiles of individuals based on their mobile device MAC addresses, argues Mansfield:<\/p>\n<p style=\"PADDING-LEFT: 60px\">Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it.<\/p>\n<p style=\"PADDING-LEFT: 30px\">Now, to be fair, we don&#39;t know whether Google actually scrubbed the packets it collected for MAC addresses and the company&#39;s statements indicate they did not. 
[<em>Yet the GStumbler report says ALL MAC addresses were recorded &#8211; Kim<\/em>].\u00a0 The search giant even said it &#8220;cannot identify an individual from the location data Google collects via its Street View cars.&#8221;\u00a0 Add a step, however, and Google could deduce an individual from the location data, <a href=\"http:\/\/www.realityprime.com\/articles\/is-google-recording-your-routers-mac-address-when-they-drive-by\"><span style=\"color: #00598c;\">argues<\/span><\/a> Avi Bar-Zeev, an employee of Microsoft, a Google competitor.<\/p>\n<p style=\"PADDING-LEFT: 60px\">[Google] could (opposite of cannot) yield your identity if you&#39;ve used Google&#39;s services or otherwise revealed it to them in association with your IP address (which would be the public IP of your router in most cases, visible to web servers during routine queries like HTTP GET). If Google remembered that connection (and why not, if they remember your search history?), they now have your likely home address and identity at the same time. Whether they actually do this or not is unclear to me, since they say they can&#39;t do A but surely they could do B if they wanted to.<\/p>\n<p style=\"PADDING-LEFT: 30px\">Theoretically, Google could use the MAC address for a mobile device &#8212; an iPod, a laptop, etc. &#8212; to build profiles of an individual&#39;s activity. (It&#39;s unclear whether they did and Google has indicated that they have not.) But there&#39;s also value in the MAC addresses of wireless routers.<\/p>\n<p style=\"PADDING-LEFT: 30px\">Once a router has been associated with a real-world location, it becomes useful as a reference point. 
The Boston company <a href=\"http:\/\/www.skyhookwireless.com\/\" class=\"broken_link\"><span style=\"color: #00598c;\">Skyhook Wireless<\/span><\/a>, for example, has long maintained a database of MAC addresses, collected in a (slightly) <a href=\"http:\/\/www.boston.com\/business\/technology\/articles\/2010\/06\/19\/coakley_presses_google_for_details_on_data_collected\/\" class=\"broken_link\"><span style=\"color: #00598c;\">less-intrusive way<\/span><\/a>. Skyhook is the primary wireless positioning system used by Apple&#39;s iPhone and iPod Touch. (See a map of their U.S. coverage <a href=\"http:\/\/www.skyhookwireless.com\/howitworks\/coverage.php\" class=\"broken_link\"><span style=\"color: #00598c;\">here<\/span><\/a>.) When your iPod Touch wants to retrieve the current location, it shares the MAC addresses of nearby routers with Skyhook which pings its database to figure out where you are.<\/p>\n<p style=\"PADDING-LEFT: 30px\"><a href=\"http:\/\/www.google.com\/latitude\/intro.html\" class=\"broken_link\"><span style=\"color: #00598c;\">Google Latitude<\/span><\/a>, which lets users share their current location, has at least <a href=\"http:\/\/www.fiercemobilecontent.com\/story\/google-latitude-tops-3-million-active-users-check-ins-next\/2010-05-07\" class=\"broken_link\"><span style=\"color: #00598c;\">3 million active users<\/span><\/a> and works in a similar way. 
When a user decides to share his location with any Google service on a non-GPS device, he sends all visible MAC addresses in the vicinity to the search giant, according to the company&#39;s <a href=\"http:\/\/googlepolicyeurope.blogspot.com\/2010\/04\/data-collected-by-google-cars.html\"><span style=\"color: #00598c;\">own description<\/span><\/a> of how its location services works.<\/p>\n<p style=\"PADDING-LEFT: 30px\">[Update: Google&#39;s own &#8220;<a href=\"http:\/\/googlepolicyeurope.blogspot.com\/2010\/04\/data-collected-by-google-cars.html\"><span style=\"color: #00598c;\">refresher FAQ<\/span><\/a>&#8221; states that a user of its geo-location services, such as Latitude, sends all MAC addresses &#8220;currently visible to the device&#8221; to Google, but a spokesman said the service only collects the MAC addresses of routers. That FAQ statement is the basis of the following argument.]<\/p>\n<p style=\"PADDING-LEFT: 60px\">This is disturbing, <a href=\"\/?p=1133\"><span style=\"color: #00598c;\">argues<\/span><\/a> blogger Kim Cameron (also a Microsoft employee), because it could mean the company is getting not only router addresses, but also the MAC addresses of devices such as laptops and iPods. If you are sitting next to a Google Latitude user who shares his location, Google could know the address and location of your device even though you didn&#39;t opt in. That could then be compared with all other logged instances of your MAC address to develop a profile of where the device is and has been.<\/p>\n<p style=\"PADDING-LEFT: 30px\">Google denies using the information it collected and, if the company is telling the truth, then only data from unencrypted networks was intercepted anyway, so you have less to worry about if your home wireless network is password-protected. (It&#39;s still not totally clear whether only router MAC addresses were collected. 
Google said it collected the information for devices &#8220;like a WiFi router.&#8221;) Whether it did or did not collect or use this information isn&#39;t clear, but Google, like many of its competitors, has a strong incentive to get this kind of location data.<\/p>\n<p style=\"PADDING-LEFT: 30px\">[<em>Again, and I really do feel for Niraj, the PR leaves the\u00a0impression that if you have passwords and encryption turned on you have nothing to worry about,\u00a0but Google&#39;s GStumbler report says that passwords and encryption did not prevent the collection of the MAC addresses of phones and laptops from homes and businesses. &#8211; Kim]<\/em><\/p>\n<p>I really tuned in to these contradictory messages when\u00a0a reader first alerted me to\u00a0Niraj&#39;s article.\u00a0\u00a0\u00a0It looked like this:<\/p>\n<p><img style=\"margin-left: 30px; margin-right: 30px;\" src=\"\/wp-content\/images\/2010\/06\/kimStrikethrough.gif\" alt=\"\" \/><\/p>\n<p>My comments\u00a0earned their strike-throughs\u00a0when a Google spokesman\u00a0assured the Atlantic\u00a0&#8220;the Service only collects the MAC addresses of routers.&#8221;\u00a0 I pointed out that my statement was\u00a0actually based on Google&#39;s own FAQ, and it was their FAQ (&#8220;How does this location database work?&#8221;) &#8211; rather than my comments &#8211;\u00a0that deserved to be corrected.\u00a0\u00a0After verifying that this was true, Niraj\u00a0agreed to remove the strikethrough.<\/p>\n<p>How can anyone be expected to get this story right given the contradictions in what Google says it has done?<\/p>\n<p>In light of this, I would like to see Google issue a revision to its &#8220;<a href=\"http:\/\/googlepolicyeurope.blogspot.com\/2010\/04\/data-collected-by-google-cars.html\">Refresher FAQ<\/a>&#8221; that currently reads:<\/p>\n<p><img style=\"margin-left: 30px; margin-right: 30px;\" src=\"\/wp-content\/images\/2010\/06\/HowDoesItWork.png\" alt=\"\" \/><\/p>\n<p>The &#8220;list of MAC 
addresses which are currently visible to the device&#8221; would include the addresses of nearby phones and laptops.\u00a0 Since Google PR has\u00a0assured Niraj that &#8220;the service only collects the MAC addresses of routers&#8221;,\u00a0the right thing to do would be to correct the FAQ so it reads:<\/p>\n<ul>\n<li>\u201cThe user\u2019s device sends a request to the Google location server <em><span style=\"color: #6c6c6d;\">with <\/span><\/em>the list of MAC addresses found in Beacon Frames announcing a Network Access Point SSID and <em>excluding<\/em> the addresses of end user devices like WiFi enabled phones and laptops.\u201d<\/li>\n<\/ul>\n<p>This would at least reassure us that Google\u00a0has not delivered software with the ability to track non-subscribers and this could be verified by data protection authorities.\u00a0 We could then limit our concerns to what we need to do to ensure that no such software is ever deployed in the future.<\/p>\n<p>\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google should change its FAQs about WiFi data collection to line up with what its PR people are telling 
journalists.<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[12,71,47,40,11,77],"tags":[],"_links":{"self":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts\/1139"}],"collection":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1139"}],"version-history":[{"count":0,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts\/1139\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1139"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1139"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1139"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}