Minimal Disclosure – Page 2 – Kim Cameron's Identity Weblog

Remembering Andreas Pfitzmann

Andreas Pfitzmann, head of the Privacy and Data Security Research group at Technische Universität Dresden, has died. For more than 25 years he worked on privacy and multilateral security issues. As Caspar Bowden puts it, “Andreas was the eminence grise of serious PET research in Europe, an extraordinarily decent person, and massively influential in the public policy of privacy technology in Germany and Europe.”

Those not familiar with his work should definitely read and use A terminology for talking about privacy by data minimization – a great contribution that gives us clearly defined concepts through which scientific understanding of privacy and multilateral security can move forward.

The obituary posted by Germany's Chaos Computer Club reveals his impact on a community that extended far beyond the walls of the university:

The sudden and unexpected death of Professor Andreas Pfitzmann on 23rd September 2010 leaves a huge gap in the lives of all who knew him. Through both his work and approach, Prof. Pfitzmann set measurably high standards. He was one of a small group of computer scientists who always clearly put forward his soundly based and independent opinion. In his endeavours to foster cross-discipline interaction, he proved instrumental in shaping both technical and political discourses on anonymity and privacy issues in Germany – thus ensuring him a well-deserved international reputation. He always managed to cross the boundaries of his discipline and make the impact of technology comprehensible. His contributions to research in this regard remain eloquent and courageous, and his insistence on even voicing inconvenient truths means he will remain a role model for us all.

In his passing we recognise and mourn the loss of an outstanding scientist, unique in his stance as a defender of people’s basic rights of anonymity and the administration of information pertaining to themselves – both of which are basic prerequisites for a thriving democracy. None of us will ever forget his rousing lectures and speeches, or the ways he found to nurture experimental, enquiring thought amongst an audience.

In Andreas Pfitzmann, too many of our members have lost a dear friend and long-term inspirer. Our thoughts are firmly with his family, to whom we extend our deepest and most profound condolences.

I too will miss both Andreas Pfitzmann and the great clarity he brought to any conversation he participated in.

U-Prove honored by International Association of Privacy Professionals

There was great news this week about the growing support for U-Prove Minimal Disclosure technology: it received the top award in the technology innovation category from the International Association of Privacy Professionals – the world's largest association of privacy professionals.

BALTIMORE — September 30, 2010 — Winners of the eighth annual HP-International Association of Privacy Professionals (IAPP) Privacy Innovation Awards were recognized today at the IAPP Privacy Dinner, held in conjunction with the IAPP Privacy Academy 2010. The honorees include Symcor, Inc., Minnesota Privacy Consultants, and Microsoft Corporation.

The annual awards recognize exceptional integration of privacy and are judged from a broad field of entries. This year’s winners were selected by a panel of private and public sector privacy experts including Allen Brandt, CIPP, Corporate Counsel, Chief Privacy Official, Graduate Management Admission Council; Joanne McNabb, CIPP, CIPP/G, Chief, California Office of Privacy Protection; Susan Smith, CIPP, Americas Privacy Officer, Hewlett-Packard Company; and Florian Thoma, Chief Data Protection Officer, Siemens AG.

“On behalf of more than 7,000 privacy professionals across 50 countries, we applaud this year’s HP-IAPP Privacy Innovation Award winners,” said IAPP Executive Director Trevor Hughes. “At a time when privacy is driving significant conversation and headlines, this year’s results show how protecting privacy and assuring organizational success go hand-in-hand.”

“HP is pleased to sponsor an award that advances privacy worldwide,” said Hewlett Packard Company Americas Privacy Officer Susan Smith.

In the Large Organization category (more than 5,000 employees), Symcor, Inc. won for its “A-integrity Process,” which is designed to manage and protect sensitive financial information that is ultimately presented to customers in the form of client statements. As the largest transactional printer in Canada, Symcor provides statement-to-payment services for some of Canada’s major financial, telecommunications, insurance, utility and payroll institutions. A-integrity established a new standard in data protection with an industry-leading error rate of less than one per million statements produced. Symcor has been improving on this rate each year. A robust privacy incident management process was also developed to standardize error identification and resolution. Symcor’s dedicated Privacy Office provides overall governance to the process and has instilled a deep culture of privacy awareness throughout the organization.

The winner in the Small Organization category (fewer than 5,000 employees), is Minnesota Privacy Consultants (MPC). MPC helps multinational corporations and government agencies operationalize their governance of personal data. The organization won for its Privacy Maturity Model (PMM), a benchmarking tool that evaluates privacy program maturity and effectiveness. Using the Generally Accepted Privacy Principles (GAPP) framework as the basis but recognizing that the GAPP does not provide for degrees of compliance and maturity of a privacy program, MPC cross-referenced the 73 subcomponents of the GAPP framework against the six “maturity levels” of the Capability Maturity Model (CMM) developed by Carnegie Mellon University. From this, the Privacy Maturity Model (PMM) was developed to define specific criteria and weighting to various control areas based on prevailing statistics in the areas of data breaches and security enforcement actions worldwide. The Innovation Award judges recognized MPC for its successful and sophisticated approach to a very difficult problem.

Microsoft Corporation received the honor in the Technology category for “U-Prove”, a privacy-enhancing identity management technology that helps enable people to protect their identity-related information. The technology is based on advanced cryptographic protocols designed for electronic transactions and communications. It was acquired by Microsoft in 2008 and released into Proof of Concept as well as donated to the Open Source community in 2010. U-Prove technology has similar characteristics of conventionally used technologies, such as PKI certificates and SAML tokens, with additional privacy and security benefits. Through a technique of minimal disclosure, U-Prove tokens enable individuals to disclose just the information needed by applications and services, but nothing more, during online transactions. Online service providers, such as businesses and governments that are involved in transactions with individuals cannot link or collect a profile of activities. U-Prove effectively meets the security and privacy requirements of many identity systems—most notably national e-ID schemes now being contemplated by world governments. U-Prove has already won the Kuppinger Cole prize for best innovation in European identity projects and is now this year’s recipient of the HP-IAPP Privacy Innovation Award in technology.

About the IAPP
The International Association of Privacy Professionals is the world's largest association of privacy professionals with more than 7,400 members across 50 countries. The IAPP helps to define, support and improve the privacy profession globally through networking, education and certification. More information about the IAPP is available at www.privacyassociation.org.

Non-Personal Information – like where you live?

Last week I gave a presentation at PII 2010 in Seattle where I tried to summarize what I had learned from my recent work on WiFi location services and identity. During the question period an audience member asked me to return to the slide where I recounted how I had first encountered Apple’s new location tracking policy:

My questioner was clearly a bit irritated with me, Didn’t I realize that the “unique device identifier” was just a GUID – a purely random number? It wasn’t a MAC address. It was not personally identifying.

The question really perplexed me, since I had just shown a slide demonstrating how if you go to this well-known web site (for example) and enter a location you find out who lives there (I used myself as an example, and by the way, “whitepages” releases this information even though I have had an unlisted number…).

I pointed out the obvious: if Apple releases your location and a GUID to a third party on multiple occasions, one location will soon stand out as being your residence… Then presto, if the third pary looks up the address in a “Reverse Address” search engine, the “random” GUID identifies you personally forever more. The notion that location information tied to random identifiers is not personally identifiable information is total hogwash.

My questioner then asked, “Is your problem that Apple’s privacy policy is so clear? Do you prefer companies who don’t publish a privacy policy at all, but rather just take your information without telling you?” A chorus of groans seemed to answer his question to everyone’s satisfaction. But I personally found the question thought provoking. I assume corporations publish privacy policies – even those as duplicitous as Apple’s – because they have to. I need to learn more about why.

[Meanwhile, if you’re wondering how I could possibly post my own residential address on my blog, it turns out I’ve moved and it is no longer my address. Beyond that, the initial “A” in the listing above has nothing to do with my real name – it’s just a mechanism I use to track who has given out my personal information.]

How to anger your most loyal supporters

The gaming world is seething after what is seen as an egregious assault on privacy by World of Warcraft (WoW), one of the most successful multiplayer role-playing games yet devised. The issue? Whereas players used to know each other through their WoW “handles”, the company is now introducing a system called “RealID” that forces players to reveal their offline identities within the game's fantasy context. Commentators think the company wanted to turn its user base into a new social network. Judging from the massive hullabaloo amongst even its most loyal supporters, the concept may be doomed.

To get an idea of the dimensions of the backlash just type “WoW RealID” into a search engine. You'll hit paydirt:

The RealID feature is probably the kookiest example yet of breaking the Fourth Law of Identity – the law of Directed Identity. This law articulates the requirement to scope digital identifiers to the context in which they are used. In particular, it explains why universal identifiers should not be used where a person's relationship is to a specific context. The law arises from the need for “contextual separation” – the right of individuals to participate in multiple contexts without those contexts being linkable unless the individual wants them to be.

The company seems to have initially inflicted Real ID onto everyone, and then backed off by describing the lack of “opt-in” as a “security flaw”, according to this official post on wow.com:

To be clear, everyone who does not have a parentally controlled account has in fact opted into Real ID, due to a security flaw. Addons have access to the name on your account right now. So you need to be very careful about what addons you download — make sure they are reputable. In order to actually opt out, you need to set up parental controls on your account. This is not an easy task. Previous to the Battle.net merge, you could just go to a page and set them up. Done. Now, you must set up an account as one that is under parental control. Once your account is that of a child's (a several-step process), your settings default to Real ID-disabled. Any Real ID friends you have will no longer be friends. In order to enable it, you need to check the Enable Real ID box.

Clearly there are security problems that emerge from squishing identifiers together and breaking cross-context separation. Mary Landsman has a great post on her Antivirus Software Blog called “WoW Real ID: A Really Bad Idea“:

Here are a couple of snippets about the new Battle.net Real ID program:

“…when you click on one of your Real ID friends, you will be able to see the names of his or her other Real ID friends, even if you are not Real ID friends with those players yourself.”

“…your mutual Real ID friends, as well as their Real ID friends, will be able to see your first and last name (the name registered to the Battle.net account).”

“…Real ID friends will see detailed Rich Presence information (what character the Real ID friend is playing, what they are doing within that game, etc.) and will be able to view and send Broadcast messages to other Real ID friends.”

And this is all cross-game, cross-realm, and cross-alts. Just what already heavily targeted players need, right? A merge of WoW/Battle.net/StarCraft with Facebook-style social networking? Facepalm might have been a better term to describe Real ID given its potential for scams. Especially since Blizzard rolled out the change without any provision to protect minors whatsoever:

Will parents be able to manage whether their children are able to use Real ID?
We plan to update our Parental Controls with tools that will allow parents to manage their children's use of Real ID. We'll have more details to share in the future.

Nice. So some time in the future, Blizzard might start looking at considering security seriously. In the meantime, the unmanaged Real ID program makes it even easier for scammers to socially engineer players AND it adds potential stalking to the list of concerns. With no provision to protect minors whatsoever.

Thanks, Blizz…Not!

And Kyth has a must-read post at stratfu called Deeply Disappointed with the ‘RealID’ System where he explains how RealID should have been done. His ideas are a great implementation of the Fourth Law.

Using an alias would be fine, especially if the games are integrated in such a way that you could pull up a list of a single Battle.net account's WoW/D3 characters and SC2 profiles. Here is how the system should work:

You have a Battle.net account. The overall account has a RealID Handle. This Handle defaults to being your real name, but you can easily change it (talking single-click retard easy here) to anything you desire. Mine would be [WGA]Kazanir, just like my Steam handle is.
Each of your games is attached to your Battle.net account and thereby to your RealID. Your RealID friends can see you when you are online in any of those games and message you cross-game, as well as seeing a list of your characters or individual game profiles. Your displayed RealID is the handle described above.
Each game contains either a profile (SC2) or a list of characters. A list of any profiles or characters attached to your Battle.net account would be easily accessible from your account management screen. Any of these characters can be “opted out” of your RealID by unchecking them from the list. Thus, my list might look like this:
X Kazanir.wga – SC2 ProfileX Kazanir – WoW – 80 Druid Mal'ganisX Gidgiddoni – WoW – 60 Warrior Mal'ganis_ Kazbank – WoW – 2 Hunter Mal'ganisX Kazabarb – D3 – 97 Barbarian US East_ Kazahidden – D3 – 45 Monk US West

In this way I can play on characters (such as a bank alt or a secret D3 character with my e-girlfriend) without forcibly having their identity broadcast to my friends.When I am online on any of the characters I have unchecked, my RealID friends will be able to message me but those characters will not be visible even to RealID friends. The messages will merely appear to come from my RealID and the “which character is he on” information will not be available.
Finally, the RealID messenger implementation in every game should be able to hide my presence from view just like any instant messenger application can right now. I shouldn't be forced to be present with my RealID just because I am playing a game — there should be a universal “pretend to not be online” button available in every Battle.net enabled game.

These are the most basic functionality requirements that should be implemented by anyone with an IQ over 80 who designs a system like this.

Check out the comments in response to his post. I would have to call his really sensible and informed proposal “wildly popular”. It will be really interesting to see how this terrible blunder by such a creative company will end up.

[Thanks to Joe Long for heads up]

Doing it right: Touch2Id

And now for something refreshingly different: an innovative company that is doing identity right.

I'm talking about a British outfit called Touch2Id. Their concept is really simple. They offer young people a smart card that can be used to prove they are old enough to drink alcohol. The technology is now well beyond the “proof of concept” phase – in fact its use in Wiltshire, England is being expanded based on its initial success.

To register, people present their ID documents and, once verified, a template of their fingerprint is stored on a Touch2Id card that is immediately given to them.
When they go to a bar, they wave their card over a machine similar to a credit card reader, and press their finger on the machine. If their finger matches the template on their card, the lights come on and they can walk on in.

What's great here is:

Merchants don't have to worry about making mistakes. The age vetting process is stringent and fake IDs are weeded out by experts.
Young people don't have to worry about being discriminated against (or being embarassed) just because they “look young”
No identifying information is released to the merchant. No name, age or photo appears on (or is stored on) the card.
The movements of the young person are not tracked.
There is no central database assembled that contains the fingerprints of innocent people
The fingerprint template remains the property of the person with the fingerprint – there is no privacy issue or security honeypot.
Kids cannot lend their card to a friend – the friend's finger would not match the fingerprint template.
If the card is lost or stolen, it won't work any more
The templates on the card are digitally signed and can't be tampered with

I met the man behind the Touch2Id, Giles Sergant, at the recent EEMA meeting in London.

Being a skeptic versed in the (mis) use of biometrics in identity – especially the fingerprinting of our kids – I was initially more than skeptical.

But Giles has done his homework (even auditing the course given by privacy experts Gus Hosein and Simon Davies at the London School of Economics). The better I understood the approach he has taken, the more impressed I was.

Eventually I even agreed to enroll so as to get a feeling for what the experience was like. The verdict: amazing. Its a lovely piece of minimalistic engineering, with no unnecessary moving parts or ugly underbelly. If I look strangely euphoric in the photo that was taken it is because I was thoroughly surprised by seeing something so good.

Since then, Giles has already added an alternate form factor – an NFC sticker people can put on their mobile phone so they don't actually need to carry around an additional artifact. It will be fascinating to watch how young people respond to this initiative, which Giles is trying to grow from the bottom up. More info on the Facebook page.

What Could Google Do With the Data It's Collected?

Niraj Chokshi has published a piece in The Atlantic where he grapples admirably with the issues related to Google's collection and use of device fingerprints (technically called MAC Addresses). It is important and encouraging to have journalists like Niraj taking the time to explore these complex issues.

But I have to say that such an exploration is really hard right now.

Whether on purpose or by accident, the Google PR machine is still handing out contradictory messages. In particular, the description in Google's Refresher FAQ titled “How does this location database work?” is currently completely different from (read: the opposite of) what its public relations people are telling journalists like Nitaj. I think reestablishing credibility around location services requires the messages to be made consistent so they can be verified by data protection authorities.

Here are some excerpts from the piece – annotated with some comments by me. [Read the whole article here.]

The Wi-Fi data Google collected in over 30 countries could be more revealing than initially thought…

Google's CEO Eric Schmidt has said the information was hardly useful and that the company had done nothing with it. The search giant has also been ordered (or sought) to destroy the data. According to their own blog post, Google logged three things from wireless networks within range of their vans: snippets of unencrypted data; the names of available wireless networks; and a unique identifier associated with devices like wireless routers. Google blamed the collection on a rogue bit of code that was never removed after it had been inserted by an engineer during testing.

[The statement about rogue code is an example of the PR ambiguity Nitaj and other journalists must deal with. Google blogs don't actually blame the collection of unique identifiers on rogue code, although they seem crafted to leave people with that impression. Spokesmen only blame rogue code for the collection of unencrypted data content (e.g. email messages.) – Kim]

Each of the three types of data Google recorded has its uses, but it's that last one, the unique identifier, that could be valuable to a company of Google's scale. That ID is known as the media access control (MAC) address and it is included — unencrypted, by design — in any transfer, blogger Joe Mansfield explains.

Google says it only downloaded unencrypted data packets, which could contain information about the sites users visited. Those packets also include the MAC address of both the sending and receiving devices — the laptop and router, for example.

[Another contradiction: Google PR says it “only” collected unencrypted data packets, but Google's GStumbler report says its cars did collect and record the MAC addresses from encrypted data frames as well. – Kim]

A company as large as Google could develop profiles of individuals based on their mobile device MAC addresses, argues Mansfield:

Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it.

Now, to be fair, we don't know whether Google actually scrubbed the packets it collected for MAC addresses and the company's statements indicate they did not. [Yet the GStumbler report says ALL MAC addresses were recorded – Kim]. The search giant even said it “cannot identify an individual from the location data Google collects via its Street View cars.” Add a step, however, and Google could deduce an individual from the location data, argues Avi Bar-Zeev, an employee of Microsoft, a Google competitor.

[Google] could (opposite of cannot) yield your identity if you've used Google's services or otherwise revealed it to them in association with your IP address (which would be the public IP of your router in most cases, visible to web servers during routine queries like HTTP GET). If Google remembered that connection (and why not, if they remember your search history?), they now have your likely home address and identity at the same time. Whether they actually do this or not is unclear to me, since they say they can't do A but surely they could do B if they wanted to.

Theoretically, Google could use the MAC address for a mobile device — an iPod, a laptop, etc. — to build profiles of an individual's activity. (It's unclear whether they did and Google has indicated that they have not.) But there's also value in the MAC addresses of wireless routers.

Once a router has been associated with a real-world location, it becomes useful as a reference point. The Boston company Skyhook Wireless, for example, has long maintained a database of MAC addresses, collected in a (slightly) less-intrusive way. Skyhook is the primary wireless positioning system used by Apple's iPhone and iPod Touch. (See a map of their U.S. coverage here.) When your iPod Touch wants to retrieve the current location, it shares the MAC addresses of nearby routers with Skyhook which pings its database to figure out where you are.

Google Latitude, which lets users share their current location, has at least 3 million active users and works in a similar way. When a user decides to share his location with any Google service on a non-GPS device, he sends all visible MAC addresses in the vicinity to the search giant, according to the company's own description of how its location services works.

[Update: Google's own “refresher FAQ” states that a user of its geo-location services, such as Latitude, sends all MAC addresses “currently visible to the device” to Google, but a spokesman said the service only collects the MAC addresses of routers. That FAQ statment is the basis of the following argument.]

This is disturbing, argues blogger Kim Cameron (also a Microsoft employee), because it could mean the company is getting not only router addresses, but also the MAC addresses of devices such as laptops and iPods. If you are sitting next to a Google Latitude user who shares his location, Google could know the address and location of your device even though you didn't opt in. That could then be compared with all other logged instances of your MAC address to develop a profile of where the device is and has been.

Google denies using the information it collected and, if the company is telling the truth, then only data from unencrypted networks was intercepted anyway, so you have less to worry about if your home wireless network is password-protected. (It's still not totally clear whether only router MAC addresses were collected. Google said it collected the information for devices “like a WiFi router.”) Whether it did or did not collect or use this information isn't clear, but Google, like many of its competitors, has a strong incentive to get this kind of location data.

[Again, and I really do feel for Niraj, the PR leaves the impression that if you have passwords and encryption turned on you have nothing to worry about, but Googles’ GStumbler report says that passwords and encryption did not prevent the collection of the MAC addresses of phones and laptops from homes and businesses. – Kim]

I really tuned in to these contradictory messages when a reader first alerted me to Niraj's article. It looked like this:

My comments earned their strike-throughs when a Google spokesman assured the Atlantic “the Service only collects the MAC addresses of routers.” I pointed out that my statement was actually based on Google's own FAQ, and it was their FAQ (“How does this location database work?”) – rather than my comments – that deserved to be corrected. After verifying that this was true, Niraj agreed to remove the strikethrough.

How can anyone be expected to get this story right given the contradictions in what Google says it has done?

In light of this, I would like to see Google issue a revision to its “Refresher FAQ” that currently reads:

The “list of MAC addresses which are currently visible to the device” would include the addresses of nearby phones and laptops. Since Google PR has assured Niraj that “the service only collects the MAC addresses of routers”, the right thing to do would be to correct the FAQ so it reads:

“The user’s device sends a request to the Google location server with the list of MAC addresses found in Beacon Frames announcing a Network Access Point SSID and excluding the addresses of end user devices like WiFi enabled phones and laptops.”

This would at least reassure us that Google has not delivered software with the ability to track non-subscribers and this could be verified by data protection authorities. We could then limit our concerns to what we need to do to ensure that no such software is ever deployed in the future.

National Strategy for Trusted Identities in Cyberspace

Friday saw what I think is a historic post by Howard Schmidt on The Whitehouse Blog:

“Today, I am pleased to announce the latest step in moving our Nation forward in securing our cyberspace with the release of the draft National Strategy for Trusted Identities in Cyberspace (NSTIC). This first draft of NSTIC was developed in collaboration with key government agencies, business leaders and privacy advocates. What has emerged is a blueprint to reduce cybersecurity vulnerabilities and improve online privacy protections through the use of trusted digital identities. “

I say the current draft is historic because of the grasp of identity issues it achieves.

At the core of the document is a recognition that we need a solution supporting privacy-enhancing technologies and built by harnessing a user-centric Identity Ecosystem offering citizens and private enterprise plenty of choice.

Finally we have before us a proposal that can move society forward in protecting individual privacy and simultaneously create a secure and trustworthy infrastructure with enough protections to be resistant to insider attacks.

Further, the work appears to have support from multiple government agencies – the Department of Homeland Security was a key partner in its creation.

Here are the guiding principles (beginning page 8):

Identity solutions will be secure and resilient
Identity solutions will be interoperable
Identity solutions will be privacy enhancing and voluntary for the public
Identity solutions will be cost-effective and easy to use

Let's start with the final “s” on the word “solutions” – a major achievement. The authors understand society needs a spectrum of approaches suitable for different use cases but fitting within a common interoperable framework – what I and others have called an identity metasystem.

The report embraces the need for anonymous access as well as that for strong identification. It stands firmly in favor of minimal disclosure. The authors call out the requirement that solutions be privacy enhancing and voluntary for the public, rather than attempting to ram something bureaucratic down peoples’ throats. And they are fully cognisant of the practicality and usability requirements for the initiative to be successful. A few years ago I would not have believed this kind of progress would be possible.

Nor is the report just a theoretical treatment devoid of concrete proposals. The section on “Commitment to Action” includes:

Designate a federal agency to lead the public/private sector efforts to advance the vision
Develop a shared, comprehensive public/private sector implementation plan
Accelerate the expansion of government services, pilots and policies that align with the identity ecosystem
Work to implement enhanced privacy protections
Coordinate the development and refinement of risk management and interoperability standards
Address liability concerns of service providers and individuals
Perform outreach and awareness across all stakeholders
Continue collaborating in international efforts
Identify other means to drive adoption

Readers should dive into the report – it is in a draft stage and “Public ideas and recommendations to further refine this Strategy are encouraged.”

A number of people and organizations in the identity world have participated in getting this right, working closely with policy thinkers and those leading this initiative in government. I don't hesitate to say that congratulations are due all round for getting this effort off to such a good start.

We can expect suggestions to be made strengthening various aspects of the report – mainly in terms of making it more internally consistent.

For example, the report contains good vignettes about minimal disclosure and the use of claims to gain access to resources. Yet it also retains the traditional notion that authentication is dependent on identification. What is meant by identification? Many will assume it means “unique identification” in the old-fashioned sense of associating someone with an identifier. That doesn't jive with the notion of minimal disclosure present throughout the report. Why? For many purposes association with an identifier is over-identification or unhelpful, and a simple proof of some set of claims would suffice to control access.

But these refinements can be made fairly easily. The real challenge will be to actually live up to the guiding principles as we move from high level statements to a widely deployed system – making it truly secure, resilient and privacy enhancing. These are guiding principles we can use to measure our success and help select between alternatives.

Digital copiers – a privacy and security timebomb

Everyone involved with software and services should watch this remarkable investigative report by CBS News and think about what it teaches us.

Nearly every digital copier built since 2002 contains a hard drive storing an image of every document copied, scanned, or emailed by the machine. Because of this, the report shows, an office staple has turned into a digital time-bomb packed with highly-personal or sensitive data. To quote the narrator, “If you're in the identity theft business it seems this would be a pot of gold.”

In the video, the investigators purchase some used machines and then John Juntunen of Digital Copier Security shows them what is still stored on them when they are resold. As he says, “The type of information we see on these machines with the social security numbers, birth certificates, bank records, income tax forms… would be very valuable.” He's been trying to warn people about the potential risk, but “Nobody wants to step up and say, ‘we see the problem, and we need to solve it.'”

The results obtained by the investigators in their random sample are stunning, turning up:

detailed domestic violence complaints;
a list of wanted sex offenders;
a list of targets in a major drug raid;
design plans for a building near Ground Zero in Manhattan;
95 pages of pay stubs with names, addresses and social security numbers;
$40,000 in copied checks; and
300 pages of individual medical records including everything from drug prescriptions, to blood test results, to a cancer diagnosis.

Why are these records sitting around on the hard disk in the first place? Why aren't they deleted once the copy has been completed or within some minimal time? If they are kept for audit purposes, why aren't they encrypted for the auditor?

Is this “rainy-day data collection?” Gee, we have a hard disk, why don't we keep the scans around – they might come in useful sometime.

It becomes clear that addressing privacy and security threats was never a concern in designing these machines – which are actually computer systems. This was an example of “privacy chernoble by design”. Of course I'm speaking not only about individual privacy, but that of the organizations using the machines as well. The report makes it obvious that digital copiers, or anything else that collects or remembers information, must be designed based on the Law of Minimal Disclosure.

This story also casts an interesting light on what the French are calling “le droit à l'oubli” – the right to have things forgotten. Most discussions I've seen call for this principle to be applied on the Internet. But as the digital world collides with the molecular one, we will see the need to build information lifetimes into all digital systems, including smart systems in our environment. The current and very serious problems with copiers should be seen as profoundly instructive in this regard.

[Thanks to Francis Shanahan for heads up]

Rethink things in light of Google's Gstumbler report

A number of technical people have given Google the benefit of the doubt in the Street View Wifi case and as a result published information that Google's new “Gstumbler” report shows is completely incorrect. It is important that people re-evaluate what they are saying in light of this report.

I'll pick on Conor's recent posting on our discussion as an example – it contains a number of statements and implies a number of things explicitly contradicted by Google's new report. Once he reads the report and applies the logic he has put forward, logic will require Conor to change his conclusions.

Conor begins with a bunch of statements that are true:

MAC addresses typically are persistent identifiers that by the definition of the protocols used in wireless APs can't be hidden from snoopers, even if you turn on encryption.
By themselves, MAC addresses are not all that useful except to communicate with a local network entity (so you need to be nearby on the same local network to use them.
When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.
SSIDs have some of these properties as well, though the protocol clearly gives the user control over whether or not to broadcast (publicize) their SSID. The choice of the SSID value can have a substantial impact on it's use as a privacy invading value — a generic value such as “home” or “linksys” is much less likely to be a privacy issue than “ConorCahillsHomeAP”.

Wishful thinking and completely wrong

These are followed by a statement that is just plain wishful thinking. Conor continues:

Google purposely collected SSID and MAC Addresses from APs which were configured in SSID broadcast mode and inadvertently collected some network traffic data from those same APs. Google did not collect information from APs configured to not broadcast SSIDs.

Google's report says Conor is wrong about this, explicitly saying in paragraph 26, “Kismet can also detect the existence of networks with non-broadcast SSIDs, and will capture, parse, and record data from such networks“. Conor continues:

Google associated the SSID and MAC information with some location information (probably the GPS vehicle location at the time the AP signal was strongest).

This is true, but it is important to indicate that this was not limited to access points. Google's report says that it recorded the association between the MAC address and geographic location of all the active devices on the network. When it did this, the MAC addresses became, according to Conor's own earlier definition, “worrisome data aggregations”.

There is no AP protocol defined means to differentiate between open wireless hotspots and closed hotspots which broadcast their SSIDs.

This is true, but Google's report indicates this would not have mattered – it collected MACs regardless of whether SSIDs were broadcast.

I have not found out if Google used the encryption status of the APs in its decision about recording the SSID/MAC information for the AP.

Google's report indicates it did not. It only used that status to decide whether or not to record the payload – and only recorded the payload of unencrypted frames…

I like Conor's logic that, “When you combine MAC addresses with other information (locality, user identity, etc.) you can be creating worrisome data aggregations that when exposed publicly could have a detrimental impact on a user's privacy.” I urge Conor to read the Gstumbler report. Once he knows what was actually happening, I hope he'll tell the world about it.

Gstumbler tells all

The third party commissioned by Google to review the software used in its Street View WiFi cars has completed its report, called Source Code Analysis of ‘Gstumbler’. I will resist commenting on the name, since Google did the right thing in publishing the report: there will no longer be any ambiguity about what was being collected.

As we have discussed over the last week, two issues are of importance – collection of device identity data, and collection of payload data. One thing I like about te report is that it has a begins with a a number of technical “descriptions and definitions”. For example, in paragraph 7 it explains enveloping:

“Each packet is comprised of a packet header which contains network administrative information and the addressing information (or “envelope” information) necessary to transmit the data packet from one device to another along the path to its final destination. Each packet also contains a “payload” which is a fragment of the “content” of the communication or data transmission sent or received over the internet…”

It explains that in 802.11 packets are encapsulated in frames, describes the types of frames and presents the standard diagram showing how a frame is structured.

Readers should understand that when network encryption is turned on, it is only the Frame Body (Payload) of data frames that is encrypted.

In paragraph 19, the report provides an overview of its findings:

“While running in memory, the program parses frame header information, such as frame type, MAC addresses, and other network administrative data from each of the captured frames. The parsing separates the information into discreet fields for easier analysis… All available MAC addresses contained in a frame are also parsed. All of this parsed header information is written to disk for frames transmitted over both encrypted and unencrypted wireless networks [emphasis mine – Kim].”

In paragraph 20, the report explains that the software discards the content of encrypted bodies (which of course it can't analyse anyway) whereas unencrypted bodies are also written to disk. I have not discussed the issue of collecting the frame bodies in these pages – there is no need to do so since it is intuitively easy for people to understand what it means to collect payloads.

In paragraph 22 the report concludes that “all wireless frame data was recorded except for the bodies of 802.11 Data frames from encypted networks.”

All device identifiers were recorded

As a result, there is no longer any question. The MAC addresses of all the WiFi laptops and phones in the homes, businesses, enterprises and government buildings were recorded by the driveby mapping cars, as were the wireless access points, and this regardless of the use of encryption.

My one quibble with the otherwise excellent report is that it calls the MAC addresses “network administrative data”. In fact they are the device identifiers of the network devices – both of the network access point and the devices connecting to that access point – phones and laptops.

It is also worth, given some of the previous conversations about supposed “broadcasting”, drawing attention to paragraph 26, which explains,

“Kismet captures wireless frames using wireless network interface cards set to monitoring mode. The use of monitoring mode means that Kismet directs the wireless hardware to listen for and process all wireless traffic regardless of its intended destination… Through the use of passive packet sniffing, Kismet can also detect the existence of netwrks with non-broadcast SSIDs, and will capture, parse, and record data from such networks.”