Linkage and identification

Inspired by some of Ben Laurie's recent postings, I want to continue exploring the issues of privacy and linkability (see related pieces here and here). 

I have explained that CardSpace is a way of selecting and transferring a relevant digital identity – not a crypto system; and that the privacy characteristics involved depend on the nature of the transaction and the identity provider being used within CardSpace – not on CardSpace itself.   I ended my last piece this way:

The question now becomes that of how identity providers behave.  Given that suddenly they have no visibility onto the relying party, is linkability still possible?

But before zeroing in on specific technologies, I want to drill into two issues.  First is the meaning of “identification”; and second, the meaning of “linkability” and its related concept of “traceability”.  

Having done this will allow us to describe different types of linkage, and set up our look at how different cryptographic approaches and transactional architectures relate to them.

Identification 

There has been much discussion of identification (which, for those new to this world, is not at all the same as digital identity).  I would like to take up the definitions used in the EU Data Protection Directive, which have been nicely summarized here, but add a few precisions.  First, we need to broaden the definition of “indirect identification” by dropping the requirement for unique attributes – as long as you end up with unambiguous identification.  Second, we need to distinguish between identification as a technical phenomenon and personal identification.

This leads to the following taxonomy:

  • Personal data:
    •  any piece of information regarding an identified or identifiable natural person.
  • Direct Personal Identification:
    • establishing that an entity is a specific natural person through use of basic personal data (e.g., name, address, etc.), plus a personal number, a widely known pseudo-identity, a biometric characteristic such as a fingerprint, PD, etc.
  • Indirect Personal Identification:
    • establishing that an entity is a specific natural person through other characteristics or attributes or a combination of both – in other words, to assemble “sufficiently identifying” information
  • Personal Non-Identification:
    • assumed if the amount and the nature of the indirectly identifying data are such that identification of the individual as a natural person is only possible with the application of disproportionate effort, or through the assistance of a third party outside the power and authority of the person responsible… 

Translating to the vocabulary we often use in the software industry, direct personal identification is done through a unique personal identifier assigned to a natural person.  Indirect personal identification occurs when enough claims are released – unique or not – that linkage to a natural person can be accomplished.  If linkage to a natural person is not possible, you have personal non-identification.  We have added the word “personal”  to each of these definitions so we could withstand the paradox that when pseudonyms are used, unique identifiers may in fact lead to personal non-identification… 

The notion of “disproportionate effort” is an important one.  The basic idea is useful, with the proviso that when one controls computerized systems end-to-end one may accomplish very complicated tasks,  computations and correlations very easily – and this does not in itself constitute “disproportionate effort”.

Linkability

If you search for “linkability”, you will find that about half the hits refer to the characteristics that make people want to link to your web site.  That's NOT what's being discussed here.

Instead, we're talking about being able to link one transaction to another.

The first time I heard the word used this way was in reference to the E-Cash systems of the eighties.  With physical cash, you can walk into a store and buy something with one coin, later buy something else with another coin, and be assured there is no linkage between the two transactions that is caused by the coins themselves. 

This quality is hard to achieve with electronic payments.  Think of how a credit card or debit card or bank account works.  Use the same credit card for two transactions and you create an electronic trail that connects them together.

E-Cash was proposed as a means of getting characteristics similar to those of the physical world when dealing with electronic transactions.  Non-linkability was the concept introduced to describe this.  Over time it has become a key concept of privacy research, which models all identity transactions as involving similar basic issues.

Linkability is closely related to traceability.  By traceability people are talking about being able to follow a transaction through all its phases by collecting transaction information and having some way of identifying the transaction payload as it moves through the system.

Traceability is often explicitly sought.  For example, with credit card purchases, there is a transaction identifier which ties the same event together across the computer systems of the participating banks, clearing house and merchant.  This is certainly considered “a feature.”  There are other, subtler, sometimes unintended, ways of achieving traceability (timestamps and the like). 

Once you can link two transactions, many different outcomes may result.  Two transactions conveying direct personal identification might be linked.  Or, a transaction initially characterized by personal non-identification may suddenly become subject to indirect personal identification.

To further facilitate the discussion, I think we should distinguish various types of linking:

  • Intra-transaction linking is the product of traceability, and provides visibility between the claims issuer, the user presenting the claims, and the relying party  (for example, credit card transaction number).
  • Single-site transaction linking associates a number of transactions at a single site with a data subject.  The phrase “data subject” is used to clarify that no linking is implied between the transactions and any “natural person”.
  • Multi-site transaction linking associates linked transactions at one site with those at another site.
  • Natural person linking associates a data subject with a natural person.

Next time I will use these ideas to help explain how specific crypto systems and protocol approaches impact privacy.

Ben Laurie on Selective Disclosure (Part 1)

Google's Ben Laurie has a new paper called Selective Disclosure in which he argues the importance of zero knowledge proofs and privacy-enhancing cryptography. I fully share his view of the importance of these technologies.

Everyone with a technical interest in identity should look at Credentica’s recently released SDK, called U-Prove. It holistically embodies the cryptographic breakthroughs of Stefan Brands.

There is also a competing system from IBM called IDEMIX, though it is not yet publicly available and I can't talk about it first-hand.

On his way toward explaining how these systems work, Ben takes the time to put forward his own Laws of Identity (“Let a thousand flowers bloom!”)  He is responding to my Fourth Law, which asserts the need for the Identity Metasystem to support both public identifiers (for example, my blogging address) and private ones (my account number with a given company, unknown to anyone but me and them).  He says:

“For an identity management system to be both useful and privacy preserving, there are three properties assertions must be able to have. They must be:

  • Verifiable: There’s often no point in making a statement unless the relying party has some way of checking it is true. Note that this isn’t always a requirement – I don’t have to prove my address is mine to Amazon, because its up to me where my goods get delivered. But I may have to prove I’m over 18 to get alcohol delivered.
  • Minimal: This is the privacy preserving bit – I want to tell the relying party the very least he needs to know. I shouldn’t have to reveal my date of birth, just prove I’m over 18 somehow.
  • Unlinkable: If the relying party or parties, or other actors in the system, can, either on their own or in collusion, link together my various assertions, then I’ve blown the minimality requirement out of the water.”

These are important things for the Identity Metasystem to support, and I make the same points in my own version of the laws. But I don't think these characteristics are the whole story – rather, they describe requirements for certain use cases.  However, there are other use cases, and it was the goal of the original Laws of Identity to embrace them as well.

For example, when I blog I want to use an identity that is linkable. I want anyone who is interested in my ideas to be able to talk about them with anyone else, and tell them how to get to my web site, which is – in the most literal sense of the word – a “linkable” characteristic of my identity.

And when accessing my bank account through the internet, I would rather like to ensure that the party withdrawing the money is tightly linked to a real-world person – hopefully me.

So we don't always want our identity to be unlinkable. We want unlinkability to be one of our options.

Similarly, I don't want every assertion I make to be verified by some bureaucratic process. I don't run around in the molecular world armed with official documents that I present to every Tom, Dick and Harry.  Again, I want verifiability to be one of my options, but not more than that. We need to be careful about what we wish for here. Requiring individuals to employ identities verified by third parties in contexts where there is no good reason for it is a slippery and dangerous slope. So I hope that's not what Ben has in mind.

When using a public identity (which I call “omnidirectional” because you share it with everyone) I may want to divulge more information than is necessary for some specific purpose. So even the notion of minimal disclosure doesn't apply within certain contexts and when public personas are involved.

Thus I take Ben's real point to be that an important and mainstream use case is one where verifiability, minimal disclosure AND unlinkability, should all be achievable at the same time.  This I agree with.

What strikes me as strange in Ben's document is this comment:

“Note a subtle but important difference between Kim’s laws and mine – he talks about identifiers whereas I talk about assertions. In an ideal world, assertions would not be identifiers; but it turns out that in practice they often are.”

I say “strange” because when you actually read the Laws of Identity this is what you will find:

We will begin by defining a digital identity as a set of claims made by one digital subject about itself or another digital subject. (page 4)

Is what I called a claim different from what Ben calls an assertion?  Well, on page 5 of the Laws, I wrote (citing the Oxford English Dictionary, well known to Ben):

A claim is, “…an assertion of the truth of something, typically one which is disputed or in doubt”.

Clearly I'm going a bit beyond Ben's concerns in that I remind the reader that assertions must be evaluated, not just made.  But in no way can I be said to confuse identifiers and digital identity.  In fact, I go on to give some examples which make my ideas in this regard perfectly clear.  (page 5). 

  • A claim could just convey an identifier – for example, that the subject’s student number is 490-525, or that the subject’s Windows name is REDMOND\kcameron. This is the way many existing identity systems work.
  • Another claim might assert that a subject knows a given key – and should be able to demonstrate this fact.
  • A set of claims might convey personally identifying information – name, address, date of birth and citizenship, for example.
  • A claim might simply propose that a subject is part of a certain group – for example, that she has an age less than 16.
  • And a claim might state that a subject has a certain capability – for example to place orders up to a certain limit, or modify a given file.

These examples demonstrate that there is no difference between my approach to claims in the Laws of Identity and that proposed recently by Ben.  Further examples abound: 

In proffering this definition, we recognize it does not jive with some widely held beliefs – for example that within a given context, identities have to be unique. (page 5)

Or:

In this scenario we actually do not want to employ unique individual identifiers as digital identities. (page 6)

Or:

It makes sense to employ a government issued digital identity when interacting with government services (a single overall identity neither implies nor prevents correlation of identifiers between individual government departments). (Page 9)

Now I know we're all busy.  I don't expect people to be familiar with my work.  But Ben explicitly cites what he calls my “famous laws” in a footnote, and calls out the “important difference” in our thinking as being that I'm concerned with identifiers, whereas he is concerned with assertions.  He's just wrong, as anyone who reads the Laws can see. 

The fact is, I like Ben, and I like a lot of his thinking.  I hope he reads my version of the laws before citing them again, and that he corrects his rendition of my thinking about identifiers and claims or assertions.  This seems doable since his paper is still a work in progress.

Age and identity verification in Second Life

Via Dennis Hamilton, a pointer to a new experiment at Second Life:

We will shortly begin beta testing an age and identity verification system, which will allow Residents to provide a one-time proof of identity (such as a driver’s license, passport or ID card) and have that identity verified in a matter of moments.

Second Life has always been restricted to those over 18. All Residents personally assert their age on registration. When we receive reports of underage Residents in Second Life, we close their account until they provide us with proof of age. This system works well, but as the community grows and the attractions of Second Life become more widely known, we’ve decided to add an additional layer of protection.

Once the age verification system is in place, only those Residents with verified age will be able to access adult content in Mature areas. Any Resident wishing to access adult content will have to prove they are over 18 in real life.We have created Teen Second Life for minors under the age of 18. Access to TSL by adults is prohibited, with minors not allowed into the rest of Second Life.

For their part, land owners will be required to flag their land as ‘adult’ if it contains adult content using the estate and land management tools provided to landowners. This flag will protect landowners from displaying inappropriate content to underage users who may have entered Second Life. Landowners are morally and legally responsible for the content displayed and the behavior taking place on their land. The identity verification system gives them new tools to ensure any adult content is only available to adults over 18 because unverified avatars will not have access to land flagged as containing adult content.

We hope you’ll agree that the small inconvenience of doing this once is far outweighed by the benefits of protecting minors from inappropriate content. Further, this system will assist landowners in engaging in lawful businesses.

The verification system will be run by a third party specializing in age and identity authentication. No personally identifying information will be stored by them or by Linden Lab, including date of birth, unless the Resident chooses to do so. Those who wish to be verified, but remain anonymous, are free to do so.

(Continues here…)

The idea of presenting a passport to get into an imaginary adult establishment strikes me as nutso.  I must be missing a gene.  It is certainly a conundrum, this virtual world. 

I think that rather than adopting this one-off inspector approach, outfits like Second Life and all the other big web sites should get together to accept registration claims from whatever identity providers would fully guarantee both accuracy and the anonymity of their users.  Information Cards combined with the anonymous credential technology developed by people like Stefan Brands would provide the ideal solution.

QT and the perimeter

Jeff Bohren, The Identity Management Expert at TalkBMC, makes a great point about what laptops mean and hits gold with Quantum Tunneling.

Kim Cameron has another interesting entry on Deperimeterization here. All of this got me to thinking about another aspect of perimeter security, and that is network location. People tend to think of computers as being logically located inside or outside of the security perimeter. Or more specifically people without laptops tend to think that. If you have a laptop, you quickly realize that you flip-flop between the state of being in and out of the perimeter on a daily basis, or more frequently is you use VPN.

I like the analogy of Quantum Tunneling (QT). One moment your laptop is outside the perimeter, the next it’s magically in. Then out again. QT in, QT out. Of course any malware your laptop picks up outside the perimeter will be carried in on the next trip in. This should really be the nail in the coffin of perimeter security thinking, but unfortunately it isn’t.

The QT analogy came to me because I have been reading Ilium by Dan Simmons (author of the Hyperion series). This SF novel combines The Illiad, QT, Greek Gods, a mostly depopulated Earth, a terraformed Mars, Little Green Men, and Jovian Cyborg buddies (one who likes Shakespeare and one who like Proust). I’m not done yet, so it will be interesting to see if Simmons can pull it all together at the end.

Here is the identity tie in. In Simmon’s future Earth, the few remaining inhabitants can teleport from place to place. It turns out that peoples bodies aren’t actually teleported. The body and brain waves are scanned at origin and that information is stored in a central computer. The body, thoughts, and clothing are reconstituted at the destination based on that information.

Teleportation of identity! Fascinating.

Jeff has definitely got it.

Identity – the toy model

I look forward to seeing Dave Kearns explore the notion of Legonics in an upcoming newsletter. As Dave explains – with his usual clarity:

This morning while delivering the opening keynote address for this year's Directory Experts Conference, Kim Cameron introduced me to a new term – “Legonics“.

This is a reference to the well-known building blocks, Legos, familiar to anyone under 40, and the parents of those under 40! The great thing about Legos is that any one piece can connect to any other piece. And while you can buy a small set that can build a particular object (such as a fire truck), the pieces in that set can be put together in different ways to build other objects or combined with other sets – or other loose pieces – to build completely different things. So by creating a Legonic Identity System (LIS?) we have one which can put together identity data in various ways to fit the conditions of the moment. Relying Parties, Identity Providers and User Agents can work together to construct sets of Identity Claims from all of the available pieces of identity data.

It's a good analogy, and a good paradigm, I think. I'll probably explore his more in the newsletter.

The fire truck link is fantastic, by the way.  Meanwhile, how about:

le·gon·ics: noun

  1. (used with a singular verb) the science dealing with the development and application of devices and systems that can be assembled through claims.
  2. (used with a plural verb) Legonic systems and devices:  The legonics aboard the new aircraft are very sophisticated.

Subject oriented programming

Here's a seminal posting by =kermit at a blog called Subjectivity – mapping the world of digital identity.  I buy into the “Subject Oriented Programming” idea – it's wonderful.

More than a decade ago I happened upon this programming language called C+-, pronounced “C, more or less”:

Unlike C++, C+- is a subject-oriented language. Each C+- class instance, known as a subject, holds hidden members, known as prejudices or undeclared preferences, which are impervious to outside messages, as well as public members known as boasts or claims.

Of course it was a joke and I laughed, but the joke stung a bit. It had occurred to me that a claims-based system like this could actually be useful. I had even come up with the name “subject-oriented” for it. So it hurt a bit to find the idea “out there” only as the butt of a joke.

Well, things have certainly changed since then. Today Kim Cameron posted an item titled “Identity systems all about making claims”, and linked to another article by NetworkWorld’s John Fontana which elaborates:

Cameron said the flexible claims architecture, which is based on standard protocols such as WS-Federation, WS-Trust and the Security Assertion Markup Language (SAML) will replace today’s more rigid systems that are based on a single point of truth […]

The claims model, he said, is more flexible and based on components that can be snapped together like Lego blocks. Cameron called them Legonic Systems, which, he said, are agile and self-organizing much like service-oriented architectures. The Legonic identity system is rethinking what users know today, he said, and is defined by a set of claims one subject makes about another.

Formulations like this make it clear how fundamental the coming “identity revolution” in computing could be. The German philosopher Hans Blumenberg argued in his book The Legitimacy of the Modern Age that modern science emerged from the sterility of medieval Scholasticism precisely because of its “renunciation of exactitude.” In other words, modern science emerged by replacing the idea of “eternal truth” with that of subjective claims and methodical doubt as epitomized in Descartes.

This incorporation of uncertainty and error continued into the twentieth century with the discovery of statistical mechanics and quantum indeterminacy. Could computer science, with the discovery of digital identity, finally be leaving its own rigid Scholastic period behind as well?

Answer:  Yup.