The Clay Feet of Giants?

Over at his blog, Craig Burton, the marketing guru who put NetWare on the map and later formed the Burton Group with Jamie Lewis, lets loose with a passionate fury that couldn't care less about who has deployed what:

It’s been a week since Microsoft announced that it was never going to release the next version of CardSpace. The laughable part of the announcement is the title “Beyond Windows CardSpace” which would leave you to believe that Microsoft has somehow come up with a better architecture.

In fact Microsoft announced its discontinued development of CardSpace with absolutely no alternative.

Just further evidence of just how irrelevant Microsoft has become.

The news that Microsoft had abandoned CardSpace development is not news to those of us who watch this space, Microsoft hasn’t done Jack with CardSpace for over two years.

It’s just that for some reason Microsoft PR decided to announce the matter. Probably so the U-Prove group could get more press.

Well, that's a bit harsh. Identity selectors like CardSpace only make sense in the context of the other components of the Identity Metasystem – and Microsoft has done a lot over the last two years to deliver those components to customers who are doing successful deployments on a massive scale all over the world.  I don't think that's irrelevant, Craig.

Beyond that, I think Craig should look more closely at what the U-Prove agent actually does (I'll help by putting up a video). As I said here, the U-Prove agent doesn't do what CardSpace did, and the problems CardSpace addressed DO remain tremendously important.  But while more tightly scoped, the U-Prove agent does go beyond CardSpace for the crucial scenario of sensitive claims that are privacy protected.  Further, protecting privacy within the Identity Metasystem will turn out, historically, to be absolutely relevant.  So let's not hit on U-Prove.

Instead, let's tune in to Craig's “Little History” of the Identity Metasystem:

In early 2006, Kim Cameron rolled out the Laws of Identity in his blog. Over next few months as he rolled out each law, the impact of this powerful vision culminating in the release of the CardSpace architecture and Microsoft’s licensing policy rocked the identity community.

Two years earlier Microsoft was handed its head when it tried to shove the Passport identity initiative down our throats.

Kim Cameron turned around and proposed and delivered an Identity Metasystem—based on CardSpace—that has no peer. Thus the Identity Metasystem is the industry initiative to create open selector-based digital identity framework. CardSpace is Microsoft’s instantiation of that Metasystem. The Pamela Project, XMLDAP, Higgins Project, the Bandit Project, and openinfocard are all instantiations in various stages of single and multiple vendor versions of the Identity Metasystem.

Let me clear. The Identity Metasystem has no peer.

Anything less than a open identity selector system for claims-based digital identity is simply a step backwards from the Identity Metasystem.

Thus SAML, OpenID, OAuth, Facebook Connect and so on are useful, but are giant steps back in time and design when compared to the Identity Metasystem.

I agree that the Identity Metasystem is as important as Craig describes it, and that to reach its potential it MUST have user agents. I further agree that the identity selector is the key component for making the system user centric. But I also think adoption is, ah, essential… We need to work out a kink or two or three. This is a hard problem and what we've done so far hasn't worked.

Be this as it may, back at Craig's site he marches on in rare form, dissecting Vendor Speak as he goes.  Mustering more than a few thrusts and parries (I have elided the juicier ones), he concludes:

This means there is an opening for someone or some group with a bit of vision and leadership to take up the task…

But mark my words, we WILL have a selector-based identity layer for the Internet in the future. All Internet devices will have a selector or a selector proxy for digital identity purposes.

I'm glad to finally see this reference to actual adoption, and now am just waiting for more discussion about how we could actually evolve our proposals to get this to happen.

 

Bizarre customer journey at myPay…

Internet security is a sitting duck that could easily succumb to a number of bleak possible futures.

One prediction we can make with certainty is that as the overall safety of the net continues to erode, individual web sites will flail around looking for ways to protect themselves. They will come across novel ideas that seem to make sense from the vantage point of a single web site. Yet if they implement these ideas, most of them will backfire. Internet users have to navigate many different sites on an irregular basis. For them, the experience of disparate mechanisms and paradigms on every different site will be even more confusing and troubling than the current degenerating landscape. The Seventh Law of Identity is animated by these very concerns.

I know from earlier exchanges that Michael Ramirez understands these issues – as well as their architectural implications. So I can just imagine how he felt when he first encountered a new system that seems to represent an unfortunately great example of this dynamic. His first post on the matter started this way:

“Logging into the DFAS myPay site is frustrating. This is the gateway where DoD employees can view and change their financial data and records.

“In an attempt to secure the interface (namely to prevent key loggers), they have implemented a javascript-based keyboard where the user must enter their PIN using their mouse (or using the keyboard, pressing tab LOTS of times).

“A randomization function is used to change the position of the buttons, presumably to prevent a simple click-tracking virus from simply replaying the click sequence. Numbers always appear on the upper row and the letters will appear in a random position on the same row where they exist on the keyboard (e.g. QWERTY letters will always appear on the top row, just in a random order).

“At first glance, I assumed that there would be some server-side state that identified the position of the buttons (as to not allow the user's browser to arbitrarily choose the positions). Looking at how the button layout is generated, however, makes it clear that the position is indeed generated by the client-side alone. Javascript functions are called to randomize the locations, and the locations of these buttons are included as part of the POST parameters upon authentication.

“A visOrder variable is included with a simple substitution cipher to identify button locations: 0 is represented by position 0, 1 by position 1, etc. Thus:

VisOrder     = 3601827594
Substitution = 0123456789
Example PIN  = 325476
Encoded      = 102867

“Thus any virus/program can easily mount an online guessing attack (since it defines the substitution pattern), and can quickly decipher the PIN if it has access to the POST parameters.

“The web site's security implementation is painfully trivial, so we can conclude that the Javascript keyboard is only to prevent keyloggers. But it has a number of side effects, especially with respect to the security of the password. Given the tedious nature of PIN entry, users choose extremely simplistic passwords. MyPay actually encourages this as it does not enforce complexity requirements and limits the length of the password to between 4 and 8 characters. There is no support for upper/lower case or special characters. 36 possible values over a 4-character search space is not terribly secure.”
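Just to make the weakness concrete: here's a minimal sketch in Python of how anything able to read the POST body could recover the PIN. It assumes the mapping works the way Michael's example suggests (each PIN digit is replaced by the visOrder value at that digit's position), and the names are mine, not the actual field names myPay uses.

    # Sketch only: the substitution is trivially invertible by anyone who
    # sees the POST parameters, since visOrder travels with the request.
    # Variable names are illustrative, not myPay's actual parameter names.

    def encode_pin(vis_order: str, pin: str) -> str:
        # What the page appears to do: replace each PIN digit with the
        # visOrder value at that digit's position.
        return "".join(vis_order[int(d)] for d in pin)

    def decode_pin(vis_order: str, encoded: str) -> str:
        # What an attacker does: invert the substitution by looking up
        # each posted digit's position in visOrder.
        return "".join(str(vis_order.index(d)) for d in encoded)

    if __name__ == "__main__":
        vis_order = "3601827594"                       # sent in the clear with the POST
        encoded = encode_pin(vis_order, "325476")
        print(encoded, decode_pin(vis_order, encoded))  # recovers 325476

No server-side secret is involved anywhere, which is essentially Michael's point: the virtual keyboard inconveniences the legitimate user far more than it inconveniences an attacker.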

A few days later, Michael was back with an even stranger report. In fact this particular “user journey” verges on the bizarre. Michael writes:

“MyPay recently overhauled their interface and made it more “secure.” I have my doubts, but they certainly have changed how they interact with the user.

“I was a bit speechless. Pleading with users is new, but maybe it'll work for them. Apparently it'll be the only thing working for them:

Although most users have established their new login credentials with no trouble, some users are calling the Central Customer Support Unit for assistance. As a result, customer support is experiencing high call volume, and many customers are waiting on hold longer than usual.

We apologize for any inconvenience this may cause. We are doing everything possible to remedy this situation.

Michael concludes by making it clear he thinks “more than a few” users may have had trouble. He says, “Maybe, just maybe, it's because of your continued use of the ridiculous virtual keyboard. Yes, you've increased the password complexity requirements (which actually increased security), but slaughtered what little usability you had. I promise you that getting rid of it will ‘remedy this situation.'”

One might just shrug one's shoulders and wait for this to pass. But I can't do that.  I feel compelled to redouble our efforts to produce and adopt a common standards-based approach to authentication that will work securely and in a consistent way across different web sites and environments.  In other words, reusable identities, the claims-based architecture, and truly usable and intuitive visual interfaces.

Identity Roadmap Presentation at PDC09

Earlier this week I presented the Identity Keynote at the Microsoft Professional Developers Conference (PDC) in LA.  The slide deck is here, and the video is here.

After announcing the release of the Windows Identity Foundation (WIF) as an Extension to .NET, I brought forward three architect/engineers to discuss how claims had helped them solve their development problems.   I chose these particular guests because I wanted the developer audience to be able to benefit from the insights they had previously shared with me about the advantages – and challenges – of adopting the claims based model.  Each guest talks about the approach he took and the lessons learned.

Andrew Bybee, Principal Program Manager from Microsoft Dynamics CRM, talks about the role of identity in delivering “the Power of Choice” – the ability for his customers to run his software wherever they want, on premises or in the cloud or in combination, and to offer access to anyone they choose.

Venky Veeraraghavan, the Program Manager in charge of identity for SharePoint, talks about what it was like to completely rethink the way identity works in SharePoint so it takes advantage of the claims-based architecture to solve problems that previously had been impossibly difficult.  He explores the problems of “Multi-hop” systems and web farms, especially the “Dreaded Second Hop” – which he admits “really, really scares us…”  I find his explanation riveting and think any developer of large scale systems will agree.

Dmitry Sotnikov, who is Manager of New Product Research at Quest Software, presents a remarkable Azure-based version of a product Quest has previously offered only “on premise”.  The service is a backup system for Active Directory, and required solving a whole set of hard identity problems involving devices and data as well as people.

Later in the presentation, while discussing future directions, I announce the Community Technical Preview of our new work on REST-based authorization (a profile of OAuth), and then show the prototype of the multi-protocol identity selector Mike Jones unveiled at the recent IIW.   And finally, I talk for the first time about “System.Identity”, work on a user-centric next-generation directory that I wanted to take to the community for feedback.  I'll be blogging about this a lot and hopefully others from the blogosphere will find time to discuss it with me.

 

Green Dam and the First Law of Identity

China Daily posted this opinion piece by Chen Weihua that provides context on how the Green Dam proposal could ever have emerged.  I found it striking because it brings to the fore the relationship of the initiative to the First Law of Identity (User Control).  As in so many cases where the Laws are broken, the result is passionate opposition and muddled technology.

The Ministry of Industry and Information Technology's latest regulation to preinstall filtering software on all new computers by July 1 has triggered public concern, anger and protest.

A survey on Sina.com, the largest news portal in China, showed that an overwhelming 83 percent of the 26,232 people polled said they would not use the software, known as Green Dam. Only 10 percent were in favor.

Despite the official claim that the software was designed to filter pornography and unhealthy content on the Internet, many people, including some computer experts, have disputed its effectiveness and are worried about its possible infringement on privacy, its potential to disrupt the operating system and other software, and the waste of $6.1 million of public fund on the project.

These are all legitimate concerns. But behind the whole story, one pivotal question to be raised is whether we believe people should have the right to make their own choice on such an issue, or the authorities, or someone else, should have the power to make such a decision.

Compared with 30 years ago, the country has achieved a lot in individual freedom by giving people the right to make their own decisions regarding their personal lives.

Under the planned economy three decades ago, the government decided the prices of all goods. Today, the market decides 99 percent of the prices based on supply and demand.

Three decades ago, the government even decided what sort of shirts and trousers were proper for its people. Flared trousers, for example, were banned. Today, our streets look like a colorful stage.

Till six years ago, people still needed an approval letter from their employers to get married or divorced. However bizarre it may sound to the people today, the policy had ruled the nation for decades.

The divorce process then could be absurdly long. Representatives from trade union, women's federation and neighborhood committee would all come and try to convince you that divorce is a bad idea – bad for the couple, bad for their children and bad for society.

It could be years or even decades before the divorce was finally approved. Today, it only takes 15 minutes for a couple to go through the formalities to tie or untie the knot at local civil affair bureaus.

Less than three decades ago, the rigid hukou (permanent residence permit) system didn't allow people to work in another city. Even husbands and wives with hukou in different cities had to work and live in separate places. Today, over 200 million migrant workers are on the move, although hukou is still a constraint.

Less than 20 years ago, doctors were mandated to report women who had abortions to their employers. Today, they respect a woman's choice and privacy.

No doubt we have witnessed a sea of change, with more and more people making their own social and economic decisions.

The government, though still wielding huge decision-making power, has also started to consult people on some decisions by hosting public hearings, such as the recent one on tap water pricing in Shanghai.

But clearly, some government department and officials are still used to the old practice of deciding for the people without seeking their consent.

In the Green Dam case, buyers, mostly adults, should be given the complete freedom to decide whether they want the filtering software to be installed in their computers or not.

Respect for an individual's right to choice is an important indicator of a free society, depriving them of which is gross transgression.

Let's not allow the Green Dam software to block our way into the future.

The many indications that the technology behind Green Dam weakens the security fabric of China indicate that Chen Weihua is right in more ways than one.

Just for completeness, I should point out that the initiative also breaks the Third Law (Justifiable Parties) if adults have not consciously enabled the software and chosen to have the government participate in their browsing.

Green Dam goes in all the wrong directions

The Chinese Government's Green Dam sets an important precedent:  government trying to achieve its purposes by taking control over the technology installed on people's personal computers.  Here's how the Chinese Government explained its initiative:

‘In order to create a green, healthy, and harmonious internet environment, to avoid exposing youth to the harmful effects of bad information, The Ministry of Information Industry, The Central Spiritual Civilization Office, and The Commerce Ministry, in accordance with the requirements of “The Government Purchasing Law,” are using central funds to purchase rights to “Green Dam Flower Season Escort”(Henceforth “Green Dam”) … for one year along with associated services, which will be freely provided to the public.

‘The software is for general use and testing. The software can effectively filter improper language and images and is prepared for use by computer factories.

‘In order to improve the government’s ability to deal with Web content of low moral character, and preserve the healthy development of children, the regulation and demands pertaining to the software are as follows: 

  1. Computers produced and sold in China must have the latest version of “Green Dam” pre-installed, imported computers should have the latest version of the software installed prior to sale.
  2. The software should be installed on computer hard drives and available discs for subsequent restoration
  3. The providers of “Green Dam” have to provide support to computer manufacturers to facilitate installation
  4. Computer manufacturers must complete installation and testing prior to the end of June. As of July 1, all computers should have “Green Dam” pre-installed.
  5. Every month computer manufacturers and the provider of Green Dam should give MII data on monthly sales and the pre-installation of the software. By February 2010, an annual report should be submitted.’

What does the software do?  According to OpenNet Initiative:

Green Dam exerts unprecedented control over users’ computing experience:  The version of the Green Dam software that we tested, when operating under its default settings, is far more intrusive than any other content control software we have reviewed. Not only does it block access to a wide range of web sites based on keywords and image processing, including porn, gaming, gay content, religious sites and political themes, it actively monitors individual computer behavior, such that a wide range of programs including word processing and email can be suddenly terminated if content algorithm detects inappropriate speech [my emphasis – Kim]. The program installs components deep into the kernel of the computer operating system in order to enable this application layer monitoring. The operation of the software is highly unpredictable and disrupts computer activity far beyond the blocking of websites.

The functionality of Green Dam goes far beyond that which is needed to protect children online and subjects users to security risks:   The deeply intrusive nature of the software opens up several possibilities for use other than filtering material harmful to minors. With minor changes introduced through the auto-update feature, the architecture could be used for monitoring personal communications and Internet browsing behavior. Log files are currently recorded locally on the machine, including events and keywords that trigger filtering. The auto-update feature can be used to change the scope and targeting of filtering without any notification to users.

How is it being received?  Wikipedia says:

Online polls conducted by leading Chinese web portals revealed poor acceptance of the software by netizens. On Sina and Netease, over 80% of poll participants said they would not consider or were not interested in using the software; on Tencent, over 70% of poll participants said it was unnecessary for new computers to be preloaded with filtering software; on Sohu, over 70% of poll participants said filtering software would not effectively prevent minors from browsing inappropriate websites.  A poll conducted by the Southern Metropolis Daily showed similar results.

In addition, the software is a virus transmission system.   Researchers from the University of Michigan concluded:

We have discovered remotely-exploitable vulnerabilities in Green Dam, the censorship software reportedly mandated by the Chinese government. Any web site a Green Dam user visits can take control of the PC [my emphasis – Kim].

We examined the Green Dam software and found that it contains serious security vulnerabilities due to programming errors. Once Green Dam is installed, any web site the user visits can exploit these problems to take control of the computer. This could allow malicious sites to steal private data, send spam, or enlist the computer in a botnet. In addition, we found vulnerabilities in the way Green Dam processes blacklist updates that could allow the software makers or others to install malicious code during the update process.

We found these problems with less than 12 hours of testing, and we believe they may be only the tip of the iceberg. Green Dam makes frequent use of unsafe and outdated programming practices that likely introduce numerous other vulnerabilities. Correcting these problems will require extensive changes to the software and careful retesting. In the meantime, we recommend that users protect themselves by uninstalling Green Dam immediately.

There is no doubt that government has a legitimate interest in the safety of the Internet, and in the safety of our children.  But neither goal can be achieved with any of the unfortunate methods being used here. 

Rather than so-called “blacklisting”, the alternative is to construct virtual networks that are dramatically safer for children than the Internet as a whole.  As such virtual networks emerge, technology can be created allowing parents to limit the access of their young children to those networks.

It's a big job to build such “green zones”.  But government is the strong force that could serve as a catalyst in bringing this about.   The key would be to organize virtual districts and environments that would be fun and safe for children, so children want to play in them.

This kind of virtual world doesn't require the generalized banning of sites or ideas or prurient thoughts – or require government to “improve” the nature of human beings.

Definitions for a Common Identity Framework

The Proposal for a Common Identity Framework begins by explaining the terminology it uses.  This wasn't intended to open up old wounds or provoke ontological debate.  We just wanted to reduce ambiguity about what we actually mean to say in the rest of the paper.  To do this, we did think very carefully about what we were going to call things, and tried to be very precise about our use of terms.

The paper presents its definitions in alphabetical order to facilitate lookup while reading the proposal, but I'll group them differently here to facilitate discussion.

Let's start with the series of definitions pertaining to claims.  It is key to the document that claims are assertions by one subject about itself or another subject that are “in doubt”.  This is a fundamental notion since it leads to an understanding that one of the basic services of a multi-party model must be “Claims Approval”.  The simple assumption by systems that assertions are true – in other words the failure to factor out “approval” as a separate service – has led to conflation and insularity in earlier systems.

  • Claim:  an assertion made by one subject about itself or another subject that a relying party considers to be “in doubt” until it passes “Claims Approval”
  • Claims Approval: The process of evaluating a set of claims associated with a security presentation to produce claims trusted in a specific environment so they can be used for automated decision making and/or mapped to an application specific identifier.
  • Claims Selector:  A software component that gives the user control over the production and release of sets of claims issued by claims providers. 
  • Security Token:  A set of claims.
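To make the terms above concrete, here is a minimal sketch in Python – purely illustrative, not an API from the paper – of a claim, a security token as a set of claims, and a claims approval step that turns claims that are “in doubt” into claims trusted in a specific environment:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Claim:
        issuer: str    # the subject making the assertion
        subject: str   # the subject the assertion is about
        kind: str      # e.g. "age-over-18" or "email"
        value: str

    @dataclass(frozen=True)
    class SecurityToken:
        claims: frozenset   # a security token is simply a set of claims

    def approve_claims(token: SecurityToken, trusted_issuers: set) -> set:
        # Claims approval: keep only the claims this environment is
        # prepared to trust; everything else stays "in doubt".
        return {c for c in token.claims if c.issuer in trusted_issuers}

A claims selector would sit in front of this, giving the user control over which sets of claims get produced and released in the first place.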

The concept of claims provider is presented in relation to “registration” of subjects.  Then claims are divided into two broad categories:  primordial and substantive…

  • Registration:  The process through which a primordial claim is associated with a subject so that a claims provider can subsequently issue a set of claims about that subject.
  • Claims Provider:  An individual, organization or service that:
  1. Registers subjects and associates them with primordial claims, with the goal of subsequently exchanging their primordial claims for a set of substantive claims about the subject that can be presented at a relying party; or
  2. Interprets one set of substantive claims and produces a second set (this specialization of a claims provider is called a claims transformer).  A claims set produced by a claims provider is not a primordial claim.
  • Claims Transformer:  A claims provider that produces one set of substantive claims from another set.

To understand this better let's look at what we mean by “primordial” and “substantive” claims.  The word “primordial” may seem strange at first, but its use will be seen to be rewardingly precise:  Constituting the beginning or starting point, from which something else is derived or developed, or on which something else depends. (OED)

As will become clear, the claims-based model works through the use of “Claims Providers”.  In the most basic case, subjects prove to a claims provider that they are an entity it has registered, and then the claims provider makes “substantive” claims about them.  The subject proves that it is the registered entity by using a “primordial” claim – one which is thus the beginning or starting point, and from which the provider's substantive claims are derived.  So our definitions are the following: 

  • Primordial Claim: A proof – based on secret(s) and/or biometrics – that only a single subject is able to present to a specific claims provider for the purpose of being recognized and obtaining a set of substantive claims.
  • Substantive claim:  A claim produced by a claims provider – as opposed to a primordial claim.

Passwords and secret keys are therefore examples of “primordial” claims, whereas SAML tokens and X.509 certificates (with DNs and the like) are examples of substantive claims. 
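Here is the basic exchange in the same sketch style (invented names, nothing normative): the subject registers with a claims provider, later proves it is the registered entity by presenting its primordial claim – a password in this toy example – and receives substantive claims in return.

    import hashlib, hmac

    class ClaimsProvider:
        def __init__(self):
            self._registered = {}   # subject -> (hashed secret, attributes)

        def register(self, subject: str, secret: str, attributes: dict):
            # Registration: associate a primordial claim (the secret) with the subject.
            digest = hashlib.sha256(secret.encode()).hexdigest()
            self._registered[subject] = (digest, attributes)

        def issue(self, subject: str, secret: str) -> dict:
            # Exchange the primordial claim for substantive claims about the subject.
            digest, attributes = self._registered[subject]
            presented = hashlib.sha256(secret.encode()).hexdigest()
            if not hmac.compare_digest(digest, presented):
                raise PermissionError("primordial claim not recognized")
            return {"issuer": "example-provider", "subject": subject, **attributes}

A claims transformer is the same idea one level up: it takes a set of substantive claims as its input rather than a primordial claim, and produces another set.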

Some will say, “Why don't you just use the word ‘credential’?”  The answer is simple.  We avoided “credential” precisely because people use it to mean both the primordial claim (e.g. a secret key) and the substantive claim (e.g. a certificate or signed statement).  This conflation makes it unsuitable for expressing the distinction between primordial and substantive, and this distinction is essential to properly factoring the services in the model.

There are a number of definitions pertaining to subjects, persons and identity itself:

  • Identity:  The fact of being what a person or a thing is, and the characteristics determining this.

This definition of identity is quite different from the definition that conflates identity and “identifier” (e.g. kim@foo.bar being called an identity).  Without clearing up this confusion, nothing can be understood.   Claims are the way of communicating what a person or thing is – different from being that person or thing.  An identifier is one possible claim content.

We also distinguish between a “natural person”, a “person”, and a “persona”, taking into account input from the legal and policy community:

  • Natural person:  A human being…
  • Person:  an entity recognized by the legal system.  In the context of eID, a person who can be digitally identified.
  • Persona:  A character deliberately assumed by a natural person

A “subject” is much broader, including things like services:

  • Subject:  The consumer of a digital service (a digital representation of a natural or juristic person, persona, group, organization, software service or device) described through claims.

And what about user?

  • User:  a natural person who is represented by a subject.

The entities that depend on identity are called relying parties:

  • Relying party:  An individual, organization or service that depends on claims issued by a claims provider about a subject to control access to and personalization of a service.
  • Service:  A digital entity comprising software, hardware and/or communications channels that interacts with subjects.

Concrete services that interact with subjects (e.g. digital entities) are not to be confused with the abstract services that constitute our model:

  • Abstract services:  Architectural components that deliver useful services and can be described through high level goals, structures and behaviors.  In practice, these abstract services are refined into concrete service definitions and instantiations.

Concrete digital services, including both relying parties and claims providers, operate on behalf of some “person” (in the sense used here of legal persons, including organizations).  This implies operations and administration:

  • Administrative authority:  An organization responsible for the management of an administrative domain.
  • Administrative domain:  A boundary for the management of all business and technical aspects related to:
  1. A claims provider;
  2. A relying party; or
  3. A relying party that serves as its own claims provider 

There are several definitions that are necessary to understand how different pieces of the model fit together:

  • ID-data base:  A collection of application specific identifiers used with automatic claims approval
  • Application Specific Identifier (ASID):  An identifier that is used in an application to link a specific subject to data in the application.
  • Security presentation:  A set consisting of elements like knowledge of secrets, possession of security devices or aspects of administration which are associated with automated claims approval.  These elements derive from technical policy and legal contracts of a chain of administrative domains.
  • Technical Policy:  A set of technical parameters constraining the behavior of a digital service and limited to the present tense.
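As a sketch of how these last pieces fit together (again with invented names): after automatic claims approval, the relying party's ID-data base maps the approved claims to the application specific identifier it uses to link the subject to its own data.

    from typing import Optional

    class IdDataBase:
        # Sketch of an ID-data base: application specific identifiers (ASIDs)
        # keyed by claims this application has previously enrolled.

        def __init__(self):
            self._asids = {}   # (issuer, kind, value) -> ASID

        def enroll(self, issuer: str, kind: str, value: str, asid: str):
            self._asids[(issuer, kind, value)] = asid

        def resolve(self, approved_claims) -> Optional[str]:
            # Return the ASID linked to the first approved claim we recognize
            # (the claims here are the Claim objects from the earlier sketch).
            for c in approved_claims:
                asid = self._asids.get((c.issuer, c.kind, c.value))
                if asid is not None:
                    return asid
            return None

Keying on (issuer, kind, value) is just an illustrative choice; the paper doesn't prescribe how an ID-data base is organized.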

And finally, there is the definition of what we mean by user-centric.  Several colleagues have pointed out that the word “user-centric” has been used recently to justify all kinds of schemes that usurp the autonomy of the user.  So we want to be very precise about what we mean in this paper:

  • User-centric:  Structured so as to allow users to conceptualize, enumerate and control their relationships with other parties, including the flow of information.

Proposal for a Common Identity Framework

Today I am posting a new paper called Proposal for a Common Identity Framework: A User-Centric Identity Metasystem.

Good news: it doesn’t propose a new protocol!

Instead, it attempts to crisply articulate the requirements in creating a privacy-protecting identity layer for the Internet, and sets out a formal model for such a layer, defined through the set of services the layer must provide.

The paper is the outcome of a year-long collaboration between Dr. Kai Rannenberg, Dr. Reinhard Posch and myself. We were introduced by Dr. Jacques Bus, Head of Unit Trust and Security in ICT Research at the European Commission.

Each of us brought our different cultures, concerns, backgrounds and experiences to the project and we occasionally struggled to understand how our different slices of reality fit together. But it was in those very areas that we ended up with some of the most interesting results.

Kai holds the T-Mobile Chair for Mobile Business and Multilateral Security at Goethe University Frankfurt. He coordinates the EU research projects FIDIS  (Future of Identity in the Information Society), a multidisciplinary endeavor of 24 leading institutions from research, government, and industry, and PICOS (Privacy and Identity Management for Community Services).  He also is Convener of the ISO/IEC Identity Management and Privacy Technology working group (JTC 1/SC 27/WG 5)  and Chair of the IFIP Technical Committee 11 “Security and Privacy Protection in Information Processing Systems”.

Reinhard taught Information Technology at Graz University beginning in the mid 1970’s, and was Scientific Director of the Austrian Secure Information Technology Center starting in 1999. He has been federal CIO for the Austrian government since 2001, and was elected chair of the management board of ENISA (The European Network and Information Security Agency) in 2007. 

I invite you to look at our paper.  It aims at combining the ideas set out in the Laws of Identity and related papers, extended discussions and blog posts from the open identity community, the formal principles of Information Protection that have evolved in Europe, research on Privacy Enhancing Technologies (PETs), outputs from key working groups and academic conferences, and deep experience with EU government digital identity initiatives.

Our work is included in The Future of Identity in the Information Society – a report on research carried out in a number of different EU states on topics like the identification of citizens, ID cards, and Virtual Identities, with an accent on privacy, mobility, interoperability, profiling, forensics, and identity related crime.

I’ll be taking up the ideas in our paper in a number of blog posts going forward. My hope is that readers will find the model useful in advancing the way they think about the architecture of their identity systems.  I’ll be extremely interested in feedback, as will Reinhard and Kai, who I hope will feel free to join into the conversation as voices independent from my own.

More precision on the Right to Correlate

Dave Kearns continues to whack me for some of my terminology in discussing data correlation.  He says: 

‘In responding to my “violent agreement” post, Kim Cameron goes a long way towards beginning to define the parameters for correlating data and transactions. I'd urge all of you to jump into the discussion.

‘But – and it's a huge but – we need to be very careful of the terminology we use.

‘Kim starts: “Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction, and further, that they only have the right to correlate it with other transactions involving the same parties.” ‘

Dave's right that this was overly restrictive.  In fact I changed it within a few minutes of the initial post – but apparently not fast enough to prevent confusion.  My edited version stated:

‘Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).’

This way of putting things eliminates Dave's concern:

‘Which would mean, as I read it, that I couldn't correlate my transactions booking a plane trip, hotel and rental car since different parties were involved in all three transactions!’

That said, I want to be clear that “parties to a transaction” does NOT include what Dave calls “all corporate partners” (aka a corporate information free-for-all!)  It just means parties (for example corporations) participating directly in some transaction can correlate it with the other transactions in which they directly participate (but not with the transactions of some other corporation unless they get approval from the transaction participants to do so).

Dave argues:

‘In the end, it isn't the correlation that's problematic, but the use to which it's put. So let's tie up the usage in a legally binding way, and not worry so much about the tools and technology.

‘In many ways the internet makes anti-social and unethical behavior easier. That doesn't mean (as some would have it) that we need to ban internet access or technological tools. It does mean we need to better educate people about acceptable behavior and step up our policing tools to better enable us to nab the bad guys (while not inconveniencing the good guys).’

To be perfectly clear, I'm not proposing a ban on technology!  I don't do banning!  I do creation. 

So instead, I'm arguing that as we develop our new technologies we should make sure they support the “right to correlation” – and the delegation of that right – in ways that restore balance and give people a fighting chance to prevent unseen software robots from limiting their destinies.

 

Do people care about data correlation?

While I was working on the last couple of posts about data correlation, trusty old RSS brought in a  corroborating piece by Colin McKay at the Office of the Privacy Commissioner of Canada.   Many  in the industry seem to assume people will trade any of their personal information for the smallest trinkets, so more empirical work of the kind reported here seems to be essential.

‘How comfortable, exactly, are online users with their information and online browsing habits being used to track their behaviour and serve ads to them?

‘A survey of Canadian respondents, conducted by TNS Facts and reported by the Canadian Marketing Association, reports that a large number of Canadians and Americans “(69% and 67% respectively) are aware that when they are online their browsing behaviour may be captured by third parties for advertising purposes.”

‘That doesn’t mean they are comfortable with the practice. The same survey notes that “just 33 per cent of Canadians who are members of a site are comfortable with these sites using their browsing information to improve their site experience. There is no difference in support for the use of consumers’ browsing history to serve them targeted ads, be it with the general population, the privacy concerned, or members of a site.”’

If only 33% are comfortable with using browsing information to improve site experience, I wonder how many will be comfortable with using browsing information to evaluate terminating people's credit cards (see thread on Martinism)?  Can I take a guess?  How about 1%?  (This may seem high, but I have a friend in the direct marketing world who tells me 1% of the population will believe in anything at all!)  Colin continues:

‘But how much information are users willing to consciously hand over to win access to services, prizes or additional content?

‘A survey of 1800 visitors to coolsavings.com, a coupon and rebate site owned by Q Interactive, has claimed that web visitors are willing “to receive free online services and information in exchange for the use of my data to target relevant advertising to me.”

‘Now, my impression is that visitors to sites like coolsavings.com – who are actively seeking out value and benefits online – would be predisposed to believing that online sites would be able to deliver useful content and relevant ads.

‘That said, Mediapost, who had access to details of the full Q Interactive survey, cautions that users “… continue to put the brakes on hard when asked which specific information they are willing to hand over. The survey found 77.8% willing to give zip code, 64.9% their age and 72.3% their gender, but only 22.4% said they wanted to share the Web sites they visited and only 12% and 12.1% were willing to have their online purchases or the search history respectively to be shared …” ‘

I want to underline Colin's point.  These statistics come from people who actively sought out a coupon site in order to trade information for benefits!  Even so, we are talking about a mere 12% who were willing to have their online purchases or search history shared.  This empirically nixes the notion, held by some, that people don't care about data correlation (an issue I promised to address in my last post).

Colin's conclusions seem consistent with the idea I sketched there of defining a new “right to data correlation” and requiring delegation of that right before trusted parties can correlate individuals across contexts.

‘In both the TNS Facts/CMA and Q Interactive surveys, the results seem to indicate that users are willing to make a conscious decision to share information about themselves – especially if it is with sites they trust and with whom they have an established relationship.

‘A common thread seems to be emerging: consumers see a benefit to providing specific data that will help target information relevant to their needs, but they are less certain about allowing their past behaviour to be used to make inferences about their individual preferences.

‘They may feel their past search and browsing habits might just have a greater impact on their personal and professional life than the limited re-distribution of basic personal information by sites they trust. Especially if those previous habits might be seen as indiscreet, even obscene.’

Colin's conclusion points to the need to be able to “revoke the right to data correlation” that may have been extended to third parties.  It also underlines the need for a built-in scheme for aging and deletion of correlation data.

 

The Right To Correlate

Dave Kearns’ comment in Another Violent Agreement convinces me I've got to apply the scalpel to the way I talk about correlation handles.  Dave writes:

‘I took Kim at his word when he talked “about the need to prevent correlation handles and assembly of information across contexts…” That does sound like “banning the tools.”

‘So I'm pleased to say I agree with his clarification of today:

‘“I agree that we must influence behaviors as well as develop tools… [but] there’s a huge gap between the kind of data correlation done at a person’s request as part of a relationship (VRM), and the data correlation I described in my post that is done without a person’s consent or knowledge.” (Emphasis added by Dave)’

Thinking about this some more, it seems we might be able to use a delegation paradigm.

The “right to correlate”

Let's postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).

Then it would follow that any two parties with whom an individual interacts would not by default have the right to correlate data they had each collected in their separate transactions.

On the other hand, the individual would have the right to organize and correlate her own data across all the parties with whom she interacts since she was party to all the transactions.

Delegating the Right to Correlate

If we introduce the ability to delegate, then an individual could delegate her right for two parties to correlate relevant data about her.  For example, I could delegate to Alaska Airlines and British Airways the right to share information about me.

Similarly, if I were an optimistic person, I could opt to use a service like that envisaged by Dave Kearns, which “can discern our real desires from our passing whims and organize our quest for knowledge, experience and – yes – material things in ways which we can only dream about now.”  The point here is that we would delegate the right to correlate to this service operating on our behalf.

Revoking the Right to Correlate

A key aspect of delegating a right is the ability to revoke that delegation.  In other words, if the service to which I had given some set of rights became annoying or odious, I would need to be able to terminate its right to correlate.  Importantly, the right applies to correlation itself.  Thus when the right is revoked, the data must no longer be linkable in any way.
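Purely as a thought experiment – this is not a proposed protocol, and all the names are invented – the delegation paradigm could be modelled as a registry of grants that the individual can revoke at any time, with two parties allowed to link their records about her only while an unrevoked grant exists:

    from datetime import datetime, timezone

    class CorrelationRights:
        # Conceptual sketch: an individual grants, and can later revoke,
        # the right of two named parties to correlate data about her.

        def __init__(self):
            self._grants = {}   # (individual, pair of parties) -> time granted

        def delegate(self, individual: str, party_a: str, party_b: str):
            key = (individual, frozenset({party_a, party_b}))
            self._grants[key] = datetime.now(timezone.utc)

        def revoke(self, individual: str, party_a: str, party_b: str):
            # After revocation the data must no longer be linkable in any way.
            self._grants.pop((individual, frozenset({party_a, party_b})), None)

        def may_correlate(self, individual: str, party_a: str, party_b: str) -> bool:
            return (individual, frozenset({party_a, party_b})) in self._grants

So I could call delegate("Kim", "Alaska Airlines", "British Airways"), and withdraw it later with revoke – at which point the hard part begins, because revocation has to mean the correlated data actually becomes unlinkable, not just that new correlation stops.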

Forensics

There are cases where criminal activity is being investigated or proven in which it is necessary for law enforcement to be able to correlate without the consent of the individual.  This is already the case in western society, and it seems likely that new mechanisms would not be required in a world respecting the Right to Correlate.

Defining contexts

Respecting the Right to Correlate would not by itself solve the Canadian Tire Problem that started this thread.  The thing that made the Canadian Tire human experiments most odious is that they correlated buying habits at the level of individual purchases (our relations to Canadian Tire as a store)  with  probable behavior in paying off credit cards (Canadian Tire as a credit card issuer).  Paradoxically, someone's loyalty to the store could actually be used to deny her credit.  People who get Canadian Tire credit cards do know that the company is in a position to correlate all this information, but are unlikely to predict this counter-intuitive outcome.

Those of us preferring mainstream credit card companies presumably don't have the same issues at this point in time.  They know where we buy but not what we buy (although there may be data sharing relationships with merchants that I am not aware of… Let me know…).

So we have come to the most important long-term problem:  The Internet changes the rules of the game by making data correlation so very easy.

It potentially turns every credit card company into a data-correlating Canadian Tire.  Are we looking at the Canadian Tirization of the Internet?

But do people care?

Some will say that none of this matters because people just don't care about what is correlated.  I'll discuss that briefly in my next post.