More precision on the Right to Correlate

Dave Kearns continues to whack me for some of my terminology in discussing data correlation.  He says: 

‘In responding to my “violent agreement” post, Kim Cameron goes a long way towards beginning to define the parameters for correlating data and transactions. I'd urge all of you to jump into the discussion.

‘But – and it's a huge but – we need to be very careful of the terminology we use.

‘Kim starts: “Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction, and further, that they only have the right to correlate it with other transactions involving the same parties.” ‘

Dave's right that this was overly restrictive.  In fact I changed it within a few minutes of the initial post – but apparently not fast enough to prevent confusion.  My edited version stated:

‘Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).’

This way of putting things eliminates Dave's concern:

‘Which would mean, as I read it, that I couldn't correlate my transactions booking a plane trip, hotel and rental car since different parties were involved in all three transactions!’

That said, I want to be clear that “parties to a transaction” does NOT include what Dave calls “all corporate partners” (aka a corporate information free-for-all!).  It just means that parties (for example, corporations) participating directly in some transaction can correlate it with the other transactions in which they directly participate (but not with the transactions of some other corporation unless they get approval from the transaction participants to do so).
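For readers who think better in code, here is a minimal sketch of that rule in Python.  The `Transaction` class, the `approved_correlators` field and both function names are made up for illustration; nothing here is an existing API, and a real system would need far more than a boolean check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    parties: frozenset          # everyone who participated directly
    anonymized: bool = False    # fully anonymized data is exempt from the rule
    # Outsiders that the transaction's participants have approved (hypothetical field)
    approved_correlators: frozenset = field(default_factory=frozenset)


def may_use(actor: str, tx: Transaction) -> bool:
    """Can `actor` include this one transaction in a correlation?"""
    return tx.anonymized or actor in tx.parties or actor in tx.approved_correlators


def may_correlate(actor: str, *transactions: Transaction) -> bool:
    """The actor needs the right for every transaction being joined."""
    return all(may_use(actor, tx) for tx in transactions)


flight = Transaction("flight", frozenset({"me", "airline"}))
hotel = Transaction("hotel", frozenset({"me", "hotel"}))
car = Transaction("car", frozenset({"me", "rental"}))

print(may_correlate("me", flight, hotel, car))   # I was a party to all three
print(may_correlate("airline", flight, hotel))   # the airline was not party to the hotel booking
```

Under these assumptions the first call prints True and the second prints False: I can correlate my own trip, while the airline would need approval from the hotel transaction's participants first.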

Dave argues:

‘In the end, it isn't the correlation that's problematic, but the use to which it's put. So let's tie up the usage in a legally binding way, and not worry so much about the tools and technology.

‘In many ways the internet makes anti-social and unethical behavior easier. That doesn't mean (as some would have it) that we need to ban internet access or technological tools. It does mean we need to better educate people about acceptable behavior and step up our policing tools to better enable us to nab the bad guys (while not inconveniencing the good guys).’

To be perfectly clear, I'm not proposing a ban on technology!  I don't do banning!  I do creation. 

So instead, I'm arguing that as we develop our new technologies we should make sure they support the “right to correlation” – and the delegation of that right – in ways that restore balance and give people a fighting chance to prevent unseen software robots from limiting their destinies.

 

Do people care about data correlation?

While I was working on the last couple of posts about data correlation, trusty old RSS brought in a corroborating piece by Colin McKay at the Office of the Privacy Commissioner of Canada.  Many in the industry seem to assume people will trade any of their personal information for the smallest trinkets, so more empirical work of the kind reported here seems to be essential.

‘How comfortable, exactly, are online users with their information and online browsing habits being used to track their behaviour and serve ads to them?

‘A survey of Canadian respondents, conducted by TNS Facts and reported by the Canadian Marketing Association, reports that a large number of Canadians and Americans “(69% and 67% respectively) are aware that when they are online their browsing behaviour may be captured by third parties for advertising purposes.”

‘That doesn’t mean they are comfortable with the practice. The same survey notes that “just 33 per cent of Canadians who are members of a site are comfortable with these sites using their browsing information to improve their site experience. There is no difference in support for the use of consumers’ browsing history to serve them targeted ads, be it with the general population, the privacy concerned, or members of a site.”’

If only 33% are comfortable with using browsing information to improve site experience, I wonder how many will be comfortable with using browsing information to decide whether to terminate people’s credit cards (see thread on Martinism)?  Can I take a guess?  How about 1%?  (This may seem high, but I have a friend in the direct marketing world who tells me 1% of the population will believe in anything at all!)  Colin continues:

‘But how much information are users willing to consciously hand over to win access to services, prizes or additional content?

‘A survey of 1800 visitors to coolsavings.com, a coupon and rebate site owned by Q Interactive, has claimed that web visitors are willing “to receive free online services and information in exchange for the use of my data to target relevant advertising to me.”

‘Now, my impression is that visitors to sites like coolsavings.com – who are actively seeking out value and benefits online – would be predisposed to believing that online sites would be able to deliver useful content and relevant ads.

‘That said, Mediapost, who had access to details of the full Q Interactive survey, cautions that users “… continue to put the brakes on hard when asked which specific information they are willing to hand over. The survey found 77.8% willing to give zip code, 64.9% their age and 72.3% their gender, but only 22.4% said they wanted to share the Web sites they visited and only 12% and 12.1% were willing to have their online purchases or the search history respectively to be shared …” ‘

I want to underline Colin's point.  These statistics come from people who actively sought out a coupon site in order to trade information for benefits!  Even so, we are talking about a mere 12% who were willing to have their online purchases or search history shared.  This empirically nixes the notion, held by some, that people don't care about data correlation (an issue I promised to address in my last post).

Colin's conclusions seem consistent with the idea I sketched there of defining a new “right to data correlation” and requiring delegation of that right before trusted parties can correlate individuals across contexts.

‘In both the TNS Facts/CMA and Q Interactive surveys, the results seem to indicate that users are willing to make a conscious decision to share information about themselves – especially if it is with sites they trust and with whom they have an established relationship.

‘A common thread seems to be emerging: consumers see a benefit to providing specific data that will help target information relevant to their needs, but they are less certain about allowing their past behaviour to be used to make inferences about their individual preferences.

‘They may feel their past search and browsing habits might just have a greater impact on their personal and professional life than the limited re-distribution of basic personal information by sites they trust. Especially if those previous habits might be seen as indiscreet, even obscene.’

Colin's conclusion points to the need to be able to “revoke the right to data correlation” that may have been extended to third parties.  It also underlines the need for a built-in scheme for aging and deletion of correlation data.
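To make “revocation plus aging” a bit more concrete, here is a rough Python sketch.  The class names, fields and the `purge` helper are all invented for illustration; they are assumptions about what such a scheme might track, not a description of any existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CorrelationGrant:
    grantor: str            # the person whose data may be correlated
    grantee: str            # the third party given the right
    issued_at: datetime
    max_age: timedelta      # the delegation lapses on its own after this long
    revoked: bool = False

    def revoke(self) -> None:
        self.revoked = True

    def is_active(self, now: datetime) -> bool:
        return not self.revoked and now < self.issued_at + self.max_age


@dataclass
class CorrelationRecord:
    grant: CorrelationGrant
    data: str
    created_at: datetime


def purge(records, now, retention):
    """Keep only records that are backed by an active grant and younger than `retention`."""
    return [
        r for r in records
        if r.grant.is_active(now) and now - r.created_at < retention
    ]
```

The design choice worth noticing is that deletion is driven by two independent clocks: the age of each correlation record, and the life of the grant behind it.  Revoking the grant empties the store on the next purge, regardless of how fresh the records are.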

 

Getting down with Zermatt

Zermatt is a destination in Switzerland, shown above, that benefits from what Nietzsche calls “the air at high altitudes, with which everything in animal being grows more spiritual and acquires wings”.

It's therefore a good code name for the new identity application development framework Microsoft has just released in Beta form.  We used to call it IDFX internally  – who knows what it will be called when it is released in final form? 

Zermatt is what you use to develop interoperable identity-aware applications that run on the Windows platform.  We are building the future versions of Active Directory Federation Services (ADFS) with it, and claims-aware Microsoft applications will all use it as a foundation.  All capabilities of the platform are open to third party developers and enterprise customers working in Windows environments.  Every aspect of the framework works over the wire with other products on other platforms.

I can't stress enough how important it is to make it easy for application developers to incorporate the kind of sensible and sophisticated capabilities that this framework makes available.  And everyone should understand that our intent is for this platform to interoperate fully with products and frameworks produced by other vendors and open source projects, and to help the capabilities we are developing to become universal.

I also want to make it clear that this is a beta.  The goal is to involve our developer community in driving this towards final release.  The beta also makes it easy for other vendors and projects to explore every nook and cranny of our implementation and advise us of problems or work to achieve interoperability.

I've been doing my own little project using the beta Zermatt framework and will write about the experience and share my code.  As an architect, I can tell you already how happy I am about the extent to which this framework realizes the metasystem architecture we've worked so hard to define.

The product comes with a good White Paper for Developers by Keith Brown of Pluralsight.  Here's how Zermatt's main ReadMe sets out the goals of the framework.

Building claims-aware applications

Zermatt makes it easier to build identity aware applications. In addition to providing a new claims model, it provides applications with a rich set of API’s to reason about the identity of a caller using claims.

Zermatt also provides developers with a consistent programming experience whether they choose to build their applications in ASP.NET or in WCF environments. 

ASP.NET Controls

ASP.NET controls simplify development of ASP.NET pages for building claims-aware Web applications, as well as Passive STS’s.

Building Security Token Services (STS)

Zermatt makes it substantially easier for building a custom security token service (STS) that supports the WS-Trust protocol. These STS’s are also referred to as an Active STS.

In addition, the framework also provides support for building STS’s that support WS-Federation to enable web browser clients. These STS’s are also referred to as a Passive STS.

Creating Information Cards

Zermatt includes classes that you can use to create Information Cards – as well as STS's that support them.

There are a whole bunch of samples, and for identity geeks they are incredibly interesting.  I'll discuss what they do in another post.

Follow the installation instructions!

Meanwhile, go ahead and download.  I'll share one word of advice.  If you want things to run right out of the digital box, then for now slavishly follow the installation instructions.  I'm the type of person who never really looks at the ReadMe's – and I was chastened by the experience of not doing what I was told.  I went back and behaved, and the experience was flawless, so don't make the same mistake I did.

For example, there is a master installation script in the /samples/utilities directory called “SamplesPreReqSetup.bat”. This is a miraculous piece of work that sets up your machine certs automatically and takes care of a great number of security configuration details.  I know it's miraculous because initially (having skipped the readme) I thought I had to do this configuration manually.  Congratulations to everyone who got this to work.

You will also find a script in each sample directory that creates the necessary virtual directory for you.  You need this because of the way you are expected to use the Visual Studio debugger.

Using the debugger

In order to show how the framework really works, the projects all involve at least a couple of aspx pages (for example, one page that acts as a relying party, and another that acts as an STS).  So you need the ability to debug multiple pages at once.

To do this, you run the pages from a virtual directory as though they were “production” aspx pages.  Then you attach your debugger to the w3wp.exe process: under Debug, select “Attach to a process” and make sure you can see all the processes from all the sessions.  If w3wp.exe isn't in the list, “wake it up” by opening a page, and it will then appear.

For now it's best to compile the applications in the directory where they get installed.  It's possible that if you move the whole tree, they can be put somewhere else (I haven't tried this with my own hands).  But if you move a single project, it definitely won't work unless you tweak the virtual directory configuration yourself (why bother?).

Clear samples

I found the samples very clear, and uncluttered with a lot of “sample decoration” that makes it hard to understand the main high level points.  Some of the samples have a number of components working together – the delegation sample is totally amazing – and yet it is easy, once you run the sample, to understand how the pieces fit together.  There could be more documentation and this will appear as the beta progresses. 

The Zermatt team is really serious about collecting questions, feedback and suggestions – and responding to them.  I hope that if you are a developer interested in identity you'll take a look and send your feedback – whether you are primarily a Windows developer or not.  After all, our goal remains the Identity Big Bang, and getting identity deployed and cool applications written on all the different platforms. 

Delegation tokens and impersonation

I've been asked to clarify a couple of points by Devlin Daley and Bryant Cutler, who are studying with Phil Windley.

Delegation tokens 

Delegation tokens, as you've described them, (according to one of Dale Olds' recent posts) are not yet implemented in CardSpace.  Is that accurate? Is it soon to be added to specification or is it still a work in progress?

I like Dale's piece, but think the “not yet implemented” statement might lead to confusion. 

One of the key characteristics of CardSpace is that it has no idea what kind(s) of token it is carrying.  It's hard to get this across – the practical meaning isn't obvious.  But your question about “delegation tokens” provides  a good concrete example:  delegation coupons can be conveyed through CardSpace without any changes or extensions to it.  This doesn't mean anyone is doing so yet.  That is likely what Dale is talking about. 

I've actually been thinking of putting together some demo code to show how this would work.  If you look at my “HelloWorld Card” tutorial,  you will see that rather than requesting and sending a “HelloWorld Card”, the relying party could easily be requesting a delegation coupon.  So CardSpace is actually ready for “delegation coupons”.

One can then ask what a delegation coupon would look like in concrete terms.  What's the best format for the (possibly multiple) constituent tokens?  The blogosphere discussion about delegation shows lots of people are thinking about this, but so far we haven't built the “early implementations” that let us explore the issues and problems concretely enough to emerge with a new standard.  I would be interested in learning about research systems built in the academic community to explore this territory – perhaps you can share your research with us.

Impersonation

Devlin and Bryant continue:

We've been bantering about the idea of delegation vs. impersonation. Clearly impersonating someone without them knowing is wrong and a serious problem. But, is impersonation “bad” if I give my express permission for someone to do so? (assuming there is a mechanism for revoking this permission).

In your Powell's and Amazon example, what if I don't want Powell's to know that I am supplying this information to Amazon? Obviously there are cases where we want to let others know that services are acting with our permission. Perhaps there are cases where we don't want to disclose that. Is granting the choice to me more user-centric?

You are quite right that, as per the first law of identity, the choice of what to disclose must always be in the hands of the user.  Further, if a user wants to delegate to a machine the ability to “be her”, that should be possible too.  Let's call it extreme delegation.  Our job is not to tell anyone that they should live in some particular way.  We might, however, have the responsibility of pointing out the technical dangers of this extreme, perhaps even recommending some interesting science fiction readings…

But I'll point out that it isn't necessary to do impersonation to achieve the goal you want to achieve in your example – preventing Powell's from knowing that you are supplying information to Amazon.  In fact there are two ways to use delegation to do this. 

The first is simply to create a coupon saying, “the holder of this key has the right to see my Powell's behavior”.  Then you give Amazon the coupon and the key.  In return, Amazon might give you assurances about how it will protect the coupon.  Meanwhile, it can retrieve the information it wants without revealing its identity.

Or you may wish to have an agent of your own to which you delegate the ability to assemble your behaviors, and the right to pass them on according to your dictates.  I personally think this is the most likely option since it provides optimal user control.  But even in this case, designing secure systems means limiting the capabilities delegated to that particular piece of software, rather than “making it into you” by having it operate in your identity.  There is zero need for impersonation.
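The first option above (a coupon plus a key) can be sketched in a few lines of Python.  This is only a toy under loud assumptions: I assume the user and Powell's already share a secret (`user_secret`) that Powell's can use to check the coupon, the holder simply presents the key over a secure channel rather than proving possession cryptographically, and every function and field name is invented for illustration.  It is not a proposed token format.

```python
import hashlib
import hmac
import json
import secrets


def issue_coupon(user_secret: bytes, scope: str):
    """User side: return (coupon, holder_key). The coupon names the key only by its hash."""
    holder_key = secrets.token_bytes(32)
    claims = {"scope": scope, "key_hash": hashlib.sha256(holder_key).hexdigest()}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(user_secret, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}, holder_key


def verify_coupon(user_secret: bytes, coupon: dict, presented_key: bytes, wanted_scope: str) -> bool:
    """Data-holder side: check that the user issued the coupon and the caller holds the key."""
    payload = json.dumps(coupon["claims"], sort_keys=True).encode()
    expected = hmac.new(user_secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, coupon["tag"]):
        return False  # coupon was not issued by this user, or was altered
    key_hash = hashlib.sha256(presented_key).hexdigest()
    return (
        hmac.compare_digest(key_hash, coupon["claims"]["key_hash"])
        and coupon["claims"]["scope"] == wanted_scope
    )
```

Notice that nothing in the coupon says who the holder is.  The verifier learns that the user granted a narrowly scoped right to whoever holds the key, and that is all; the holder never operates in the user's identity.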

Your use case of information hiding can be handled without departing from my delegation maxim:

No one and no service should ever act in a person’s identity or employ their credentials when they’re not present.  Ever.  

Putting several threads together, the user should act through a transducer to delegate to well-identified processes.