Beyond maximal disclosure tokens

I concluded my last piece on linkability and identity technology by explaining that the probability of collusions between Relying Parties (RPs)  CAN be greatly reduced by using SAML tokens rather than X.509 certificates.  To provide an example of why this is so, I printed out the content of one of the X.509 certificates I use at work, and here's what it contained:

Version V3
Serial Number 13 9b 3c fc 00 03 00 19 c6 e2
Signature Algorithm sha1RSA
Issuer CN = IDA Enterprise CA 1
Valid From Friday, February 23, 2007 8:15:27 PM
Valid To Saturday, February 23, 2008 8:15:27 PM
Subject CN = Kim Cameron
OU = Users
DC = IDA
DC = Microsoft
DC = com
Public Key 25 15 e3 c4 4e d6 ca 38 fe fb d1 41 9f
ee 50 05 dd e0 15 dc d6 2a c3 cc 98 53
7e 9e b4 c7 a5 4c a7 64 56 66 1e 3d 36
4a 11 72 0a eb cf c9 d2 6c 1f 2e b2 2a
67 4f 07 52 70 36 f2 89 ec 98 09 bd 61
39 b1 52 07 48 9d 36 90 9c 7d de 61 61
76 12 5e 19 a5 36 e2 11 ea 14 45 b1 ba
12 e3 e2 d5 67 81 d1 1f bb 04 b1 cc 52
c2 e5 3e df 09 3d 2b a5
Subject Key Identifier 35 4d 46 4a 13 c1 ae 81 3b b8 b5 f4 86 bb 2a c0 58 d7 ad 92
Enhanced Key Usage Client Authentication (1.3.6.1.5.5.7.3.2)
Subject Alternative Name Other Name – Principal Name=kc@microsoft.com
Thumbprint b9 c6 4a 1a d9 87 f1 cb 34 6c 92 50 20 1b 51 51 87 d5 a8 ee

Everything shown is released every time I use the certificate – which is basically every time I go to a site that asks either for “any old certificate” or for a certificate from my certificate authority.  (As far as I know, the information is offered up before verifying that the site isn't evil).  You can see that there is a lot of information leakage.  X.509 certificates were designed before the privacy implications (to both individuals and their institutions) were well understood.

Beyond leaking potentially unnecessary information (like my email address), each of the fields shown in yellow is a correlation key that links my identity in one transaction to that in another – either within a single site or across multiple sites.  Put another way, each yellow field is a handle that can be used to correlate my profiles.  It's nice to have so MANY potential handles available, isn't it?  Choosing between serial number, subject DN, public key, key identifier, alternative name and thumbprint is pretty exhausting, but any of them will work when trying to build a super-dossier.

I call this a “maximal disclosure token” because the same information is released everywhere you go, whether it is required or not.  Further, it includes not one, but a whole set of omnidirectional identifiers (see law 4).

SAML tokens represent a step forward in this regard because, being constructed at the time of usage, they only need to contain information relevant to a given transaction.  With protocols like the redirect protocol described here, the identity provider knows which relying party a user is visiting. 

The Liberty Alliance has been forward-thinking enough to use this knowledge to avoid leaking omnidirectional handles to relying parties, through what it calls pseudonynms.  For example, “persistent” and “transient” pseudonyms can be put in the tokens by the identity provider, rather than omnidirectional identifiers, and the subject key can be different for every invocation (or skipped altogether). 

As a result, while the identity provider knows more about the sites visited by its users, and about the information of relevance to those sites, the ability of the sites to create cross-site profiles without the participation of the identity provider is greatly reduced.  SAML does not employ maximal disclosure tokens.  So in the threat diagram shown at the right I've removed the RP/RP collusion threat, which now pales in comparison to the other two.

As we will see, this does NOT mean the SAML protocol uses minimal disclosure tokens, and the many intricate issues involved are treated in a balanced way by Stefan Brands here.  One very interesting argument he makes is that the relying party (he calls it “service provider or SP), actually suffers a decrease in control relative to the identity provider (IP) in these redirection protocols, while the IP gains power at the expense of the RP.  For example, if Liberty pseudonyms are used, the IP will know all the customers employing a given RP, while the RP will have no direct relationship with them.  I look forward to finding out, perhaps over a drink with someone who was present, how these technology proposals aligned with various business models as they were being elaborated.

To see how a SAML token compares with an X.509 certificate, consider this example:

You'll see there is an assertionID, which is different for every token that is minted.  Typically it would not link a user across transactions, either within a given site or across multiple sites.  There is also a “name identifier”.  In this case it is a public identifier.  In others it might be a pseudonym or “unidirectional identifier” recognized only by one site.  It might even be a transient identifier that will only be used once.

Then there are the attributes – hopefully not all the possible attributes, but just the ones that are necessary for a given transaction to occur.

 Putting all of this together, the result is an identity provider which has a great deal of visibility into and control over what is revealed where, but more protection against cross-site linking if it handles the release of attributes on a need-to-know basis.

CardSpace control for ASP.NET

Dominick Baier at LeastPrivilege has made leaps and bounds in his CardSpace control for the Microsoft ASP.NET environment (though it should work equally well for people using a Higgins identity selector or managed card).  It is very cleanly designed.  Amazingly, he's already added support for the new icon (does he ever sleep?).  His blog has an ongoing discussion around the control and related issues:

After I made some incremental changes and releases of my CardSpace control (found some bugs, got some feedback), I wanted to consolidate all the information along with a new version and some new features here. It now contains all the features I need and will be the last release for some time i guess.

If you have any feedback or suggestions feel free to write me or leave me a comment. If you want to add support for XHTML, contact me too 😉

Download here. Have fun!

PS. This was only possible with a little help of my friends…thanks Brock and JasonD!

Features

Easy to use syntax
One of my main goals was to have an easy to use markup syntax and intellisense support. I don't want to type in all those namespace URIs…

<lp:CardSpaceSelector runat="server" ID="_selector_sic" AutoPostback="true" 
    IssuerType="SelfIssued"> 

    <lp:ClaimType Name="givenname" /> 
    <lp:ClaimType Name="surname" /> 
    <lp:ClaimType Name="email" /> 

</lp:CardSpaceSelector>

Clean markup and independence of the server form
The emitted markup works with Firefox and IE. I also made sure that the <object> tag is placed outside of the postback form. This allows you to have multiple postback controls on the form without triggering the identity selector.

Support for standard InfoCard image
You can choose between all standard sizes of the official InfoCard image. You can also supply your own image and dimensions

Designer integration
I never use the designer – but I acknowledge the fact that some people do 😉 The control renders correctly in the designer and has an editor to setup the required/optional claims (including intellisense support).

Event driven
The control fires an event when a token is submitted.

protected void _selector_sic_TokenSubmitted(object sender, TokenSubmittedEventArgs e) {     string xmlToken = e.Token; }

Conditional rendering
You can choose to render the control only if the client browser supports InfoCards. You can specify an alternative <div /> that would render in that case (e.g. to tell the user how to get CardSpace).

Decoupling
I intentionally didn't couple the control with any user management semantics (like membership) or decryption clases (like the TokenProcessor). It is totally up to you how to proceed after you received the encrypted token. This is considered a feature 😉

Properties

InfoCard setup

IssuerType
This enum has two values ‘SelfIssued’ and ‘Managed’. If you select ‘SelfIssued’ then the issuer URI for self-issued cards will be emitted. If you select ‘Managed’ you have to set the issuer URI yourself. Defaults to ‘SelfIssued’

Issuer and IssuerPolicy
Specifies the URIs for the issuer and the issuer policy.

TokenType
Specifies the token type. Defaults to SAML 1.0.

PrivacyUrl and PrivacyVersion
Specifies to the URL and version of the associated privacy policy (if any).

Image

ImageUrl
Specifies a custom image to display. Defaults to the official InfoCard icon. 

StandardImageSize
Selects one of the standard images sizes for the official InfoCard icon. Defaults to 114×80.

Width & Height
Specifies the size of the image in pixels. Only relevant when a custom image is used.

Rendering

RenderOnlyIfSupported
When set to true, the control will only render if the client browser supports CardSpace. You have to embed the control into a <div /> and specify the name in the DivToRender attribute. Defaults to false.

DivToRender
Specifies which <div /> to render/make invisible based on client support.

UnsupportedDiv
Optionally specifies a <div /> to render when CardSpace is not supported on the client.

RenderMode
Choose between static and dynamic rendering. Static preserves the space for the control on the client. Defaults to Static.

Misc

HiddenFieldName
Name of the hidden field used to transmit the token back to the page. Defaults to __XMLTOKEN.

AutoPostBack
Specifies if the control posts back after a card has been selected. Defaults to false.

TriggerOnLoad
Specifies if the identity selector should be invoked directly after the page has finished loading. Defaults to false.

XmlToken
Holds the encrypted token after the user has selected a card.

Dominick also did a set of four videos on CardSpace for UK MSDN that I would recommend:

Implementation Strategies

Also find the sample code he used here

Report on EEMA e-Identity Conference in Paris

Marcus Lasance has a new blog called IdentitySpace that brings us a comprehensive piece on the EEMA e-Identity conference held recently in Paris.  Reading it will give you a feeling for some of the conversations that are going on around identity in Europe.  I'll quote randomly to pique your interest:

“Olivier Delos of SEALED gave a good example of how e-Id cards issued by governments can be used to facilitate electronic transactions in the private sector. However ‘anti’ the privacy lobby may make us want to feel against such applications, the presented use case of a Belgian Water Company clearly showed what savings can be made when users are allowed to use their Belgium Citizen ID card as an identity service provider. 150,000 address change notifications are now processed this way in a matter of seconds, with zero manual input, zero errors and a user satisfaction rating of ten out of ten.

“Unfortunately this type of example where businesses are allowed to benefit from an infrastructure provided by government remain rare and citizen to government transactions requiring an e-Identity to be presented amount to only 1.7 transactions per year according to one study. Hardly what you would call a ‘killer application’. So the barrier to acceptance remains a big concern about privacy, even though according to Kim Cameron the technology is designed to enable the relying party to remain hidden from the Identity provider when privacy is a concern. Before ‘Jo public’ readily accepts this is the case, a lot of water will flow under the bridge unfortunately.”

I need to make a clarification. Out of the box, current Citizen Cards use maximum disclosure certificates and omnidirectional keys – they are only suitable for completely “public” identity contexts. But by using them to authenticate to identity providers that support minimum disclosure tokens, we can end up with true privacy enancing systems. If we accomplish this, then working together with privacy advocates, academics and influentials from across society, we can win the trust of Joe Public, who in my view has every right to be suspicious.

“David Ramirez showed an application of Federation using SAML V2.0 in the world of the converged networks of Internet and Mobile data access. For me the example of how SAML could be used to give law enforcement agents or ‘spooks’ access to mobile phone records was a bit unfortunate in view of the earlier privacy discussions, but I guess its one application of this technology which can’t be ignored. [Hope Conor reads this… – Kim]

“Kieron Salt did his expose on European Union funded research project GUIDE, an acronym for Government User Identity for Europe. This project is driven by the objectives of the Manchester Declaration which states that by 2010 all European citizens and businesses shall benefit from locally-issued electronic IDs for use across the EU. But we had seen from previous presentations, that there is plenty of scope here for identity fraudsters until interoperability issues are sorted out.

“GUIDE is not about creating a meta directory of all member states’ Identity data, but rather about creating a gateway providing interoperability between the different standards chosen by the different member states. The gateway for instance should be able to handle transactions between SAML, Shiboleth and WS-Federation schemes, if those are the three standards commonly used in Europe…

“Dr. Ruth Halperin of the London School of Economics and Political Science has conducted an in depth research project into the extent to which citizens across European Countries trust their authorities to exchange data in an appropriate manner across government departments, between governments and commerce…

“Dr. Halperin concludes from her research that in Europe citizens have far more trust in commercial organisations than in their respective governments, an opportunity that did not go unnoticed by the next presenter from Norway representing DNV – an independent foundation Established in 1864 in Norway for the purpose of independent assessment of the quality of ships to aid insurance companies in setting realistic insurance rates… So why not carry on this position to new areas like identity assurance in the digital value chains? PKI and digital signatures are key elements in securing such processes.

Marcus also covers the keynotes, including mine.

Announcing the Information Card Icon

I want to congratulate Mike Jones and my other colleagues for all their work in creating and figuring out how to protect an Information Card icon that can be used by everyone worldwide who supports InfoCard technology.  Creative people, legal people, and marketing folks all helped bring this to fruition.  Here's Mike's post:

I’m very pleased to announce that, as of today, there is now a graphical icon freely available for people to use to indicate that “Information Cards are accepted here”. This icon is intended to provide a common visual cue that Information Cards can be used to provide information to a site or program, similarly to how the RSS icon is used to indicate the availability of syndicated content.

The guidelines for the use of the icon, a frequently asked questions document, a set of png images of the icon rendered in a range of sizes, and the original artwork in Illustrator format are all available together in a download package. Please consult the guidelines and the FAQ before using the icon.  [You can also download the icon package here – Kim]

You’ll notice that the login page for my blog now uses the icon. Hopefully your sites will soon too!

And just for fun, because the icon is, after all, a graphical element, here’s a gallery of the renderings of the icon that we included in the downloads package. Enjoy!

OK Mike – I just updated my login page too.  I used to have the picture of my heroine Elastigirl as my InfoCard icon, but it's time to move on.  I'll continue to honor her through the quote at the top of my blog.

For the curious, Mike's posting includes the definitive series of icon variants – an outstanding display of Warholian excess.

No SAML bashing intended here

Conor Cahill (Conor's Web log of Esoterica) responds to my discussion about SAML protocol and linkability in a piece called SAML bashing – which is, by the way, absolutely NOT my intent.  I'm simply trying to understand how SAML relates to linkability, as I am doing for all the other major identity technologies.  I can't take up all the points he raises at this point in the flow, but encourage the reader to look at his piece… 

Conor mentions the emerging ideas for a SAML 2.0 Enabled Client/Proxy.  I want to make it clear I wasn't analysing these proposals.  I was analysing SAML as everyone knows it today – using the “http redirect” and “post” modes that have been widely deployed in portals all over the world, and don't require changes to the browser.

Kim writes about SAML's use of redirection protocols.. To start with, he forgets to mention a few important facts as part of his discussion: 

  • SAML defines a profile for an Enabled Client/Proxy (ECP) which is an evolution of the Liberty Alliance's LECP protocol. This protocol does *NOT* involve redirection, but instead supports an intelligent client directed by the user driving SSO transactions (a similar model to that adopted by Cardspace).
  • The Browser-Profile that Kim is referring to is one written based upon a use case requirement that the profile work out-of-the-box on unmodified browsers. There is NO other possible solution that will work in this scenario that will protect the users credentials at the IdP.

That said, there are still several statements in Kim's analysis that I feel obligated to respond to. These include:

Note that all of this can occur without the user being aware that anything has happened or having to take any action. For example, the user might have a cookie that identifies her to her identity provider. Then if she is sent through steps 2) to 4), she will likely see nothing but a little flicker in her status bar as different addresses flash by. (This is why I often compare redirection to a world where, when you enter a store to buy something, the sales clerk reaches into your pocket, pulls out your wallet and debits your credit card without you knowing what is going on – trust us.”)

First off, the user only see's nothing if a) they are already authenticated by the IdP, b) they have previously established a federation with the relying party, and c) they have told the IdP that they don't want to be notified when an SSO with this party takes place. I, for one, want things to work this way for me with providers that I trust (and yes, I do trust some providers). The inability to do this type of automatic operation is one of the shortcomings in Cardspace's implementation that I think will eventually be fixed. There is no need to have repeated confirmations of operations that I say may occur without my unnecessary participation.

Secondly, the analogy is way off base, trying to make this seem like I'm bing pick-pocketed by someone I don't know which Kim knows is absolutely not the case. A more proper analogy would be something along the lines of “I give one of my providers permission to reach into my bank account and withdraw money to pay my bill”. I do this all the with providers I trust, such as my electric company, my telephone company (both wired and wireless) and may other companies.

[Much more here…]

I think Conor is misunderstanding my intentions.  I agree that with a completely trustworthy Identity Provider following best practices for end user privacy, Conor's points b) and c) above would likely apply.  But we are looking at linkability precisely to judge the threats in the case where parties to identity transactions are NOT completely trustworthy (or are attacked in ways that undermine their trustworthiness.)  So arguing that the identity provider will behave properly has nothing to do with what I am exploring:  risk.  I'll try to build Conor's concerns into my ongoing discussion.

In terms of Conor's point about letting his telephone company deduct directly from his bank account, that's a use case that “rings true”, but there are few companies to which I would want to give this ability.  That's my point.  We need a spectrum of technologies to handle different use cases and risk profiles.

[Update:  Conor later relents a bit in Perhaps not so much Bashing, bringing up a number of points I hope to make part of this exposition.] 

Linkage in “redirect” protocols like SAML

Moving on from certificates in our examination of identity technology and linkability, we'll look next at the redirection protocols – SAML, WS-Federation and OpenID.  These work as shown in the following diagram.  Let's take SAML as our example. 

In step 1, the user goes to a relying party and requests a resource using an http “GET”.  Assuming the relying party wants proof of identity, it returns 2), an http “redirect” that contains a “Location” header.  This header will necessarily include the URL of the identity provider (IP), and a bunch of goop in the URL query string that encodes the SAML request.    

For example, the redirect might look something like this:

HTTP/1.1 302 Object Moved
Date: 21 Jan 2004 07:00:49 GMT
Location:
https://ServiceProvider.com/SAML/SLO/Browser?SAMLRequest=fVFdS8MwFH0f7D%
2BUvGdNsq62oSsIQyhMESc%2B%2BJYlmRbWpObeyvz3puv2IMjyFM7HPedyK1DdsZdb%........
2F%
50sl9lU6RV2Dp0vsLIy7NM7YU82r9B90PrvCf85W%2FwL8zSVQzAEAAA%3D%
3D&RelayState=0043bfc1bc45110dae17004005b13a2b&SigAlg=http%3A%2F%
2Fwww.w3.org%2F200%2F09%2Fxmldsig%23rsasha1&
Signature=NOTAREALSIGNATUREBUTTHEREALONEWOULDGOHERE
Content-Type: text/html; charset=iso-8859-1

The user's browser receives the redirect and then behaves as a good browser should, doing the GET at the URL represented by the Location header, as shown in 3). 

The question of how the relying party knows which identity provider URL to use is open ended.  In a portal scenario, the address might be hard wired, pointing to the portal's identity provider.  Or in OpenID, the user manually enters information that can be used to figure out the URL of the identity provider (see the associated dangers).

The next question is, “How does the identity provider return the response to the relying party?”  As you might guess, the same redirection mechanism is used again in 4), but this time the identity provider fills out the Location header with the URL of the relying party, and the goop is the identity information required by the RP.  As shown in 5), the browser responds to this redirection information by obediently posting back to the relying party.

Note that all of this can occur without the user being aware that anything has happened or having to take any action.  For example, the user might have a cookie that identifies her to her identity provider.  Then if she is sent through steps 2) to 4), she will likely see nothing but a little flicker in her status bar as different addresses flash by.  (This is why I often compare redirection to a world where, when you enter a store to buy something, the sales clerk reaches into your pocket, pulls out your wallet and debits your credit card without you knowing what is going on — trust us…)

Since the identity provider is tasked with telling the browser where to send the response, it MUST know what relying party you are visiting.  Because it fabricates the returned identity token, it MUST know all the contents of that token.

So, returning to the axes for linkability that we set up in Evolving Technology for Better Privacy, we see that from an identity point of view, the identity provider “sees all” – without the requirement for any collusion.  Knowing each other's identity, the relying party and the identity provider can, in the absence of appropriate policy and suitable auditing, exchange any information they want, either through the redirection channel, or through a “back channel” that dispenses with the user and her browser altogether. 

In fact all versions of SAML include an “artifact” binding intended to facilitate this.  The intention of this mechanism is that only a “handle” need be exchanged through the browser redirection channel, with the assumption that the IP and RP can then hook up and use the handle to “collaborate” about the user without her participation.

In considering the use cases for which SAML was designed, it is important to remember that redirection was not originally designed to put the “user at the center”, but rather was “intended for cases in which the SAML requester and responder need to communicate using an HTTP user agent… for example, if the communicating parties do not share a direct path of communication.”  In other words, an IP/RP collaboration use case.

As Paul Masden reminded us in a recent comment, SAML 2.0 introduced a new element called RelayState that provides another means for synchronizing or exchanging information between the identity provider and the relying party; again, this demonstrates the great amount of trust a user must place in a SAML identity provider.

There are other SAML bindings that vary slightly from the redirect binding described above (for example, there is an HTTP POST binding that gets around the payload size limitations involved with the redirected GET, as Pat Paterson has pointed out).  But nothing changes in terms of the big picture.  In general, we can say that the redirection protocols promote much greater visibility of the IP on the RPs than was the case with X.509. 

I certainly do not see this as all bad.  It can be useful in many cases – for example when you would like your financial institution to verify the identity of a commercial site before you release funds to it.  But the important point is this:  the protocol pattern is only appropriate for a certain set of use cases, reminding us why we need to move towards a multi-technology metasystem. 

It is possible to use the same SAML payloads in more privacy-protecting ways by using a different wire protocol and putting more intelligence and control on the client.  This is the case for CardSpace in non-auditing mode, and Conor Cahor points out that SAML's Enhanced Client or Proxy (ECP) Profile has similar goals.  Privacy is one of the important reasons why evolving towards an “active client” has advantages.

You might ask why, given the greater visibility of IP on RP, I didn't put the redirection protocols at the extreme left of my identity technology privacy spectrum.  The reason is that the probability of RP/RP collusion CAN be greatly reduced when compared to X.509 certificates, as I will show next.

Information Card user education resources

Keith Brown points out that we need some permanent web resources that would teach people about Information Cards, how they work, how to use them, and so on:

As I add support for information cards to Pluralsight, I'm rather surprised that I'm having trouble finding official landing pages for consumers. For example, on our logon page, there will be a button to click to log in using an information card, kind of like what you see on Kim's login page. For people who don't know what an information card is, this might be confusing, so of course we'll want a link that points to some documentation. But right now it seems as though everyone is creating their own descriptions for this. Here's Kim's what is an information card page, for example.

It seems as though it would help adoption if there were some centralized descriptions of this stuff. Do these pages exist and I'm just missing them? Or is it that Microsoft only wants to talk about CardSpace, which is their implementation of the selector? I note that when Kim wants to tell you how to install an identity selector, he points to a WordPress blog called the Pamela Project, which doesn't seem too helpful, but might be interesting for someone wanting to add support for information cards to their WordPress blog.

It seems to me that if the industry really wants consumers to start adopting information cards, somebody's going to have to explain this stuff in terms my mother can understand, and it would help to have a common place where those explanations live.

In another post, discussing the issue with Richard Turner, he adds:

In my opinion, somebody (Microsoft?) needs to break this holding pattern fast. I agree that things aren't going to take off until there are more relying parties. But as a guy who is busy doing just that (adding support for infocard to pluralsight.com), it doesn't make me feel very comfortable that those consumer landing pages I talked about in my post don't already exist on the web. I happen to be very committed to this technology, so I'm going to implement a relying party no matter what. Other websites might not be so inclined.

I agree this would help – and simplify our lives.  When I InfoCard-enabled my site, I had to cobble stuff up from scratch.  It was tedious since none of the materials existed.   Maybe that's why my help screens are a bit, as Keith is too polite to tell you, crude.

It sure would be neat to have a PERMANENT location everyone's Information Card help links can point to.  That would provide consistency, and let us get some really good resources together, including videos. 

I'll bounce this idea around with others here at Microsoft and see how we can play, as Keith says, a leadership role in making this happen.

What does the identity provider know?

I appreciate the correction by Irving Reid in a posting called What did the identity provider know?

…And when did it know it?

Kim Cameron sums up the reasons why we need to understand the technical possibilities for how digital identity information can affect privacy; in short, we can’t make good policy if we don’t know how this stuff actually works.

But I want to call out one assertion he (and he’s not the only one) makes:

 First, part of what becomes evident is that with browser-based technologies like Liberty, WS-Federation and OpenID,  NO collusion is actually necessary for the identity provider to “see everything”.

The identity provider most certainly does not “see everything”. The IP sees which RPs you initiate sessions with and, depending on configuration, has some indication of how long those sessions last. Granted, that is *a lot* of information, but it’s far from “everything”. The IP must collude with the RPs to get any information about what you did at the RP during the session.

Completely right. I'll try to make this clearer as I go on. Without collusion, the IP doesn't know how the user actually behaved while at the RP.  I was too focussed on the “identity channel”, thinking about the fact that the IP knows times, what RPs were visited, and what claims were released for each particular user to each RP.

Collusion takes effort; how much?

Eric Norman, from University of Wisconsin, has a new blog called Fun with Metaphors and an independent spirit that is attractive and informed.    He weighs in to our recent discussion with Collusion takes effort:

Now don't get me wrong here. I'm all for protection of privacy. In fact, I have been credited by some as raising consciousness about 8 years ago (pre-Shibboleth) in the Internet2 community to the effect that privacy concerns need to be dealt with in the beginning and at a fundamental level instead of being grafted on later as an afterthought.

There have been recent discussions in the blogosphere about various parties colluding to invade someone's privacy. What I would like to see during such discussions is a more ecological and risk-assessing approach. I'll try to elaborate.

The other day, Kim Cameron analyzed sundry combinations of colluding parties and identity systems to find out what collusion is possible and what isn't. That's all well and good and useful. It answers questions about what's possible in a techno- and crypto- sense. However, I think there's more to the story.

The essence of the rest of the story is that collusion takes effort and motivation on the part of the conspirators. Such effort would act as a deterrent to the formation of such conspiracies and might even make them not worthwhile.

Just the fact that privacy violations would take collusion might be enough to inhibit them in some cases. This is a lightweight version of separation of duty — the nuclear launch scenario; make sure the decision to take action can't be unilateral.

In some of the cases, not much is said about how the parties that are involved in such a conspiracy would find each other. In the case of RPs colluding with each other, how would one of the RPs even know that there's another RP to conspire with and who the other RP is? That would involve a search and I don't think they could just consult Google. It would take effort.

Just today, Kaliya reported another example. A court has held that email is subject to protection under the Fourth Amendment and therefore a subpoena is required for collusion. That takes a lot of effort.

Anyway, the message here is that it is indeed useful to focus on just the technical and cryptographic possibilities. However, all that gets you is a yes/no answer about what's possible and what's not. Don't forget to also include the effort it would actually take to make such collusions happen.

First of all, I agree that the technical and crypto possibilities are not the whole story of linkability.  But they are a part of the story we do need to understand a lot more objectively than is currently the case.  Clearly this applies to technical people, but I think the same goes for policy makers.  Let's get to the point where the characteristics of the systems can be discussed without emotion or the bias of any one technology.

Now let's turn to one of Eric's main points: the effort required for conspirators to collude would act as a deterrent to the formation of such conspiracies.

First, part of what becomes evident is that with browser-based technologies like Liberty, WS-Federation and OpenID,  NO collusion is actually necessary for the identity provider to “see everything” – in the sense of all aspects of the identity exchange.  That in itself may limit use cases.   It also underlines the level of trust the user MUST place in such an IP.  At the very minimum, all the users of the system need to be made aware of how this works.  I'm not sure that has been happening…

Secondly, even if you blind the IP as to the identity of the RP, you clearly can't prevent the inverse, since the RP needs to know who has made the claims!  Even so,  I agree that this blinding represents something akin to “separation of duty”, making collusion a lot harder to get away with on a large scale.

So I really am trying to set up this continuum to allow for “risk assessment” and concrete understanding of different use cases and benefits.  In this regard Eric and I are in total agreement.

As a concrete example of such risk assessment, people responsible for privacy in government have pointed out to me that their systems are tightly connected, and are often run by entities who provide services across multiple departments.  They worry that in this case, collusion is very easy.  Put another way, the separation of duties is too fragile.

Assemble the audit logs and you collude.  No more to it than that.  This is why they see it as prudent to put in place a system with properties that make routine creation of super-dossiers more difficult.  And why we need to understand our continuum. 

Revealing patterns when there is no need to do so

Irving Reid of Controlled Flight into Terrain has come up with exactly the kind of use case I wanted to see when I was thinking about Paul Madsen's points:

Kim Cameron responds to Paul Madsen responding to Kim Cameron, and I wonder what it is about Canadians and identity…

But I have to admit that I have not personally been that interested in the use case of presenting “managed assertions” to amnesiac web sites.  In other words, I think the cases where you would want a managed identity provider for completely amnesiac interactions are fairly few and far between.  (If someone wants to turn me around me in this regard I’m wide open.)

Shibboleth, in particular, has a very clear requirement for this use case. FERPA requires that educational institutions disclose the least possible information about students, staff and faculty to their partners. The example I heard, back in the early days of SAML, was of an institution that had a contract with an on-line case law research provider such that anyone affiliated with the law school at that institution could look up cases.

In this case, the “managed identity provider” (representing the educational institution) needs to assert that the person visiting right now is affiliated with the law school. However, the provider has no need to know anything more than that, and therefore the institution has a responsibility under FERPA to not give the provider any extra information. “The person looking up Case X right now is the same person who looked up Case Y last week” is one of the pieces of information the institution shouldn’t share with the provider.

Put this way it is obvious that it breaks the law of minimal disclosure to reveal that “the person looking up Case X right now is the same person who looked up Case Y last week” when there is no need to do so.

I initially didn't see that a pseudonymous link between Case X and Case Y would leak very much information.  But on reflection, in the competitive world of academic research, these linkages could benefit an observer by revealing patterns the observer would not otherwise be aware of.  He might not know whose research he was observing, but might nonetheless cobble a paper together faster than the original researcher, beating him in terms of publication date.

I'll include this example in discussing some of the collusion issues raised by various identity technologies.