November 2007 – Kim Cameron's Identity Weblog

CardSpace for the rest of us

I've been a Jon Udell fan for so long that I can't even admit to myself just how long it is! So I'll avoid that calculation and just say I'm really delighted to see the CardSpace team get kudos for its long-tail (no-ssl) work in Jon's recent CardSpace for the rest of us:

Hat tip to the CardSpace team for enabling “long tail” use of Information Card technology by lots of folks who are (understandably) daunted by the prospect of installing SSL certificates onto web servers. Kim Cameron’s screencast walks through the scenario in PHP, but anyone who can parse a bit of XML in any language will be able to follow along. The demo shows how to create a simple http: (not https:) web page that invokes an identity selector, and then parses out and reports the attributes sent by the client.

As Kim points out this is advisable only in low-value scenarios where an unencrypted exchange may be deemed acceptable. But when you count blogs, and other kinds of lightweight or ad-hoc services, there are a lot of those scenarios.

Kim adds the following key point:

Students and others who want to see the basic ideas of the Metasystem can therefore get into the game more easily, and upgrade to certificates once they’ve mastered the basics.

Exactly. Understanding the logistics of SSL is unrelated to understanding how identity claims can be represented and exchanged. Separating those concerns is a great way to grow the latter understanding.

I've never been able to put it this well, even though it's just what I was trying to do. Jon really nails it. I guess that's why he's such a good writer while I have to content myself with being an architect.

Ultimate simplicity: 30 lines of code

With the latest CardSpace bits anyone who is handy with HTML and PHP, Ruby, C#, Python or almost any other language can set up CardSpace on their site in minutes – without the pain and expense of installing a certificate. They can do this without using any of the special libraries necessary to support high security Information Card exchanges.

This approach is only advisable for personal sites like blogs – but of course, there are millions of blogs being born every second, or… something like that. Students and others who want to see the basic ideas of the Metasystem can therefore get into the game more easily, and upgrade to certificates once they've mastered the basics.

I've put together a demo of everything it takes to be successful (assuming you have the right software installed, as described later in this piece).

From the high security end of the spectrum to the long tail

Given the time pressures of shipping Vista, those of us working on CardSpace had to prioritize (i.e. cut) our features in order to get everything tested and out the door on schedule. One assumption we decided to make for V1.0 was that every site would have an X.509 certificate. We wanted our design to start from the high end of the security spectrum so the fundamental security architecture would be right. Our thinking was that if we could get these cases working, enabling the “long tail” of sites that don't have certificates would be possible too.

Let's face it. Getting a certificate, setting up a dedicated external IP address, and configuring your web server to use https is non-trivial for the average person. Nor does it make much sense to require certificates for personal web sites with no actual monetary or hacker value. I would even say that without proper security analysis, vetting of software and rigorous operating procedures, SSL isn't even likey to offer much protection against common attacks. We need to evolve our whole digital framework towards better security practices, not just mandate certificates and think we're done.

So again, when all is said and done, it is best to promote an inclusive Identity Metasystem embracing the full range of identity scenarios – including support for the “long tail” of personal and non-commercial sites. One way to do this is through OpenID support. But in addition, we have extended CardSpace to work with sites that don't have a certificate.

The user experience makes the difference clear – we are careful to clearly point out that the exchange of identity is not encrypted.

Warnings are presented in words and graphics

In spite of this, CardSpace continues to provide significant protection against attack when compared with current browsers. You are shown the DNS name of the site you are visiting as part of the CardSpace ceremony, not on some random screen under the control (or manipulation) of a potentially evil party. And if you have been redirected to a “look-alike” site containing an unknown DNS name, you will get the “Introductory” ceremony rather than the more streamlined “Known site” ceremony. This unexpected behavior has been shown to make people much more careful about what is appearing on their screen. Ruchi from the CardSpace blog has a great discussion of all the potential issues here.

What software is required?

As my little demo shows, if you have a website to which you want to add CardSpace support, all you need to do is add an “object tag” to your login page and parse a bit of xml when you get the Information Card posted back to your site.

On the “client” side, if you are using IE, first you will need to install an updated browser specific extension that will work at a non-SSL site. If you have IE7 you probably already have it as part of the October security update. If not, download it from here.

Second you will need to install an updated version of Cardspace that does the right thing when a website (we call it the “relying party”) does not have a certificate. The latest version of Cardspace can be downloaded as part of .Net Framework 3.5 from here.

For people using Mac and Linux clients, I look forward to the upcoming Internet Identity Workshop as an opportunity to catch up with my friends from Bandit, OpenInfoCard, Higgins and others about open source support for the same functionality. I'll pass on any information I can at that time.

Once you watch the demo, more information is available here and here and here. The code snippets shown are here.

Getting claims when using no-ssl CardSpace

When a user tells CardSpace to “send” identity data from a self-issued card to a web site, it posts a SAML token using the action attribute in the HTML form containing an x-informationCard Object tag.

In the simple, no-ssl case, this information will not be encrypted, so you can just treat it as an XML blob. You can test this out by making the form's action a script like this one:

This script just takes everything that is posted to the web server by CardSpace after processing the invocation form, and reflects it back as an “XML encoding”. The result is shown in my demo, and in the no-ssl zip file as result.xml.

As pedagogical as the XML dump may be, it isn't a good sample of how you would consume claims. For that, let's look at the following script:

GetClaims() shown above is just a way of pulling values out of an XML document – use your own instead. You will see that the givenname and privatepersonalidentifier claims used here are retrieved with this simple code.

I hope all of this will become very clear by watching the demo and looking at the aforementioned zip file, which you can cut and paste for your own experiments.

[Note: the raw XML display code above did not include the stripslashes function when I first posted it, which caused the function to fail in certain php configurations. Thanks to Alex Fung from Hong Kong for the report.]

Claims in the self-issued Information Cards profile

This list of claims is taken from the Identity Selector Interoperability Profile, and specifies a set of claim (attribute) types and the corresponding URIs defined for some commonly used personal information…

The base XML namespace URI that is used by the claim types defined here is as follows: http://schemas.xmlsoap.org/ws/2005/05/identity/claims

For convenience, an XML Schema for the claim types defined here can be found here.

8.5.1. First Name

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname

Type: xs:string

Definition: (givenName in RFC 2256) Preferred name or first name of a subject. According to RFC 2256: “This attribute is used to hold the part of a person's name which is not their surname nor middle name.”

8.5.2. Last Name

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname

Type: xs:string

Definition: (sn in RFC 2256) Surname or family name of a subject. According to RFC 2256: “This is the X.500 surname attribute which contains the family name of a person.”

8.5.3. Email Address

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress

Type: xs:string

Definition: (mail in inetOrgPerson) Preferred address for the “To:” field of email to be sent to the subject, usually of the form @. According to inetOrgPerson using RFC 1274: “This attribute type specifies an electronic mailbox attribute following the syntax specified in RFC 822.”

8.5.4. Street Address

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/streetaddress

Type: xs:string

Definition: (street in RFC 2256) Street address component of a subject?s address information. According to RFC 2256: “This attribute contains the physical address of the object to which the entry corresponds, such as an address for package delivery.” Its content is arbitrary, but typically given as a PO Box number or apartment/house number followed by a street name, e.g. 303 Mulberry St.

8.5.5. Locality Name or City

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/locality

Type: xs:string

Definition: (l in RFC 2256) Locality component of a subject?s address information. According to RFC 2256: “This attribute contains the name of a locality, such as a city, county or other geographic region.” e.g. Redmond.

8.5.6. State or Province

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/stateorprovince

Type: xs:string

Definition: (st in RFC 2256) Abbreviation for state or province name of a subject?s address information. According to RFC 2256: “This attribute contains the full name of a state or province. The values should be coordinated on a national level and if well-known shortcuts exist – like the two-letter state abbreviations in the US – these abbreviations are preferred over longer full names.” e.g. WA.

8.5.7. Postal Code

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/postalcode

Type: xs:string

Definition: (postalCode in X.500) Postal code or zip code component of a subject?s address information. According to X.500(2001): “The postal code attribute type specifies the postal code of the named object. If this attribute value is present, it will be part of the object's postal address – zip code in USA, postal code for other countries.”

8.5.8. Country

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/country

Type: xs:string

Definition: (c in RFC 2256) Country of a subject. According to RFC 2256: “This attribute contains a two-letter ISO 3166 country code.”

8.5.9. Primary or Home Telephone Number

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/homephone

Type: xs:string

Definition: (homePhone in inetOrgPerson) Primary or home telephone number of a subject. According to inetOrgPerson using RFC 1274: “This attribute type specifies a home telephone number associated with a person.” Attribute values should follow the agreed format for international telephone numbers, e.g. +44 71 123 4567.

8.5.10. Secondary or Work Telephone Number

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/otherphone

Type: xs:string

Definition: (telephoneNumber in X.500 Person) Secondary or work telephone number of a subject. According to X.500(2001): “This attribute type specifies an office/campus telephone number associated with a person.” Attribute values should follow the agreed format for international telephone numbers, e.g. +44 71 123 4567.

8.5.11. Mobile Telephone Number

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/mobilephone

Type: xs:string

Definition: (mobile in inetOrgPerson) Mobile telephone number of a subject. According to inetOrgPerson using RFC 1274: “This attribute type specifies a mobile telephone number associated with a person.” Attribute values should follow the agreed format for international telephone numbers, e.g. +44 71 123 4567.

8.5.12. Date of Birth

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/dateofbirth

Type: xs:date

Definition: The date of birth of a subject in a form allowed by the xs:date data type.

8.5.13. Gender

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/gender Type: xs:token

Definition: Gender of a subject that can have any of these exact string values –

0 (meaning unspecified),
1 (meaning Male) or
2 (meaning Female). Using these values allows them to be language neutral.

8.5.14. Private Personal Identifier

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/privatepersonalidentifier

Type: xs:base64binary

Definition: A private personal identifier (PPID) that identifies the subject to a relying party. The word “private” is used in the sense that the subject identifier is specific to a given relying party and hence private to that relying party. A subject?s PPID at one relying party cannot be correlated with the subject?s PPID at another relying party…

8.5.15. Web Page

URI: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/webpage

Type: xs:string

Definition: The Web page of a subject expressed as a URL.

HTML to invoke CardSpace on your site

In an upcoming post called Ultimate Simplicity: 30 lines of code, I show how to tweak a web page so it presents the option of logging in with an information card – without requiring you to dirty your hands with certificates.

If you haven't seen the demo yet, I start from a simple web page like this one:

I add an HTML form like this:

The form has an ID of “ctl00′, and a post action called “dump_input.php”. In other words, when the form is submitted (by clicking on the icon specified in the “img” section) the contents will be posted and the script “dump_input.php” will be run on the web server.

The form contains an x-informationCard object tag, which takes a parameter of “RequiredClaims”. This is followed by the claims the web page designer is asking for – in this case givenname and private personal identifier.

The zip of the sample code is here.

If you copy demo.html to your site, then when using the most recent release of CardSpace, you can navigate to that page, click on the icon, and you will be prompted for an infocard.

The claims supported in CardSpace for simple self-issued cards are defined here – you could cut and past them into the “RequiredClaims” parameter of demo.php to alter the form's behavior.

Touchpaper breached

Light Blue Touchpaper is a blog run by leading international security researchers at the Computer Laboratory, University of Cambridge. In recent posts, researcher Steven Murdoch writes that Touchpaper, which is based on the same WordPress blogging software I use, was breached around the same time as Identityblog (described here).

Steven explains that the attack was the result of several problems in WordPress – a SQL injection vulnerability plus a basic misuse in the way password hashes are stored and used in cookies. The latter problem remains even after release 2.3.1. He writes:

It is disappointing to see that people are still getting this type of thing wrong. In their 1978 summary, Morris and Thompson describe the importance of one way hashing and password salting (neither of which WordPress does properly).

I also pointed this problem out to several people when first experimenting with how to integrate Information Cards into WordPress a couple of years ago. The comments may not have made their way back to people who could fix the problems…

Steven has another recent post that describes more, equally surprising, uses of hashing, and discusses the interplay between hashes and search engines:

One of the steps used by the attacker who compromised Light Blue Touchpaper a few weeks ago was to create an account (which he promoted to administrator; more on that in a future post). I quickly disabled the account, but while doing forensics, I thought it would be interesting to find out the account password. WordPress stores raw MD5 hashes in the user database (despite my recommendation to use salting). As with any respectable hash function, it is believed to be computationally infeasible to discover the input of MD5 from an output. Instead, someone would have to try out all possible inputs until the correct output is discovered.

So, I wrote a trivial Python script which hashed all dictionary words, but that didn’t find the target (I also tried adding numbers to the end). Then, I switched to a Russian dictionary (because the comments in the shell code installed were in Russian) but that didn’t work either. I could have found or written a better password cracker, which varies the case of letters, and does common substitutions (e.g. o ? 0, a ? 4) but that would have taken more time than I wanted to spend. I could also improve efficiency with a rainbow table, but this needs a large database which I didn’t have.

Instead, I asked Google. I found, for example, a genealogy page listing people with the surname “Anthony”, and an advert for a house, signing off “Please Call for showing. Thank you, Anthony”. And indeed, the MD5 hash of “Anthony” was the database entry for the attacker. I had discovered his password.

In both the webpages, the target hash was in a URL. This makes a lot of sense — I’ve even written code which does the same. When I needed to store a file, indexed by a key, a simple option is to make the filename the key’s MD5 hash. This avoids the need to escape any potentially dangerous user input and is very resistant to accidental collisions. If there are too many entries to store in a single directory, by creating directories for each prefix, there will be an even distribution of files. MD5 is quite fast, and while it’s unlikely to be the best option in all cases, it is an easy solution which works pretty well.

Because of this technique, Google is acting as a hash pre-image finder, and more importantly finding hashes of things that people have hashed before. Google is doing what it does best — storing large databases and searching them. I doubt, however, that they envisaged this use though.

They say misery loves company. And if I had wanted company while my blog was being breached, the Cambridge Computer Laboratory would have been about as good company as I could get. But I'm sure they, like me, draw one conclusion above all others: build systems on the basis they will be breached, in order to reduce the consequences to the absolute minimum.

[Thanks to Hans Van Es for pinging me about this.]

NAO's “redaction” adds fuel to the flames

Google's Ben Laurie has a revealing link to correspondence published by the National Auditing Office relating to HMRC's recent identity disaster.

He also explains that the practice of publishing “redacted texts” is itself outmoded in light of the kinds of statistical attacks that can now be mounted. He concludes that, “those who are entrusted with our data have absolutely no idea of the threats it faces, nor the countermeasures one should take to avoid those threats.”

In the wake of the HMRC disaster (nicely summarised by Kim Cameron), the National Audit Office has published scans of correspondence relating to the lost data.

First of all, it's notable that everyone concerned seems to be far more concerned about cost than about privacy. But an interesting question arises in relation to the redactions made to protect the “innocent”. Once more, NAO and HMRC have shown their lack of competence in these matters…

A few years ago it was a popular pastime to recover redacted data from such documents, using a variety of techniques, from the hilarious cut'n'paste attacks (where the redacted data had not been removed, merely covered over with black graphics) to the much more interesting typography related attacks. The way these work is by working backwards from the way that computers typeset. For each font, there are lookup tables that show exactly how wide each character is, and also modifications for particular pairs of characters (for example, “fe” often has less of a gap between the characters than would be indicated by the widths of the two letters alone). This means that if you can accurately measure the width of some text it is possible to deduce which characters must have made up the text (and often what order those characters must appear in). Obviously this isn't guaranteed to give a single result, but often gives a very small number of possibilities, which can then be further reduced by other evidence (such as grammar or spelling).

It seems HMRC and NAO are entirely ignorant of these attacks, since they have left themselves wide open to them. For example, on page 5 of the PDF, take the first line “From: redacted (Benefits and Credits)”. We can easily measure the gap between “:” and “(“, which must span a space, one or more words (presumably names) and another space. From this measurement we can probably make a good shortlist of possible names.

Even more promising is line 3, “cc: redacted@…”. In this case the space between the : and the @ must be filled by characters that make a legal email address and contain no spaces. Another target is the second line of the letter itself “redacted has passed this over to me for my views”. Here we can measure the gap between the left hand margin and the first character of “has” – and fit into that space a capital letter and some other letters, no spaces. Should be pretty easy to recover that name.

And so on.

This clearly demonstrates that those who are entrusted with our data have absolutely no idea of the threats it faces, nor the countermeasures one should take to avoid those threats.

Childrens’ birthdates, addresses and names revealed

Here is more context on the HMRC identity catastrophe.

According to Terri Dowty, Director of Action on Rights for Children (ARCH):

“This appalling security lapse has placed children in the UK in immediate danger especially those who are already vulnerable. Child Benefit records contain every child’s address and date of birth [italics mine – Kim]. We are not surprised that the Chair of HMRC’s Board has resigned immediately.”

Last year Terri Dowty co-authored a report for the British Information Commissioner which highlighted the risks to children’s safety of the government’s policy of creating large, centralised databases containing sensitive information about children. But he says the government chose to dismiss the concerns of the report's authors.

Dowty's experience is a clear instance of my thesis that reduction of identity leakage is still not considered to be a “must-have” rather than a “nice-to-have”.

“The government has recently passed regulations allowing them to build databases containing details of every child in England. They have also announced an intention to create a second national database containing the in-depth personal profiles of children using services. They have batted all constructive criticism away, and repeatedly stressed that children’s data is safe in their hands.

“The events of today demonstrate that this is simply not the case, and all of our concerns for children’s safety are fully justified.”

The report ‘Children’s Databases: Safety and Privacy’ can be downloaded here.

Today the “inconvenient” input of people like Terry Dowty is often dismissed – much the way other security concerns used to be – until computer systems began to fall under the weight of internet and insider attacks…

I urge fellow architects, IT leaders, policy thinkers and technologically aware politicians to consider very seriously the advice of advocates like Terry Dowty. We can deeply benefit from building safe and privacy-enhancing systems that are secure enough to withstand attack and procedural error. Let's work together to translate this thinking to those who are less technical. We need to explain that all the functionality required for government and business can be provided in ways that enhance privacy, rather than diminish it or set society up for failure.

Britain's HMRC Identity Chernobyl

The recent British Identy Chernobyl demands our close examination.

Consider:

the size of the breach – loss of one person's identity information is cause for concern, but HMRC lost the information on 25 million people (7.5 million families)
the actual information “lost” – unencrypted records containing not only personal but also banking and national insurance details (a three-for-one…)
the narrative – every British family with a child under sixteen years of age made vulnerable to fraud and identity theft

According to Bloomberg News,

Political analysts said the data loss, which prompted the resignation of the head of the tax authority, could badly damage the government.

“I think it’s just a colossal error that I think could really rebound on the government’s popularity”, said Lancaster University politics Professor David Denver.

“What people think about governments these days is not so about much ideology, but about competence, and here we have truly massive incompetence.”

Even British Chancellor Alistair Darling said,

“Of course it shakes confidence, because you have a situation where millions of people give you information and expect it to be protected.

Systemic Failure

Meanwhile, in parliament, Prime Minister Gordon Brown explained that security measures had been breached when the information was downloaded and sent by courier to the National Audit Office, although there had been no “systemic failure”.

This is really the crux of the matter. Because, from a technology point of view, the failure was systemic.

From a technology point of view, the failure was systemic.

We are living in an age where systems dealing with our identity must be designed from the bottom up not to leak information in spite of being breached. Perhaps I should say, “redesigned from the bottom up”, because today's systems rarely meet the bar. It's not that data protection wasn't considered when devising them. It is simply that the profound risks were not yet evident, and guaranteeing protection was not seen to be as fundamental as meeting other design goals – like making sure the transactions balanced or abusers were caught.

Isn't it incredible that “a junior official” could simply “download” detailed personal and financial information on 25 million people? Why would a system be designed this way?

To me this is the equivalent of assembling a vast pile of dynamite in the middle of a city on the assumption that excellent procedures would therefore be put in place, so no one would ever set it off.

There is no need to store all of society's dynamite in one place, and no need to run the risk of the collosal explosion that an error in procedure might produce.

Similarly, the information that is the subject of HMRC's identity catastrophe should have been partitioned – broken up both in terms of the number of records and the information components.

In addition, it should have been encrypted – even rights protected from beginning to end. And no official (A.K.A insider) should ever have been able to get at enough of it that a significant breach could occur.

Gordon Brown, like other political leaders, deserves technical advisors savvy enough to explain the advantages of adopting new approaches to these problems. Information technology is important enough to the lives of citizens that political leaders really ought to understand the implications of different technology strategies. Governments need CTOs that are responsible for national technical systems in much the same ways that chancellors and the like are responsible for finances.

Rather than being advised to apologize for systems that are fundamentally flawed, leaders should be advised to inform the population that the government has inherited antiquated systems that are not up to the privacy requirements of the digital age, and put in place solutions based on breach-resistance and privacy-enhancing technologies.

The British information commissioner, Richard Thomas, is conducting a broad inquiry on government data privacy. He is quoted by the Guardian as saying he was demanding more powers to enter government offices without warning for spot-checks.

He said he wanted new criminal penalties for reckless disregard of procedures. He also disclosed that only last week he had sought assurances from the Home Office on limiting information to be stored on ID cards.

“This could not be more serious and has to be a serious wake-up call to the whole of government. We have been warning about these dangers for more than a year.

I have never understood why any politician in his (or her) right mind wouldn't want to be on the privacy-enhancing and future-facing side of this problem.

Hunky-Dory

Paul Madsen at ConnectID writes:

Kim defends CardSpace on the issue of the Display Token.

Personally, I think it's a UI issue. The concern would be mitigated if the identity selector were to simply preface the display token with a caveat:

The following attributes are what the IDP claims to be sending. If you do not trust your IdP, do not click on “Send”.

If the UI doesn't misrepresent the reality of what the DisplayToken is (and isn't), then we're hunky-dory.

And of course, CardSpace is not the only WS-Trust based identity selector in town. The other selectors are presumably under no constraints to deal with DisplayToken in the same way as does CardSpace?

Paul has a good point and I buy the “general idea”. I guess my question would be, should this warning be presented each time an Information Card is used, or just when making the initial decision to depend on a new card?

I think the answer should come from “user studies”: let's find out what approach is more effective. I hear a lot of user interface experts telling us to reduce user communication to what is essential at any specific point in time so that what is communicated is effectively conveyed.

Despite this notion, identity providers should be held accountable for ensuring that the contents of information tokens correspond to the contents of their associated display tokens. This should be mandated in the digital world.

By the way, I love Paul's recollection of the word “Hunky-Dory”. He gives a nice reference. Funny – I always thought it referred to a “certain beverage“.