Britain's HMRC Identity Chernobyl

The recent British Identy Chernobyl demands our close examination. 


  • the size of the breach – loss of one person's identity information is cause for concern, but HMRC lost the information on 25 million people (7.5 million families)
  • the actual information “lost” – unencrypted records containing not only personal but also banking and national insurance details (a three-for-one…)
  • the narrative – every British family with a child under sixteen years of age made vulnerable to fraud and identity theft

According to Bloomberg News,

Political analysts said the data loss, which prompted the resignation of the head of the tax authority, could badly damage the government.

“I think it’s just a colossal error that I think could really rebound on the government’s popularity”, said Lancaster University politics Professor David Denver.

“What people think about governments these days is not so about much ideology, but about competence, and here we have truly massive incompetence.”

Even British Chancellor Alistair Darling said,

“Of course it shakes confidence, because you have a situation where millions of people give you information and expect it to be protected.

Systemic Failure

Meanwhile, in parliament, Prime Minister Gordon Brown explained that security measures had been breached when the information was downloaded and sent by courier to the National Audit Office, although there had been no “systemic failure”.

This is really the crux of the matter. Because, from a technology point of view, the failure was systemic. 

From a technology point of view, the failure was systemic.

We are living in an age where systems dealing with our identity must be designed from the bottom up not to leak information in spite of being breached.  Perhaps I should say, “redesigned from the bottom up”, because today's systems rarely meet the bar.  It's not that data protection wasn't considered when devising them.  It is simply that the profound risks were not yet evident, and guaranteeing protection was not seen to be as fundamental as meeting other design goals – like making sure the transactions balanced or abusers were caught.

Isn't it incredible that “a junior official” could simply “download” detailed personal and financial information on 25 million people?  Why would a system be designed this way? 

To me this is the equivalent of assembling a vast pile of dynamite in the middle of a city on the assumption that excellent procedures would therefore be put in place, so no one would ever set it off.  

There is no need to store all of society's dynamite in one place, and no need to run the risk of the collosal explosion that an error in procedure might produce.  

Similarly, the information that is the subject of HMRC's identity catastrophe should have been partitioned – broken up both in terms of the number of records and the information components.

In addition, it should have been encrypted – even rights protected from beginning to end.  And no official (A.K.A insider) should ever have been able to get at enough of it that a significant breach could occur.

Gordon Brown, like other political leaders, deserves technical advisors savvy enough to explain the advantages of adopting new approaches to these problems.  Information technology is important enough to the lives of citizens that political leaders really ought to understand the implications of different technology strategies.  Governments need CTOs that are responsible for national technical systems in much the same ways that chancellors and the like are responsible for finances.

Rather than being advised to apologize for systems that are fundamentally flawed, leaders should be advised to inform the population that the government has inherited antiquated systems that are not up to the privacy requirements of the digital age, and put in place solutions based on breach-resistance and privacy-enhancing technologies. 

The British information commissioner, Richard Thomas, is conducting a broad inquiry on government data privacy.  He is quoted by the Guardian as saying he was demanding more powers to enter government offices without warning for spot-checks.

He said he wanted new criminal penalties for reckless disregard of procedures. He also disclosed that only last week he had sought assurances from the Home Office on limiting information to be stored on ID cards.

“This could not be more serious and has to be a serious wake-up call to the whole of government. We have been warning about these dangers for more than a year.  

I have never understood why any politician in his (or her) right mind wouldn't want to be on the privacy-enhancing and future-facing side of this problem.


My blog was hacked over the weekend.  It was apparently a cross-site scripting attack carried out through a vulnerability in WordPress.  WordPress has released a fix (Version 2.3.1) and I've now installed it.

ZDNet broke the news on Monday – I was awakened by PR people.  The headline read, “Microsoft privacy guru's site hacked”.  Fifteen minutes of fame:, a Web site run by Microsoft’s chief architect of identity and access, has been hacked and defaced.

The site, which is used by Microsoft’s Kim Cameron to promote discussion around privacy, access and security issues, now contains an “owned by me” message and a link to a third-party site (see screenshot).

Naturally there were more than a few congratulatory messages like this one from “Living in Tension” (whose tagline says he has “Christ in one hand, and the world in the other):

Several years of working in the Information Technology world have unintentionally transformed me into a OSS , Linux, security zealot…

… Tasty little tidbits like this are just too good to be true

I wonder if he would have put it this way had he known my blog is run by commercial hosters (TextDrive) using Unix BSD, MySQL, PHP and WordPress – all OSS products.  There is no Microsoft software involved at the server end – just open source.  

The discussion list at ZDNet is amusing and sobering at the same time.  Of course it starts with a nice “ROTFLMAO” from someone called “Beyond the vista, a Leopard is stalking”: 

This one was priceless . How can Microsoft's Security Guru site get hacked ? Oh my all the MS fanboys claim that Microsoft products are so secure .


But then “ye”, who checks his facts before opening his mouth, has a big ‘aha’:

How can this be? It runs on UNIX!

FreeBSD Apache 29-Jun-2007

Why it's the very same BSD UNIX upon which OS X is based. The very one we've heard makes OS X so ultra secure and hack proof.

This is too much for poor “Stalking Leopard” to bear:

How about explaining as to what a Microsoft employee would be doing using a UNIX server ? I don't think microsoft would be too happy hearing about their employee using… more than their inherently safe IIS server.

Gosh, is the “Stalking Leopard”  caught in a reverse-borg timewarp?

By this point “fredfarkwater” seems to have had enough:

What kind of F-in idiots write in this blog? Apple this or MS that or Linux there….. What difference doesn't it make what OS/platform you choose if it does the job you want it to? A computer is just a computer, a tool, you idiot brainless toads! A system is only as secure as you make it as stated here. You *ucking moron's need a life and to grow up and use these blogs for positive purposes rather than your childish jibbish!

But as passionate as Fred's advice might be, it doesn't seem to be able to save “Linux Geek”, who at this point proclaims:

This is a shining example why you should host on Linux + Apache .

For those who still don't get it, this shows the superiority of Linux and OSS against M$ products.

Back comes a salvo of “It's on Unix”, by mharr; “lol” by toadlife; and “Shut up Fool!” by John E. Wahd.

“Ye” and marksashton are similarly incredulous:

You do realize that you just made an idiot of yourself, right?

Man you are just too much. I'm sure all of the people who use Linux are embarassed by you and people like you who spew such nonsense.

Insults fly fast and furious until “Linux User” tells “Linux Geek”:

I really hope you know  just how idiotic you look with this post! What an ID10T.

It seems the last death rattle of the performance has sounded, but then there's a short “second breath” when “myOSX” has a brainwave:

Maybe he moved the site after it got hacked ???

After that's nixed by “ye”, “Scrat” concludes:

So it appears that was being hosted by TextDrive, Inc using Apache on FreeBSD.

Bad Microsoft!

The truth of the matter is very simple.  I like WordPress, even if it has had some security problems, and I don't want to give it up.

My site practices Data Rejection, so there is no “honeypot” to protect.  My main interest is in having an application I like to use and being part of the blogosphere conversation.  If I'm breached from time to time, it will raise a few eyebrows, as it has done this week, but hopefully even that can help propagate my main message:  always design systems on the basis they will be breached – and still be safe.

Although in the past I have often hosted operational systems myself, in this project I have wanted to understand all the ins and outs and constraints of using a hosted service.  I'm pretty happy with TextDrive and think they're very professional.

After the breach at St. Francis dam
I accept that I'm a target.  Given the current state of blogging software I expect I'll be breached again (this is the second time my site has been hacked through a WordPress vulnerability). 

But,  I'm happy to work with WordPress and others to solve the problems, because there are no silver bullets when it comes to security, as I hope Linux Geek learns, especially in environments where there is a lot of innovation.

That elusive privacy

Craig Burton amused me recently by demonstrating conclusively that my use of a digital birthday for non-disclosure reasons couldn't survive social networking for longer than five digital minutes!  Here's what he says about it in his new wordpress blog (and he's setting up infocard login as we speak)…

Pamela Dingle pointed this post out to me early in Sept. We both decided not to write about it and make fun of Kim and Jackson for violating privacy guidelines so blatantly. But Kim got a good laugh out of my pointing that out so – while late – here it is. Pictures, dates the whole thing. Who cares about privacy anyway eh?

Continue reading That elusive privacy

Massive breach could involve 94 million credit cards

According to Britain's The Register,  the world's largest credit card heist might be twice as large as previously admitted. 

A retailer called TJX was able to create a system so badly conceived, designed and implemented that  94 million accounts could be stolen.  It is thought that the potential cost could reach 1 billion dollars – or even more.  The Register says

The world's largest credit card heist may be bigger than we thought. Much bigger.

According to court documents filed by a group of banks, more than 94 million accounts fell into the hands of criminals as a result of a massive security breach suffered by TJX, the Massachusetts-based retailer.

That's more than double what TJX fessed up to in March, when it estimated some 45.7 million card numbers were stolen during a 17-month span in which criminals had almost unfettered access to the company's back-end systems. Going by the smaller estimate, TJX still presided over the largest data security SNAFU in history. But credit card issuers are accusing TJX of employing fuzzy math in an attempt to contain the damage.

“Unlike other limited data breaches where ‘pastime hackers’ may have accessed data with no intention to commit fraud, in this case it is beyond doubt that there is an extremely high risk that the compromised data will be used for illegal purposes,” read the document, filed Tuesday in US District Court in Boston. “Faced with overwhelming exposure to losses it created, TJX continues to downplay the seriousness of the situation.”

TJX officials didn't return a call requesting comment for this story.

The new figures may mean TJX will have to pay more than previously estimated to clean up the mess. According to the document, Visa has incurred fraud losses of $68m to $83m resulting from the theft of 65 million accounts. That calculates to a cost of $1.04 to $1.28 per card. Applying the same rate to the 29 million MasterCard numbers lost, the total fraud losses alone could top more than $120m.

Research firms have estimated the total loss from the breach could reach $1bn once settlements, once legal settlements and lost sales are tallied. But that figure was at least partly based on the belief that fewer than 46 million accounts were intercepted (more…)

Interestingly, the actual court case is not focused on the systems themselves, but on the representations made about the systems to the banks.  According to eWeek, U.S. District Judge William Young told the plaintiffs,

“You're going to have to prove that TJX made negligent misrepresentations. That it was under a duty to speak and didn't speak and knew what its problems were and didn't say to MasterCard and Visa that they weren't encrypting and the like,” Young said. “That's why MasterCard and Visa acted to allow TJX to get into the electronic, plastic monetary exchange upon which the economic health of the nation now rests.

This was a case where the storage architecture was wrong.  The access architecture was wrong.  The security architecture was missing.  Information was collected and stored in a way that made it too easy to gain access to too much. 

Given the losses involved, if the banks lose against TJX, we can expect them to devise contracts strong enought that they can win against the next “TJX”.  So I'm hopeful that one way or the other, this breach, precisely because of its predictability and cost, will help bring the “need to know” principle into many more systems. 

I'd like to see us discussing potential architectures that can help here rather than leaving every retailer to fend for itself.

Linkage with CardSpace in Auditing Mode

As we said here, systems like SAML and OpenID work without any changes to the browser or client – which is good.  But they depend on the relying party and identity provider to completely control the movement of information, and this turns out to be bad. Why? Well, for one thing, if the user lands at an evil site it can take complete control of the client (let's call this “extreme phishing”) and trick the user into a lot of evil.

Let’s review why this is the case.  Redirection protocols have two legs.  In the first, the relying party sends the user’s browser to the identity provider with a request.  Then the identity provider sends the browser back to the relying party with a response.   Either one can convince the user it's doing one thing while actually doing the opposite.

It’s clear that with this protocol, the user’s system is “passive”. Services are active parties while the browser does what it is told.  Moreover, the services know the contents of the transaction as well as the identities and locations of the other service involved.  This means some classes of linkage are intrinsic to the protocol, even without considering the contents of the identity payload.

What changes with CardSpace?

CardSpace is based on a different protocol pattern in which the user’s system is active too.  Continue reading Linkage with CardSpace in Auditing Mode

Paper argues biometric templates can be “reversed”

Every time biometrics techology is sold to a school we get assurances that the real fingerprint or other biometric is never stored and can't be retrieved.  Supposedly the system just uses a template, a mere string of zeros and ones (as if, in the digital world, there is much more than that…)  

It turns out a Canadian researcher has shown that in the case of face recognition templates a fairly high quality image of a person can be automatically regenerated from templates.  The images calculated using the procedure are of sufficient quality to  give a good visual impression of the person's characteristics.  This work reinforces the conclusions drawn earlier by an Australian researcher, who was able to construct fingerprint images from fingerprint templates.  Continue reading Paper argues biometric templates can be “reversed”

Linkage in “redirect” protocols like SAML

Moving on from certificates in our examination of identity technology and linkability, we'll look next at the redirection protocols – SAML, WS-Federation and OpenID.  These work as shown in the following diagram.  Let's take SAML as our example. 

In step 1, the user goes to a relying party and requests a resource using an http “GET”.  Assuming the relying party wants proof of identity, it returns 2), an http “redirect” that contains a “Location” header.  This header will necessarily include the URL of the identity provider (IP), and a bunch of goop in the URL query string that encodes the SAML request.    

For example, the redirect might look something like this:

HTTP/1.1 302 Object Moved
Date: 21 Jan 2004 07:00:49 GMT
Content-Type: text/html; charset=iso-8859-1

The user's browser receives the redirect and then behaves as a good browser should, doing the GET at the URL represented by the Location header, as shown in 3). 

The question of how the relying party knows which identity provider URL to use is open ended.  In a portal scenario, the address might be hard wired, pointing to the portal's identity provider.  Or in OpenID, the user manually enters information that can be used to figure out the URL of the identity provider (see the associated dangers).

The next question is, “How does the identity provider return the response to the relying party?”  As you might guess, the same redirection mechanism is used again in 4), but this time the identity provider fills out the Location header with the URL of the relying party, and the goop is the identity information required by the RP.  As shown in 5), the browser responds to this redirection information by obediently posting back to the relying party.

Note that all of this can occur without the user being aware that anything has happened or having to take any action.  For example, the user might have a cookie that identifies her to her identity provider.  Then if she is sent through steps 2) to 4), she will likely see nothing but a little flicker in her status bar as different addresses flash by.  (This is why I often compare redirection to a world where, when you enter a store to buy something, the sales clerk reaches into your pocket, pulls out your wallet and debits your credit card without you knowing what is going on — trust us…)

Since the identity provider is tasked with telling the browser where to send the response, it MUST know what relying party you are visiting.  Because it fabricates the returned identity token, it MUST know all the contents of that token.

So, returning to the axes for linkability that we set up in Evolving Technology for Better Privacy, we see that from an identity point of view, the identity provider “sees all” – without the requirement for any collusion.  Knowing each other's identity, the relying party and the identity provider can, in the absence of appropriate policy and suitable auditing, exchange any information they want, either through the redirection channel, or through a “back channel” that dispenses with the user and her browser altogether. 

In fact all versions of SAML include an “artifact” binding intended to facilitate this.  The intention of this mechanism is that only a “handle” need be exchanged through the browser redirection channel, with the assumption that the IP and RP can then hook up and use the handle to “collaborate” about the user without her participation.

In considering the use cases for which SAML was designed, it is important to remember that redirection was not originally designed to put the “user at the center”, but rather was “intended for cases in which the SAML requester and responder need to communicate using an HTTP user agent… for example, if the communicating parties do not share a direct path of communication.”  In other words, an IP/RP collaboration use case.

As Paul Masden reminded us in a recent comment, SAML 2.0 introduced a new element called RelayState that provides another means for synchronizing or exchanging information between the identity provider and the relying party; again, this demonstrates the great amount of trust a user must place in a SAML identity provider.

There are other SAML bindings that vary slightly from the redirect binding described above (for example, there is an HTTP POST binding that gets around the payload size limitations involved with the redirected GET, as Pat Paterson has pointed out).  But nothing changes in terms of the big picture.  In general, we can say that the redirection protocols promote much greater visibility of the IP on the RPs than was the case with X.509. 

I certainly do not see this as all bad.  It can be useful in many cases – for example when you would like your financial institution to verify the identity of a commercial site before you release funds to it.  But the important point is this:  the protocol pattern is only appropriate for a certain set of use cases, reminding us why we need to move towards a multi-technology metasystem. 

It is possible to use the same SAML payloads in more privacy-protecting ways by using a different wire protocol and putting more intelligence and control on the client.  This is the case for CardSpace in non-auditing mode, and Conor Cahor points out that SAML's Enhanced Client or Proxy (ECP) Profile has similar goals.  Privacy is one of the important reasons why evolving towards an “active client” has advantages.

You might ask why, given the greater visibility of IP on RP, I didn't put the redirection protocols at the extreme left of my identity technology privacy spectrum.  The reason is that the probability of RP/RP collusion CAN be greatly reduced when compared to X.509 certificates, as I will show next.

Collusion takes effort; how much?

Eric Norman, from University of Wisconsin, has a new blog called Fun with Metaphors and an independent spirit that is attractive and informed.    He weighs in to our recent discussion with Collusion takes effort:

Now don't get me wrong here. I'm all for protection of privacy. In fact, I have been credited by some as raising consciousness about 8 years ago (pre-Shibboleth) in the Internet2 community to the effect that privacy concerns need to be dealt with in the beginning and at a fundamental level instead of being grafted on later as an afterthought.

There have been recent discussions in the blogosphere about various parties colluding to invade someone's privacy. What I would like to see during such discussions is a more ecological and risk-assessing approach. I'll try to elaborate.

The other day, Kim Cameron analyzed sundry combinations of colluding parties and identity systems to find out what collusion is possible and what isn't. That's all well and good and useful. It answers questions about what's possible in a techno- and crypto- sense. However, I think there's more to the story.

The essence of the rest of the story is that collusion takes effort and motivation on the part of the conspirators. Such effort would act as a deterrent to the formation of such conspiracies and might even make them not worthwhile.

Just the fact that privacy violations would take collusion might be enough to inhibit them in some cases. This is a lightweight version of separation of duty — the nuclear launch scenario; make sure the decision to take action can't be unilateral.

In some of the cases, not much is said about how the parties that are involved in such a conspiracy would find each other. In the case of RPs colluding with each other, how would one of the RPs even know that there's another RP to conspire with and who the other RP is? That would involve a search and I don't think they could just consult Google. It would take effort.

Just today, Kaliya reported another example. A court has held that email is subject to protection under the Fourth Amendment and therefore a subpoena is required for collusion. That takes a lot of effort.

Anyway, the message here is that it is indeed useful to focus on just the technical and cryptographic possibilities. However, all that gets you is a yes/no answer about what's possible and what's not. Don't forget to also include the effort it would actually take to make such collusions happen.

First of all, I agree that the technical and crypto possibilities are not the whole story of linkability.  But they are a part of the story we do need to understand a lot more objectively than is currently the case.  Clearly this applies to technical people, but I think the same goes for policy makers.  Let's get to the point where the characteristics of the systems can be discussed without emotion or the bias of any one technology.

Now let's turn to one of Eric's main points: the effort required for conspirators to collude would act as a deterrent to the formation of such conspiracies.

First, part of what becomes evident is that with browser-based technologies like Liberty, WS-Federation and OpenID,  NO collusion is actually necessary for the identity provider to “see everything” – in the sense of all aspects of the identity exchange.  That in itself may limit use cases.   It also underlines the level of trust the user MUST place in such an IP.  At the very minimum, all the users of the system need to be made aware of how this works.  I'm not sure that has been happening…

Secondly, even if you blind the IP as to the identity of the RP, you clearly can't prevent the inverse, since the RP needs to know who has made the claims!  Even so,  I agree that this blinding represents something akin to “separation of duty”, making collusion a lot harder to get away with on a large scale.

So I really am trying to set up this continuum to allow for “risk assessment” and concrete understanding of different use cases and benefits.  In this regard Eric and I are in total agreement.

As a concrete example of such risk assessment, people responsible for privacy in government have pointed out to me that their systems are tightly connected, and are often run by entities who provide services across multiple departments.  They worry that in this case, collusion is very easy.  Put another way, the separation of duties is too fragile.

Assemble the audit logs and you collude.  No more to it than that.  This is why they see it as prudent to put in place a system with properties that make routine creation of super-dossiers more difficult.  And why we need to understand our continuum. 

Long live minimal disclosure tokens!

Stefan Brands has a nice new piece called, Anonymous Credentials? No, Minimal Disclosure Certificates!  I think he's right about the need to stay away from the moniker “anonymous credentials”.  I adopted it – in spite of the confusion it creates – but I hereby give it up.  If I use it again, slap me around:

“Kim Cameron is in the midst of blogging an excellent series of posts on the important topic of unlinkability; see here and here, for instance. As I had expected from past experience, several commentors on Kim’s post (such as here and here) wrongly equate unlinkability with anonymity. Of course, an unfortunate choice of terminology (‘anonymous credentials’) does not help at all in this respect… 

“In short, ‘anonymous credentials’ are not all about anonymity. They are about the ability to disclose the absolute minimum that is required when presented an identity claim. Similarly, ‘unlinkability’, ‘untraceability’, and ‘selective disclosure’ are not about anonymity per se.

“Anonymity is just an extreme point on the privacy ‘spectrum’ that can be achieved, all depending on what attribute information is encoded in certificates and what of that is disclosed at presentation time. Currently prevalent technologies, such as standard digital signatures and PKI/X.509 certificates, are a poor technology to protect identity claims, since they inescapably leak a lot of identifying information when presenting protected identity claims; in particular, they disclose universally unique identifiers (correlation handles) that can be used to unambiguously link their presentation to their issuance.

I hope people will think hard about the difference between privacy and anonymity to which Stefan calls our attention.  Both are important, but people in the privacy community consider it crucial not to conflate them.  I'll try to find pointers to some of the detailed analysis that has been done by people like Simon Davies of Privacy International and Ann Cavoukian, Privacy Commisioner of Ontario, in this area – not to mention a host of other advocates and policy specialists.

So I'm going to STOP saying anonymous credentials, and even fix my very time-consuming graphic!  (Is that true dedication or what??) 

But I hope Stefan will allow me to say “Minimal Disclosure Tokens” rather than “Minimal Disclosure Certificates”.  I know the word “token” can possibly be confused with a hardware second factor, but its usage has become widely accepted in the world of web services and distributed computing.  Further, I want to get away from the connotations of X.509 “certificates” and forge new ground.  Finally, the word “tokens” ties in to thinking about claims, and allows us to envisage not only “hiding” as a means of minimization, but less draconian mechanisms as well.