The mysterious Mr. Andrews

Some may have seen my piece a few days ago called “So many phish, so little time“. It's about a letter I received from a Mr. Fredrick Andrew, who introduces himself as being an auditor in Singapore but has an email domain name located in Israel. I quoted Mr. Andrew (can I call him Fredrick?) as saying:

‘I have taken pains to find your contact through personal endeavours because a late investor, who bears the same last name with you, has left monies totaling a little over $10 million United States Dollars with Our Bank for the past twelve years and no next of kin has come forward all these years.’

Fredrick “expected (my) prompt response” and wrote, “To affirm your willingness and cooperation to my proposal please do so by email, stating your full names, date of birth, telephone number and fax number.” Although he mentioned that “uttermost CONFIDENTIALITY is of vital importance”, I felt it was only fair to my readers to indicate that “if one day I just stop blogging, you'll know this has come through for me…”

Well, I recently got a message from someone called Ian through my i-name (which does not reveal my actual email address). Here goes:

‘Hi!

‘One of my friend Also receives an email from Mr Fredrick Andrew. Is that a spam or true email?

‘My friend replied to him and they exchanged email for about a week. My friend also send the signed agreement to him. Which my friend thinks that there is nothing wrong if he sends it to Fredrick.

‘Actually Fredrick also called my friend last week. Asking about the Signed Agreement.

‘So what is your verdict? IS the email coming from Fredrick Andrew is tru?

‘Please email me… thanks…’

You know what, Ian? I don't think it was a good idea for your friend to send his personal identifying information to Mr. Andrews. But I may be wrong. Almost all my investment decisions have turned out to be mediocre. I mean, I even think Google is overpriced. And this may be yet another instance of missing out on an investment opportunity! So keep me posted on how things turn out…


British Criminologist Focuses on Identity Technology

In my recent comments about tracking beacons (radio-enabled devices with an unchanging identifier that becomes associated with a human subject) I argued that while they represent a threat to the privacy of the general population, they will not be effective against criminals and terrorists:

‘Criminals will soon come to understand the need to “cover their tracks”. They will gain access to alternate (fraudulently obtained or freshly stolen) tokens and employ the alternate tokens in endeavors that require secrecy. In this case tokens may actually make it easier for criminals to dissimulate their activities. Only bottom rung vandals, those prone to unpremeditated stupidity, and ordinary citizens can be monitored through this type of technology.’

By co-incidence, here is a story from Britain's The Telegraph about criminologist Emily Finch, who is about to publish results of a study which led her team to very similar conclusions:

‘The introduction of identity cards will fail to solve the growing problem of identity theft and could lead to an increase in fraud, according to a new study.

‘Researchers have concluded that the shift from human vigilance to a reliance on new technologies is failing to prevent the activities of fraudsters and in some cases is providing them with new opportunities.

‘Emily Finch, a criminologist at the University of East Anglia, believes that criminals will find ways around the proposed security measures designed to ensure that those applying for identity cards are who they say they are.

‘She and her colleagues reached their conclusions after interviewing criminals and observing the ways they use new technologies to their advantage.

‘Dr Finch, who will today outline her findings at the British Association, said: “There is a worrying assumption that advances in technology will provide the solution to identity theft whereas it is possible that they may actually aggravate the problem.

‘”Our research has shown that fraudsters are tenacious, merely adapting their strategies to circumvent new security measures rather than desisting from fraudulent behaviour.

‘”Studying the way that individuals disclose sensitive information would be far more valuable in preventing identity fraud than the evolution of technologically advanced but ultimately fallible measures to prevent the misuse of personal information after it has been obtained. We don't think identity cards will solve the problem of identity theft, and they have the potential to increase fraudulent behaviour.

‘”The plan is to use documents such as birth certificates and driving licences for authentication, but these are easy to obtain in someone else's name.”

‘The controversial Identity Cards Bill passed its second reading in the Commons with the Government's majority of 67 cut to 31 at the end of June. Under the proposals, citizens would have to disclose details of bank accounts, proof of residency and address, birth certificate, passport number, NHS number, National Insurance number and a credit reference number when applying for ID cards.

‘In America criminals have been able to bribe credit reference agency staff and hack into their databases in order to obtain false references.

‘Fraudsters can easily obtain fake documents such as driving licences and birth certificates, and even passports.

‘Dr Finch studied the recent introduction of chip and pin, a measure designed to cut down on the fraudulent use of other people's bank cards.

‘She added: “Chip and pin has not stopped fraud or even reduced it. It has altered the way people behave, and so fraudsters have just changed their strategies.

‘”The focus has shifted to acquiring the pin – something which is very easy to do if you look at the till.” Dr Finch said staff, who are told to look away when customers enter their details, have become less vigilant. She and a male colleague were able to use each other's cards to make purchases.

‘Figures published by Cifas, a fraud advice service set up by the credit card industry, suggest instances of identity theft rose by 13 per cent in the first six months of this year compared with the same period last year. The Government has estimated that identity fraud costs the economy more than £1.3 billion a year.’

More when I get access to her study.


Engineering Disaster Lessons for Digital Security

Richard Bejtlich has captured a lot about the kinds of concerns which motivate me to do this blog, and which lay behind my work on the Laws of Identity, in this piece from TaoSecurity.

I watched an episode of Modern Marvels on the History Channel this afternoon. It was Engineering Disasters 11, one in a series of videos on engineering failures. A few thoughts came to mind while watching the show. I will provide commentary on each topic addressed by the episode.

  • ‘First discussed was the 1944 Cleveland liquified natural gas (LNG) fire. Engineers built a new LNG tank out of material that failed when exposed to cold, torching nearby homes and businesses when ignited. 128 people died. Engineers were not aware of the metal's failure properties, and absolutely no defensive measures were in place around the tank to protect civilian infrastructure.

    ‘This disaster revealed the need to (1) implement plans and defenses to contain catastrophe, (2) monitor to detect problems and warn potential victims, and (3) thoroughly test designs against possible environmental conditions prior to implementation. These days LNG tanks are surrounded by berms capable of containing a complete spill, and they are closely monitored for problems. Homes and businesses are also located far away from the tanks.

  • ‘Next came the 1981 Kansas City Hyatt walkway collapse that killed 114 people. A construction change resulted in an incredibly weak implementation that failed under load. Cost was not to blame; a part that might have prevented failure cost less than $1. Instead, lack of oversight, poor accountability, broken processes, a rushed build, and compromise of the original design resulted in disaster. This case introduced me to the term “structural engineer of record,” a person who assigns a seal to the plans used to construct a building. The two engineers of record for the Hyatt plans lost their licenses.

    ‘I wonder what would happen if network architectures were stamped by “security engineers of record?” If they were not willing to affix their stamp, that would indicate problems they could not tolerate. If they are willing to stamp a plan, and massive failure from poor design occurs, the engineer should be fired.

  • ‘The third event was a massive sink hole in 1993 in an Atlanta Marriott hotel parking lot. A sewer drain originally built above ground decades earlier was buried 40 feet under the parking lot. A so-called “safety net” built under the parking lot was supposed to provide additional security by giving hotel owners time to evacuate the premises if a sink hole began to develop.

    ‘Instead, the safety net masked the presence of the sink hole and let it enlarge until it was over 100 feet wide and beyond the net's capacity. Two people standing in the parking lot died when the sewer, sink hole, and net collapsed. This disaster demonstrated the importance of not operating a system (the sewer) outside of its operating design (above ground). The event also showed how products (the net) may introduce a false sense of security and/or unintended consequences.

  • ‘Next came the 1931 Yangzi River floods that killed 145,000 people. The floods were the result of extended rain that overcame levees built decades earlier by amateur builders, usually farmers protecting their lands. The Chinese government's relief efforts were hampered by the Japanese invasion and subsequent civil war. This disaster showed the weaknesses of defenses built by amateurs, for which no one is responsible. It also showed how other security incidents can degrade recovery operations.

    ‘Does your organization operate critical infrastructure that someone else built before you arrived? Perhaps it's the DNS server that no one knows how to administer. Maybe it's the time service installed on the Windows server that no one touches. What amateur levee is waiting to break in your organization?

  • ‘The final disaster revolved around the deadly substance asbestos. The story began by extolling the virtues of asbestos, such as its resistance to heat. This extremely user-friendly feature resulted in asbestos deployments in countless products and locations. In 1924 a 33-year-old, 20-year textile veteran died, and her autopsy provided the first concrete evidence of asbestos’ toxicity. A 1930 British study of textile workers revealed abnormally high numbers of asbestos-related deaths. As early as 1918 insurance companies were reluctant to cover textile workers due to their susceptibility to early death. As early as the 1930s the asbestos industry suppressed conclusions in research they sponsored when it revealed asbestos’ harmful effects.

    ‘By 1972, the US Occupational Safety and Health Administration arrived on the scene and chose asbestos as the first substance it would regulate. Still, today there are hundreds of thousands of pending legal cases, but asbestos is not banned in the US. This case demonstrated the importance of properly weighing risks against benefits. The need to independently measure and monitor risks outside of a vendor's promises was also shown.

‘I believe all of these cases can teach us something useful about digital security engineering. The main difference between the first four cases and the digital security world is that failure in the analog world is blatantly obvious. Digital failures can be far more subtle; it may take weeks or months (or years) for security failures to be detected, unlike sink holes in parking lots. The fifth case, describing asbestos, is similar to digital security because harmful effects were not immediately apparent.

Much of our work is intended to correct early initiatives involving identity and identification so we don't end up as the subject matter for some future generation's history of engineering disasters.

Queryable fixed tracking devices, when wrongly used, can result in death (in the literal sense) as surely as the other disasters outlined above. Designing and massively deploying an infrastructure which is an identity-catastrophe-in-waiting is as irresponsible as the actions of earlier generations of engineers who lacked the doubt and capacity for self-criticism and re-examination necessary to be an engineering professional.

This is very much what we were trying to get at when proposing the Laws of Identity.


Contactless Payment Cards Move Forward

Britain's David Birch, director of Consult Hyperion, reports on the latest developments in contactless payment systems in an article that appeared recently on Principia. He also reviews the associated security and privacy implications. I recommend you read the whole piece, since it is a thorough look at an important new technology; but here are some morsels to pique your interest:

‘The announcement of schemes such as MasterCard's Paypass, American Express ExpressPay and Visa's contactless initiatives is a sign that contactless smart cards are moving out of mass transit (e.g. London's Oyster card) and into the mass market.

‘Indeed, Datamonitor have forecast that the market for these ‘payment tokens’ will grow at 47 per cent per annum over the next five years. The international payment schemes’ interest is obvious. At a time when it's hard to explain to a consumer why a contact smart card (such as the ‘chip and PIN’ payment cards being deployed around the world) is better than a magnetic stripe card, payment tokens immediately differentiate themselves by offering a completely different (and significantly more convenient) consumer experience.

‘Why? Because the token needs only to be waved close to the terminal. In many cases, it will work fine while still in a bag or briefcase providing it is close enough to the terminal. The distance depends on the type of device used; the type of ‘proximity interface’ chip being discussed in this article will work up to a few centimetres from the terminals…

‘Nokia have said that they think payment tag technology is better than Bluetooth or Infra-red for mobile payments and, in Japan, NTT DoCoMo and Sony have formed a joint venture (FeliCa Networks) to develop a version of the Sony FeliCa contactless chip for embedding into mobile phones and to operate the FeliCa platform for m-commerce. For many consumers, this will be the ultimate in convenience because the phone provides the communications link for managing the payment account as well as the physical payment device. The dreams of the mobile payment community will come true, but not in the way that they thought.

‘Payment tokens

‘So how do payment tokens work to deliver the appropriate levels of both security and privacy? To answer this question, it's necessary to understand how they work. In the general case, the payment token comprises a microprocessor with hardware support for cryptographic operation and an RF interface. There are various standards in this space, but the one most widely used for payment tokens at present is ISO/IEC 14443.

‘In a typical retail environment the retailer's point-of-sale (POS) terminal and the payment token both contain a microprocessor; the microprocessors communicate using a payment protocol (on top of the ISO 14443 protocol for basic data exchange).

‘When it is time to pay, the customer brings their tag close to the POS terminal. The terminal interrogates the card and gets back the serial number and a cryptogram (a one-time code calculated inside the token). It feeds these to the acquiring bank, which passes them back to the issuer. From the serial number, the issuer knows which account to authorise and from the cryptogram the issuer knows that the token is valid.

‘The cryptogram is made up from the serial number and a transaction counter, encrypted using the token security key. This key is inserted in the token during manufacturing; it is derived from the serial number and a bank master key. Once in the token, it is never divulged. This kind of solution provides:

  • Privacy, because the token ID is meaningless to anyone other than the issuing bank which can map that ID to an actual account or card number;
  • Security, because knowing the token ID is insufficient to create a cloned token. Also, a cloned token would not generate a correct cryptogram because it would not have the right security key and if the transaction is replayed the transaction counter will be wrong.

‘Please note that this is an example given for the purpose of discussion; it is not meant to represent any of the operational schemes discussed in this article. The security of this typical example scheme is not absolute. There is no cardholder verification (i.e. a signature or a PIN), but all transactions are authorised online, so a lost or stolen card can be blocked as soon as it is reported (although it has to be said that consumers will generally notice the loss of their keys or mobile phone pretty quickly). For this example scheme, it might be useful to add an online PIN only for transactions above £20 or so.’
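The cryptogram flow David describes can be sketched in a few lines of Python. This is a toy illustration only, not any operational scheme: the HMAC-based key derivation, the 4-byte counter and every function name here are my own assumptions, standing in for whatever the real personalisation and payment protocols actually use.

```python
import hmac
import hashlib

def derive_token_key(bank_master_key: bytes, serial: bytes) -> bytes:
    # Personalisation step: the token's secret key is inserted at manufacture,
    # derived from the serial number and a bank master key, and never divulged.
    return hmac.new(bank_master_key, serial, hashlib.sha256).digest()

def make_cryptogram(token_key: bytes, serial: bytes, counter: int) -> bytes:
    # One-time code computed inside the token: serial number plus a
    # transaction counter, under the token's secret key.
    msg = serial + counter.to_bytes(4, "big")
    return hmac.new(token_key, msg, hashlib.sha256).digest()

def issuer_verify(bank_master_key: bytes, serial: bytes, counter: int,
                  cryptogram: bytes, last_seen: dict) -> bool:
    # The issuer re-derives the token key from its master key and the serial,
    # recomputes the cryptogram, and rejects replays via the counter.
    if counter <= last_seen.get(serial, -1):
        return False  # replayed or stale transaction
    key = derive_token_key(bank_master_key, serial)
    expected = make_cryptogram(key, serial, counter)
    if not hmac.compare_digest(expected, cryptogram):
        return False  # cloned token without the right key fails here
    last_seen[serial] = counter
    return True
```

Note how the two security properties in the bullets above fall out: a cloned token that knows only the serial cannot compute a valid cryptogram, and a replayed transaction fails the counter check.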

The attention to privacy considerations in these scenarios is essential.

How many users of public transit would want to generate a computerized record of every place they have gone, the time of day they have traveled, and how long they have remained there – throughout their lives?

If the use of the tokens generalizes and they become an important method of payment, it becomes easy to combine this information with the rest of an individual's purchasing history, potentially including everything from books and magazines to digital media.

Is it true that you would have to ask the issuing authority to find out who had purchased the contactless tracking device? I don't think so. What if you had some other way to establish the link between the device and the user's identity? For example, requiring another piece of identification – even once – and using it to perform the association.

So in my view, these scenarios call for a more sophisticated cryptographic approach than that used as an example by David. To be clear, in his very informative article, he certainly leaves such alternatives open. I can understand that in introducing the technology he didn't want to get diverted into a privacy threat analysis.

There are well known mechanisms for doing everything described here while making it impossible to distinguish one individual device from another unless it is being misused (e.g. has been cloned in an attempt to defraud). Let's use them.

Given problems such as terrorism, there may be some who think a fixed tracking ID could be used to monitor the travel of criminal elements. We should make it clear that this won't work for very long.

Criminals will soon come to understand the need to “cover their tracks”. They will gain access to alternate (fraudulently obtained or freshly stolen) tokens and employ the alternate tokens in endeavors that require secrecy. In this case tokens may actually make it easier for criminals to dissimulate their activities. Only bottom rung vandals, those prone to unpremeditated stupidity, and ordinary citizens can be monitored through this type of technology.

Worse, continuing to promulgate fixed beacon technology is a bit like doling out Cruise missile guidance systems to enemy agents. They allow terrorists and agents of organized crime to mount increasingly accurate surgically directed attacks.

Even if someone could imagine a scenario where fixed-beacon tracking were useful enough to justify the security and privacy problems it causes, there are ways the same high level goals could be met without endangering the privacy of the whole population. For example, it is possible to encrypt the fixed identifier of the device under a key which can only be accessed through a highly controlled process – and include the encrypted identifier in the cryptogram which is otherwise anonymous. This would make it possible, in specific circumstances approved by the courts, to follow individual itineraries, without compromising the privacy of every single user of the system by tying identifiers into mineable records.
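One way to picture the court-controlled escape hatch just described: encrypt the device's fixed identifier under an escrow key with a fresh nonce for every transaction, so ordinary observers see unlinkable blobs while the escrow-key holder can still recover the identifier. The sketch below is a toy, assuming a stdlib-only setting: the hash-derived XOR pad stands in for real authenticated encryption, and all function names are hypothetical.

```python
import hashlib
import secrets

def escrow_encrypt(escrow_key: bytes, device_id: bytes) -> bytes:
    # A fresh random nonce per transaction means two transactions by the
    # same device yield unlinkable ciphertexts to anyone without the key.
    nonce = secrets.token_bytes(16)
    pad = hashlib.sha256(escrow_key + nonce).digest()[: len(device_id)]
    ciphertext = bytes(a ^ b for a, b in zip(device_id, pad))
    return nonce + ciphertext

def escrow_decrypt(escrow_key: bytes, blob: bytes) -> bytes:
    # Only the escrow-key holder (e.g. under a court-approved process)
    # can strip the pad and recover the fixed identifier.
    nonce, ciphertext = blob[:16], blob[16:]
    pad = hashlib.sha256(escrow_key + nonce).digest()[: len(ciphertext)]
    return bytes(a ^ b for a, b in zip(ciphertext, pad))
```

Including such a blob in the otherwise anonymous cryptogram leaves the mineable transaction records unlinkable, while preserving the ability to follow a specific itinerary when a court authorises use of the escrow key.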


InfoCard and Identity Metasystem at Microsoft PDC

On Wednesday, John Shewchuk gave a presentation at the Microsoft Professional Developers Conference (PDC05) on Microsoft's approach to Digital Identity.

Session Level(s): 200
Session Type(s): Breakout
Top Picks(s): Windows Server “Longhorn”
Track(s): Communications
In this session, we discuss Microsoft's vision for an Identity Metasystem using the industry-developed, interoperable WS-* Web services architecture. The Identity Metasystem was designed to give Internet users a practical sense of safety, privacy, and certainty about who they are relating to in cyberspace. This session discusses the rationale behind the architecture of the metasystem, shows how developers can take advantage of the metasystem, and introduces the components of the Windows implementation including the identity technologies codenamed “InfoCard” and Active Directory Federation Services.

The presentation included what I thought were convincing demos – using “PDC Bits”, meaning the software made available to conference attendees – showing the new InfoCard and Indigo working together. Indigo is the code name for the Windows Communication Foundation – our implementation of Web Services (development environment, deployment framework and runtime). Information is available here.

The new InfoCard bits are not only less visually displeasing (!) than the initial (wireframe) beta, but support what we call “managed cards”, meaning identity relationships with identity provider vendors and operators – independent of any particular platform (e.g. Windows, Linux, Unix, etc). Basically, by implementing a Security Token Service (STS), and then giving a user to whom you are willing to issue tokens a (signed) configuration file, your identity provider can be set up as an InfoCard in the user's Identity Selector. For those unfamiliar with the terminology, an STS is simply a service that implements WS-Trust – anyone can build one, and the PDC bits include an example of a simple Identity Provider STS built using Indigo.

Now we have all the pieces in place that make it possible for third parties to create metasystem components that plug into the Windows Identity Selector through the InfoCard metaphor.

Andy Harjanto then gave a more detailed presentation on Thursday:

Developing Federated Identity Applications Using “InfoCard” and the Windows Communications Foundation (“Indigo”)

Session Level(s): 300
Session Type(s): Breakout
Top Picks(s): Windows Server “Longhorn”
Track(s): Communications
“InfoCard” is the Windows user experience for managing and submitting identities in the Identity Metasystem, which allows multiple identity technologies to interoperate. This session focuses on “InfoCard's” roles as an Identity Selector and Identity Provider. We look at federated identity scenarios with real-life code and enhance existing Windows Communications Foundation (formerly codename “Indigo”) applications by integrating with InfoCard. Each of the elements in the Identity Metasystem (Identity Provider, Relying Party, Identity Selector, User) is discussed and built. We also create a simple security token service that interops with “InfoCard”.

I thought Andy did a great job – and it was standing room only. An overflow area had to be set up in the hallway.

I'll make both presentations available for readers of this blog. In addition, the software that was distributed to the conference attendees will be available very soon for general download. I'll keep you posted on how to do this.

The Microsoft Identity and Access Team is hiring

Sorry to get overly Microsoft specific, but when I have the chance to enlarge our team, I just can't resist. I'm sure you'll all forgive me.

The Microsoft Identity and Access Team (IDA), which is where I work, is hiring a number of people in development, test and program management – starting as soon as possible. We're looking for good people – at all levels of experience. The positions are in Seattle, Washington, USA.

I've put together a page here that lets you bypass some of the relevant bureaucracy. In fact, I've set up an i-name that will get you right through to Dale, the person who handles recruiting for us. Tell him you want to send a CV and he'll send you his email address.

Two ways to achieve privacy

Scott Blackmer, a cyber lawyer you may have heard speaking at this year's Burton Group Catalyst Conference, has contributed the following comment to our Separation of Context discussion:

‘As individuals, we should be interested BOTH in solutions based on “privacy through obscurity” and solutions based on “privacy through accountability” — and technology has a role to play in both approaches.’

I think this is a very important point, (although I'll return to the use of the phrase “privacy through obscurity” in a subsequent post).

‘A digital identity system (or metasystem) can facilitate an individual’s technical control over the distribution of sensitive identity attributes (SSN / SIN, national ID number, credit card account, etc.), limiting the number and kind of entities that receive this information – this is privacy through obscurity.

‘Link contracts such as Drummond describes can add a layer of technical and legal accountability for those that are provided the information, by tracking and imposing conditions on how it is used and with whom it is shared — privacy through accountability.

‘One condition that can be imposed by law or contract is not to repurpose the data or share it with third parties without notice and consent, which can further limit the dispersal of information that is particularly useful in correlation attacks.

‘Correlation techniques will still exist, of course, and we’ll never get complete control over all combinations of identifying information that can be collected with little cost or effort. That’s not necessarily bad; correlation technology offers benefits as well as risks. Government agencies use correlation techniques to track down deadbeat dads and potential terrorists; employers and lenders rely on such techniques to avoid hiring fraudsters or extending credit to people who are bad credit risks. Marketers and political parties using correlation techniques are satisfied with probable rather than certain identification, because they just want to pitch their products or candidates at likely prospects, and they don’t pose much of a risk to individuals beyond annoyance.’

I'm not sure I buy the idea that because governments and police – under the guidance of the courts – should be able to do something, anyone else should as well. And I think a potential employer or lender should obtain my consent before sharing the information I supply with others for verification or correlation. (In doing this, the uses made of this information should be controlled and revealed.) Such a regime would be just as effective in preventing the hiring of a fraudster as today's roughshod measures, but would give people a greater degree and sense of control.

‘From an identity management standpoint, we should probably focus on correlation “attacks” – deliberate efforts to piece together personally identifiable information for criminal purposes, such as fraud, money laundering, stalking, or gaining unauthorized access to protected buildings and computer systems. Can a digital identity system make it harder to perpetrate correlation attacks? As a society, for example, perhaps we should make more of an effort to give individuals the option of obscuring data revealing their physical address or current location (because they have an abusive ex-spouse, for example, or they work in an abortion clinic, or somebody has pronounced a fatwa against them). And government agencies and commercial enterprises could make many correlation attacks irrelevant by requiring identifying information that is not so easy to collect as, for instance, an SSN, birthdate, and mother’s maiden name, when issuing an ID or approving a transaction.

‘Both government and business are under pressure today to adopt and rely on “stronger” forms of identification, ID that cannot so easily be obtained or mimicked by fraudulent practices such as correlation attacks, phishing, and social engineering. As Stefan says, stronger ID credentials carry their own security risks, and we should point those out and take them into account in designing digital ID systems. As these stronger forms of official and financial ID are deployed, it will be increasingly important to control how they are used, legally and contractually. Look at all the jurisdictions passing laws on the use of Social Security Numbers today, for example – they will be even more anxious to regulate the use of a super-ID. And individuals will need to know when they are asked for this ID what their technical and contractual options are (if any) for controlling its use and dissemination. Techniques such as link contracts may be very useful in this regard, to provide accountability beyond the areas controlled by regulation.’

I presume Scott is thinking of link contracts as being examples of legal mechanisms constraining the use of information (often called use policy).

Scott has posted his presentation to Catalyst as a fully expounded document called Privacy and Information Management.


So many phish, so little time…

If you don't have your own spam, here are two little phish that turned up in my corporate mail in one day.

From: Mr. Fredrick Andrew. [fredrick_andrew005@walla.com]

Subject: PLEASE YOUR REPLY IS NEEDED URGENTLY

My name is Mr. Fredrick Andrew. I trained and work as an external auditor for the Development Bank of Singapore (DBS). I have taken pains to find your contact through personal endeavours because a late investor, who bears the same last name with you, has left monies totaling a little over $10 million United States Dollars with Our Bank for the past twelve years and no next of kin has come forward all these years.

Isn't that a co-incidence? One of my really lucky breaks!

[Blah. Blah. Blah… – Kim]

Needless to say, Uttermost CONFIDENTIALITY is of vital importance if we are to successfully reap the immense benefits of this transaction. I have intentionally left out the finer details for now until I hear from you. To affirm your willingness and cooperation to my proposal please do so by email, stating your full names, date of birth, telephone number and fax number. I do expect your prompt response. pls do contact me in my email address:

[fredrick_andrw@yahoo.com.sg ]

Waiting to hear from you soon.

Thank you.

Mr. Fredrick Andrew

There is the small problem that when I ping walla.com my IP-location service tells me it's in Israel. Do you think the discrepancy with Singapore matters?

Anyway, if one day I just stop blogging, you'll know this has come through for me!

In the meantime, here's the other one – a lot more sophisticated:

eBay Safeharbor Department Notice

Fraud Alert ID : 00626654

Dear eBay member,

You have received this email because you or someone else had used your identity to make false purchases on eBay. For security reasons, we are required to open an investigation on this matter. We treat online fraud seriously and all cases which cannot be resolved between eBay and the other involved party are forwarded for further investigations to the proper authorities. To speed up this process, you are required to verify your personal information against the eBay account registration data we have on file by following the link below.


Please save this fraud alert id for your reference.

When submitting sensitive information via the website, your information is protected both online and off-line. When our registration/order form asks users to enter sensitive information (such as credit card number and/or social security number), that information is encrypted and is protected with the best encryption software in the industry – SSL.

Please Note – If your account informations are not updated within the next 72 hours, we will assume this account is fraudulent and it will be suspended. We apologize for this inconvenience, but the purpose of this verification is to ensure that your eBay account has not been fraudulently used and to combat fraud.

We apreciate your support and understanding, as we work together to keep eBay a safe place to trade.

Thank you for your patience in this matter.

Regards, Safeharbor Department (Trust and Safety Department)
eBay Inc.

Please do not reply to this e-mail as this is only a notification mail sent to this address and can not be replied to.

Copyright 2005 eBay Inc. All Rights Reserved.
Designated trademarks and brands are the property of their respective owners.
eBay and the eBay logo are trademarks of eBay Inc. which is located on Hamilton Avenue, San

If you look at the source for this one (I've defused it slightly), you'll see it's hard-coded to 203.215.162.99, which geobytes.com couldn't find, but melissadata.com placed in Pakistan – the ISP being the Pakistan Software Export Board. I really like the way the copyright notice makes everything look official. Note that despite the sophistication of the attack, the text still contains grammatical errors to alert us.


Probabilistic versus Determinate Linking

For those following the discussion on probabilistic versus determinate linking, it might be worth rewinding for a minute to consider the Fourth Law of Identity.

In presenting the fourth law, I agreed that traditional omnidirectional identifiers, by which I mean identifiers known to all, were appropriate for public contexts.

Here are some examples of what I meant by public contexts:

  • A stable well-known identifier is essential for MSDN, AOL, my bank, or even my Identity Blog. It is beneficial for such public identifiers to stay constant. I want readers to share information about www.identityblog.com. The more easily they can tell each other about the pieces on this and related websites, the better. These are all public things.
  • A well-known identifier is similarly appropriate for a “hot spot” in a shopping center. The hot spot is obviously “there” and a fixed wireless beacon is a helpful part of its presence. Otherwise I might end up exposing payment information to the wrong parties.
  • A well-known identifier is appropriate for a vending machine supporting digital payment. Again, the identifier would just be an extension of its physical presence.
  • A well-known identifier (an email address is a typical example) could be appropriate for a public role, like my role as architect of identity at Microsoft.
  • I could also employ a well-known identifier associated with a protective service offering more granular control. (For example, I use the i-name =Kim.Cameron to protect myself from spam – and it works really well.)

But in defining the fourth law I also argued that omnidirectional identifiers were not sufficient. In the parts of our life where we act as private individuals, we should have access to a technology which prevents collusion over our identities except under our strict control. In achieving this, we can take two approaches to the use of identifiers:

  • Avoid identifiers of any kind. This means (network addresses and information content aside – both separate discussions) that interaction contexts are completely disconnected – whether separated by points in time, or by the identity of the partner.
  • Use unidirectional identifiers, meaning those known only to a single partner – so that an interaction context can be maintained with that partner over time, yet remain disconnected from interactions with other partners. I might subsequently choose to share some unidirectional identifiers between two (or more) partners – if they give me the right incentives. But since I am initially the only one who knows all the identifiers, my partners cannot collude without my knowledge.
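One way to picture unidirectional identifiers is as identifiers derived per partner from a secret only the user holds. This is just an illustrative sketch (the partner names and the HMAC-based derivation are my own assumptions, not a description of any particular system):

```python
import hashlib
import hmac

def pairwise_id(master_secret: bytes, partner: str) -> str:
    """Derive a stable identifier known only to one partner.

    Each partner sees a different identifier; without the user-held
    master secret, two partners cannot correlate their identifiers.
    """
    return hmac.new(master_secret, partner.encode(), hashlib.sha256).hexdigest()

secret = b"user-held master secret"  # hypothetical; never shared with partners
id_for_shop = pairwise_id(secret, "shop.example")
id_for_bank = pairwise_id(secret, "bank.example")

assert id_for_shop != id_for_bank                          # partners cannot collude
assert id_for_shop == pairwise_id(secret, "shop.example")  # stable over time
```

The design point is exactly the one in the bullet above: the context with each partner persists over time, yet the contexts stay disconnected unless the user chooses to reveal the link.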

Why would you want to separate interaction contexts this way?

To prevent partners with whom you have shared information of one kind from amalgamating it with information collected about you by other partners, in order to create a “super-dossier” across different aspects of your life. (If this seems improbable, click here, then read this.)

Solove and others have explained that there are outfits which, even today, attempt to discover the correlations between our profiles at different organizations or sites, and who then assemble super-dossiers and sell them – even to government buyers. If such correlations are possible, why does it still make sense to insist on unidirectional identifiers?

I think there are several reasons.

The first is that if we want people to trust the emerging identity metasystem, we need to give them the ability to predict and intuit how it will behave.

Users can easily understand that if they give a telephone number or email address to two different parties, those parties can correlate them. This happens in the so-called “real world” as well.

But if users release no identifying information whatsoever, a system which still sets up invisible correlation handles would really be failing them. If this sounds like an unlikely technical outcome, remember that this is precisely what happens in the typical use of client X.509 certificates. Even PGP is subject to this problem (and worse, reveals the membership of one's entire circle of trust).

But there is another reason – namely, that correlation handles virtually eliminate the cost of discovering correlations, while providing 100% accuracy. We know that if correlation has a significant cost, then there must be a significant and provable cost benefit to justify it. Conversely, if it comes for free, then super-dossiers come for free, and their proliferation – completely outside of the user's control – is more or less inevitable.

I would see this proliferation as catastrophic – partly because people don't want to live in a virtual world where they feel like characters in a Kafka novel; and partly because there is great likelihood it would ultimately bring about rejection of the underlying identity system by many of those most essential to its success – the opinion leaders, those who think deeply about the implications of things, those who innovate and create, those who affect public opinion.

By providing alternatives to the use of correlation handles, we not only increase the cost of discovering correlation, but also reduce the probability that attempted correlations are correct. This, in turn, implies further hidden cost as misinformation turns into liability. These costs and liabilities combine to discourage commercial super-dossiers constructed without the permission and participation of the individual. Given a prohibitive cost model for super-dossier activity, other less alienating means of developing real relations with customers are likely to be more cost-effective and beneficial all around.

That's really the background to yesterday's discussion about “Data Pollution and Separation of Context”. And since that posting, a number of comments have been made that are helpful in breaking through to a better understanding of how to think about and explain these complex issues.

Tom Gordon‘s contribution rang very true with me, and sounds a warning about the effect Data Pollution and false correlation will have on customers in general:

I have had one large company in the UK use incorrect information when trying to contact me about services I was purchasing from them. However, since they had previously contacted me successfully (and another department in the same company had telephoned me a few days beforehand), it appears they deliberately chose to use outdated information so they would have a failed contact record.

The interesting thing is they used information that was 3 years old, even though the department in question had sent me a letter (to the correct address) a few weeks before.

Certainly that company hadn't cleaned up its customer identity data! The symptoms described often appear when previously independent entities have been brought together under a common umbrella through reorganization, including mergers and acquisitions. The same customer appears in multiple unrelated computer systems, and it's difficult to unify them. Metadirectory helps in this regard. But getting it right depends on what we call the “identity join”. How do we know two accounts refer to the same customer? And how do we keep from making mistakes when figuring this out?
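The "identity join" question can be made concrete with a toy matcher. A minimal sketch, where the record fields and scoring weights are purely illustrative assumptions rather than any real product's logic:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def join_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted guess that two account records describe the same customer.

    The weights are arbitrary; a real identity join would use far richer
    evidence - and, ideally, confirmation from the customer.
    """
    return (0.5 * similarity(rec_a["name"], rec_b["name"])
            + 0.3 * similarity(rec_a["address"], rec_b["address"])
            + 0.2 * (1.0 if rec_a["dob"] == rec_b["dob"] else 0.0))

# Two hypothetical records from systems brought together by a merger:
crm = {"name": "Jon A. Smith", "address": "12 High St", "dob": "1960-04-02"}
billing = {"name": "Jonathan A. Smith", "address": "12 High Street", "dob": "1960-04-02"}

score = join_score(crm, billing)
# A high score only *suggests* a match - this is exactly the probabilistic,
# rather than determinate, linking under discussion.
```

However high the score, it is still a guess: the matcher can tell you the two accounts are probably the same customer, never that they certainly are.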

On this subject, Felipe Conill writes:

In my current job I am leading an effort with the goal of presenting data to customers from separate databases that identify the customer differently (different customer identifiers).

One of the many challenges we have to address to solve this problem is the challenge of using the identity of the customer when he logs in from a browser (where the identity datasource is reachable from the internet – which in itself presents all kinds of security risks) to query other data sources, giving them information specific to their company.

The risk in doing this is that you don't want to show customer A the bill of customer B. To ensure this does not happen we need to have a mapping table to match who gets to see what.

Instead of doing this messy solution we are putting a common identifier for the customer in all of the datasources where there is customer data like Stefan suggests. Basically bypassing having to do an “identity join” to solve the problem.

I need to stop Felipe for a moment. Perhaps it's just a “vocabulary thing” – I see how a SQL aficionado, for example, might take the word “join” in a much more restrictive sense – but in the vocabulary Craig Burton and I developed, you are not avoiding doing an “identity join” at all. You are performing an identity join, and then using that to push a common identifier into all your systems as a way to represent it permanently.

This makes sense since your customers presumably want a single relationship with your company. I know I have been frustrated for years by the fact that my bank, for example, has still not “gotten it together” to give me a single login to all the services I use there.

Now the question becomes one of how you do the identity join. You probably have enough data to make a very well informed guess about what account should be joined to what. But if the data is important enough, you likely need to ask the user to verify your conclusions. For example, I have seen systems where “modern correlation technology” is used to propose that various accounts might belong to a given user, but which still ask the customer to demonstrate his ability to access them before information is merged.
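That verification step could be sketched like this. The flow below is a hypothetical illustration of the pattern just described, not any shipping system's design: a proposed correlation stays pending until the customer demonstrates access to every account involved.

```python
class PendingJoin:
    """Hold a proposed account link until the user verifies it.

    'Modern correlation technology' only *proposes* the match; the merge
    happens solely after the customer proves access to each account.
    """

    def __init__(self, account_ids):
        self.account_ids = set(account_ids)
        self.verified = set()

    def verify(self, account_id: str, login_succeeded: bool) -> None:
        # In a real system this would be an actual authentication check
        # against the account in question.
        if login_succeeded and account_id in self.account_ids:
            self.verified.add(account_id)

    @property
    def ready_to_merge(self) -> bool:
        return self.verified == self.account_ids

join = PendingJoin(["crm-1017", "billing-884"])   # hypothetical account ids
join.verify("crm-1017", login_succeeded=True)
assert not join.ready_to_merge    # one account still unverified
join.verify("billing-884", login_succeeded=True)
assert join.ready_to_merge        # safe to merge only now
```

The user's demonstrated access converts a probabilistic guess into something the organization can act on without the liability of a wrong merge.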

In the real world you would never get to do this, since the same entity does not control all information. You have governments, foreign entities and business competitors that would never agree to use the same identifier for someone.

This is true, but let's suppose they did. Would the user want or accept this? If the user does not, then in my view – and Stefan's mathematical argument makes this point superbly – it is virtually impossible to accurately know what to join to what.

I agree with Kim in that privacy-through-obscurity can be achieved by technological protections of privacy. Correlation errors are inevitable if the scope is big and in a lot of cases intolerable. By the way Kim, congrats for this blog. It really stimulates thinking reading from people at your level!

Thank you for those kind words – your point of view stimulates my thinking as well.

Simon Chen then says:

I agree with Stefan's analysis regarding how the error probability associated with linking identity accounts could become intolerable. However, I disagree with the alternative vision he's proposing: “Now, on the other hand, imagine a world where each user has only one user identifier that is the basis of all his or her interactions…”

I do not feel that all the users in the world can ever agree to use a single identifier representing him/her. The Internet is simply too diverse and mutable for something like this. Therefore, I believe that we can never avoid having to map identities between organizations.

I agree totally with Simon's point, but have to clear things up for Stefan, who would never propose use of a single identifier across contexts. In fact, he later added this clarifying remark:

Regarding Simon Chen's comment, perhaps it was not sufficiently clear that my paragraph “Now, on the other hand, imagine a world where each user has only one user identifier …” was intended in an ironical manner. In fact, my own work in the past 14 years is all about technically achieving, amongst others, the user-controlled approach that Simon outlines. See, for instance, here and here.

So this is good – we are all on track both for separation of contexts and multiple identifiers, but coming at it from slightly different points of view. Simon continues:

There are fundamental error probabilities associated with identity linkage, but I also believe this error can be manageable with new trust models and infrastructure.

I think Stefan made the assumption that the service providers are responsible for updating identity mappings, which can lead to major data integrity problems.

While this is true in the current world, why not delegate this responsibility in the future to the users that own the identity information in the first place?

Now imagine, in the spirit of the First Law of identity, a user with his identity information distributed across multiple service providers, and the providers are part of a trust network with established business relationships. The user can use a personal identity management interface to update how his/her identity information can be mapped and shared with his service providers, and these user-driven changes can be propagated across the trust network through this interface.

The personal identity management interface can be hosted by any service providers in the trust network, and it simply represents a gateway for the user to tap into the trust network and manage his/her identity. In this model, the user can control the number of different service providers (or contexts) that can store identity information about him and how his identity information can be shared across contexts.

Yes, much of this thinking is along the same lines as intended by InfoCards – not to imply that they represent a “silver bullet”.


Data Pollution and Separation of Context

Stefan Brands has posted one of the best argued and most important comments yet on the issue of identity correlation, the phenomenon giving rise to the Fourth Law of Identity.

By way of background, this was part of a conversation taking place in an ID Gang discussion group hosted by Berkman Law School. Our friend Drummond Reed posted a comment which, although perfectly innocent in its intent, sent me into Tasmanian Devil Mode.

‘Ever since I saw the shocking powers of modern correlation technology – it only takes 2 to 3 pieces of MANY kinds of perfectly innocent data (e.g., zip code and income) to uniquely identify a person with a 99+% statistical accuracy – I realized that privacy-through-obscurity was hopeless. Which means privacy-through-accountability is the only option.’

Accountability is indeed important, but not in any way a substitute for technological protections of privacy. Thus, although Drummond is a big supporter of the Fourth Law and context-specific identifiers, I felt it was necessary to underline the key importance of the distinction between probabilistic and determinate correlation. So I wrote:

‘The “modern correlation technology” argument made by Drummond easily leads to the wrong conclusions. The zip code plus income example is typical, and gets my goat because it leads some to say “you can be identified with a few pieces of information, so it doesn’t really matter if correlation handles exist.”

‘In Drummond’s example, how accurately has the income been expressed, and what is the size of the zipcode? … “Modern correlation technology” is based on approximations and fuzzy calculation and is very expensive relative to using “database keys”. It is appropriate to *keep it that way* and make it *more expensive still*.’

Stefan then entered the discussion, extending our consideration of the problem of fuzzy calculation (of correlation) to include the inevitability of correlation errors.

Many readers will understand this because they know that to rationalize their identity infrastructure, enterprises have had to go through the well-known pain of doing what Craig Burton and I, over a decade ago, described as the “identity join”. This was the process of determining how the identifiers used in disparate computer and directory systems throughout the enterprise mapped to each other.

Performing this join accurately usually proved laborious – even though that join represented a trivial problem compared to one at the scale of the Internet as a whole. Further, enterprise administrators had many advantages over those trying to employ “modern correlation techniques”. Besides dealing with a relatively small population, they enjoyed unlimited access to data and identifying information, flexible tools, and the ability to ask the data subjects to collaborate! It was still expensive to get everything right.

Stefan goes on to present an abstracted (mathematical) model which could be the basis for an economics of the phenomena described. If there isn't a ground-breaking paper waiting to be written about identity economics, I'll eat my hat.

Here is Stefan's contribution – one which I think is crucial (I've added some emphasis):

In support of Kim's defense of information privacy, here is another observation: there is a world of difference for organizations between

  1. a link between different user identifiers that is 100% guaranteed and
  2. a link that is only suspected (e.g., is Jon A. Smith really the same person as Jonathan A. Smith?).

Consider this. When organizations link up user accounts (also known as records, files, dossiers, etc.) that are indexed by different user identifiers and they have no guarantee of the correctness of the linkage, the aggregated information in the “super-account” may well become completely worthless to them and may even become a liability.

Even a 0.1% error probability in many cases may be intolerable. Imagine the consequences of hooking up the wrong health-care or crime-related information on a per-user basis and making a medical or criminal-justice error on that basis.

Depending on the business of the organization, there may be a significant cost associated with acting on wrong information, not only from a liability perspective, but also from a goodwill, security, or resource cost perspective.

The more user accounts are linked up into one aggregated “super-account”, the higher the error probability. We are dealing with a geometric distribution here. In an abstracted model, if the probability of success in matching two user/account identifiers is p, then the probability that n user identifiers that are hooked up contain at least one error (i.e., they do NOT all pertain to the same person/entity — a case of “data pollution“) is 1 – p^{n-1}. To appreciate how fast this error rate goes up when linking more and more user accounts, check out this site (requires java). More sophisticated statistics can be applied directly out of the textbooks of econometricians.

Now, on the other hand, imagine a world where each user has only one user identifier that is the basis for all his or her interactions with organizations and other users; no more error probability, no more data pollution when hooking up user accounts, regardless of how many! The strongest possible guarantee that different user identifiers (serving as account indices) really pertain to the same person occurs, of course, when user identifiers are “certified” by a trusted issuer; a national (or world…) ID chipcard with three-factor security would be the ultimate linking/profiling tool for organizations that naively believe that aggregating personal information across domains does not come with major security risks of its own.

In short, from the perspective of organizational value, there is a major difference between being able to correlate with absolute infallibility and merely being able to guess, with high success probability, which user identifiers / account indices relate to the same user.

PS For civil liberties arguments in favor of avoiding correlation handles where not strictly needed, see for instance here and here.
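Stefan's 1 – p^{n-1} error growth is easy to check numerically. A small sketch (the per-link match probability of 0.99 is a made-up illustration):

```python
def pollution_probability(p: float, n: int) -> float:
    """Probability that a chain of n linked accounts contains at least
    one wrong link, given per-link match probability p.

    Linking n accounts requires n-1 pairwise matches, so the chance
    they are all correct is p**(n-1).
    """
    return 1 - p ** (n - 1)

# Even a 99%-accurate matcher pollutes most large super-accounts:
for n in (2, 5, 10, 50):
    print(f"{n:3d} accounts -> pollution probability {pollution_probability(0.99, n):.3f}")
```

At n = 2 the pollution probability is a manageable 1%, but by n = 50 it climbs to nearly 40% – which is exactly why an aggregated super-account assembled by guesswork can become a liability rather than an asset.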

I think it would be good to put together a brief paper that explores the problem of obtaining accuracy in doing the identity join, combining the experience gathered from metadirectory deployments with Stefan's mathematical explanation.
