More precision on the Right to Correlate

Dave Kearns continues to whack me for some of my terminology in discussing data correlation.  He says: 

‘In responding to my “violent agreement” post, Kim Cameron goes a long way towards beginning to define the parameters for correlating data and transactions. I'd urge all of you to jump into the discussion.

‘But – and it's a huge but – we need to be very careful of the terminology we use.

‘Kim starts: “Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction, and further, that they only have the right to correlate it with other transactions involving the same parties.”’

Dave's right that this was overly restrictive.  In fact I changed it within a few minutes of the initial post – but apparently not fast enough to prevent confusion.  My edited version stated:

‘Let’s postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).’

This way of putting things eliminates Dave's concern:

‘Which would mean, as I read it, that I couldn't correlate my transactions booking a plane trip, hotel and rental car since different parties were involved in all three transactions!’

That said, I want to be clear that “parties to a transaction” does NOT include what Dave calls “all corporate partners” (aka a corporate information free-for-all!). It just means parties (for example corporations) participating directly in some transaction can correlate it with the other transactions in which they directly participate (but not with the transactions of some other corporation unless they get approval from the transaction participants to do so).

Dave argues:

‘In the end, it isn't the correlation that's problematic, but the use to which it's put. So let's tie up the usage in a legally binding way, and not worry so much about the tools and technology.

‘In many ways the internet makes anti-social and unethical behavior easier. That doesn't mean (as some would have it) that we need to ban internet access or technological tools. It does mean we need to better educate people about acceptable behavior and step up our policing tools to better enable us to nab the bad guys (while not inconveniencing the good guys).’

To be perfectly clear, I'm not proposing a ban on technology!  I don't do banning!  I do creation. 

So instead, I'm arguing that as we develop our new technologies we should make sure they support the “right to correlation” – and the delegation of that right – in ways that restore balance and give people a fighting chance to prevent unseen software robots from limiting their destinies.

 

Do people care about data correlation?

While I was working on the last couple of posts about data correlation, trusty old RSS brought in a corroborating piece by Colin McKay at the Office of the Privacy Commissioner of Canada. Many in the industry seem to assume people will trade any of their personal information for the smallest trinkets, so more empirical work of the kind reported here seems to be essential.

‘How comfortable, exactly, are online users with their information and online browsing habits being used to track their behaviour and serve ads to them?

‘A survey of Canadian respondents, conducted by TNS Facts and reported by the Canadian Marketing Association, reports that a large number of Canadians and Americans “(69% and 67% respectively) are aware that when they are online their browsing behaviour may be captured by third parties for advertising purposes.”

‘That doesn’t mean they are comfortable with the practice. The same survey notes that “just 33 per cent of Canadians who are members of a site are comfortable with these sites using their browsing information to improve their site experience. There is no difference in support for the use of consumers’ browsing history to serve them targeted ads, be it with the general population, the privacy concerned, or members of a site.”’

If only 33% are comfortable with using browsing information to improve site experience, I wonder how many will be comfortable with using browsing information to decide whether to terminate people’s credit cards (see thread on Martinism)? Can I take a guess? How about 1%? (This may seem high, but I have a friend in the direct marketing world who tells me 1% of the population will believe in anything at all!) Colin continues:

‘But how much information are users willing to consciously hand over to win access to services, prizes or additional content?

‘A survey of 1800 visitors to coolsavings.com, a coupon and rebate site owned by Q Interactive, has claimed that web visitors are willing “to receive free online services and information in exchange for the use of my data to target relevant advertising to me.”

‘Now, my impression is that visitors to sites like coolsavings.com – who are actively seeking out value and benefits online – would be predisposed to believing that online sites would be able to deliver useful content and relevant ads.

‘That said, Mediapost, who had access to details of the full Q Interactive survey, cautions that users “… continue to put the brakes on hard when asked which specific information they are willing to hand over. The survey found 77.8% willing to give zip code, 64.9% their age and 72.3% their gender, but only 22.4% said they wanted to share the Web sites they visited and only 12% and 12.1% were willing to have their online purchases or the search history respectively to be shared …”’

I want to underline Colin's point. These statistics come from people who actively sought out a coupon site in order to trade information for benefits! Even so, we are talking about a mere 12% who were willing to have their online purchases or search history shared. This empirically nixes the notion, held by some, that people don't care about data correlation (an issue I promised to address in my last post).

Colin's conclusions seem consistent with the idea I sketched there of defining a new “right to data correlation” and requiring delegation of that right before trusted parties can correlate individuals across contexts.

‘In both the TNS Facts/CMA and Q Interactive surveys, the results seem to indicate that users are willing to make a conscious decision to share information about themselves – especially if it is with sites they trust and with whom they have an established relationship.

‘A common thread seems to be emerging: consumers see a benefit to providing specific data that will help target information relevant to their needs, but they are less certain about allowing their past behaviour to be used to make inferences about their individual preferences.

‘They may feel their past search and browsing habits might just have a greater impact on their personal and professional life than the limited re-distribution of basic personal information by sites they trust. Especially if those previous habits might be seen as indiscreet, even obscene.’

Colin's conclusion points to the need to be able to “revoke the right to data correlation” that may have been extended to third parties.  It also underlines the need for a built-in scheme for aging and deletion of correlation data.
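To make “aging and deletion” concrete, here is a minimal sketch, assuming correlation links are stored as explicit records with a creation time. The names `CorrelationLink` and `purge_expired`, the record labels, and the one-year retention period are all hypothetical illustrations, not part of any existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CorrelationLink:
    # Hypothetical record linking two pieces of data about the same person.
    record_a: str
    record_b: str
    created: datetime


def purge_expired(links, now, max_age):
    """Keep only links younger than max_age; older links are dropped entirely."""
    return [link for link in links if now - link.created < max_age]


now = datetime(2009, 5, 15)
links = [
    CorrelationLink("store:purchase-17", "card:account-9", datetime(2008, 1, 1)),
    CorrelationLink("store:purchase-42", "card:account-9", datetime(2009, 5, 1)),
]

# Assumed policy: correlation data older than one year is deleted.
kept = purge_expired(links, now, timedelta(days=365))
print([link.record_a for link in kept])  # ['store:purchase-42']
```

The point of making the link an explicit record is that it can then be aged out on a schedule, rather than living forever inside a merged profile.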

 

The Right To Correlate

Dave Kearns’ comment in Another Violent Agreement convinces me I've got to apply the scalpel to the way I talk about correlation handles.  Dave writes:

‘I took Kim at his word when he talked “about the need to prevent correlation handles and assembly of information across contexts…” That does sound like “banning the tools.”

‘So I'm pleased to say I agree with his clarification of today:

‘“I agree that we must influence behaviors as well as develop tools… [but] there’s a huge gap between the kind of data correlation done at a person’s request as part of a relationship (VRM), and the data correlation I described in my post that is done without a person’s consent or knowledge.” (Emphasis added by Dave)’

Thinking about this some more, it seems we might be able to use a delegation paradigm.

The “right to correlate”

Let's postulate that only the parties to a transaction have the right to correlate the data in the transaction (unless it is fully anonymized).

Then it would follow that any two parties with whom an individual interacts would not by default have the right to correlate data they had each collected in their separate transactions.

On the other hand, the individual would have the right to organize and correlate her own data across all the parties with whom she interacts since she was party to all the transactions.
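The default rule can be sketched in a few lines. This is an illustration under stated assumptions: the `Transaction` type and `may_correlate` function are hypothetical, and the “fully anonymized” exception is not modeled. An actor may correlate a set of transactions only if it was a direct party to every one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    parties: frozenset  # everyone who took part directly in the transaction


def may_correlate(actor, transactions):
    """Default rule: an actor may correlate transactions only if it was a party to all of them."""
    return all(actor in tx.parties for tx in transactions)


flight = Transaction("t1", frozenset({"alice", "airline"}))
hotel = Transaction("t2", frozenset({"alice", "hotel"}))
car = Transaction("t3", frozenset({"alice", "car-rental"}))

print(may_correlate("alice", [flight, hotel, car]))  # True: she was party to all three
print(may_correlate("airline", [flight, hotel]))     # False: the airline was not at the hotel
```

This captures both halves of the argument: the individual can organize her own trip, while the airline and hotel cannot, by default, join what each of them collected.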

Delegating the Right to Correlate

If we introduce the ability to delegate, then an individual could delegate to two parties her right to correlate relevant data about her. For example, I could delegate to Alaska Airlines and British Airways the right to share information about me.

Similarly, if I were an optimistic person, I could opt to use a service like that envisaged by Dave Kearns, which “can discern our real desires from our passing whims and organize our quest for knowledge, experience and – yes – material things in ways which we can only dream about now.”  The point here is that we would delegate the right to correlate to this service operating on our behalf.
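Delegation extends the default rule with one more case. In this minimal sketch (all names are hypothetical, and grants are assumed to be simple `(subject, delegate)` pairs), an actor that is not a party to every transaction may still correlate them if the individual was party to all of them and has delegated her right to that actor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    parties: frozenset


def may_correlate(actor, transactions, subject, grants):
    # Default rule: a direct party to every transaction may correlate them.
    if all(actor in tx.parties for tx in transactions):
        return True
    # Delegated rule: the subject was party to every transaction and has
    # explicitly delegated her right to correlate to this actor.
    subject_in_all = all(subject in tx.parties for tx in transactions)
    return subject_in_all and (subject, actor) in grants


as_flight = Transaction("t1", frozenset({"alice", "alaska-airlines"}))
ba_flight = Transaction("t2", frozenset({"alice", "british-airways"}))

grants = {("alice", "alaska-airlines"), ("alice", "british-airways")}
print(may_correlate("british-airways", [as_flight, ba_flight], "alice", grants))  # True
print(may_correlate("hotel", [as_flight, ba_flight], "alice", grants))            # False
```

A personalization service acting on someone's behalf would simply be one more delegate in `grants`.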

Revoking the Right to Correlate

A key aspect of delegating a right is the ability to revoke that delegation. In other words, if the service to which I had given some set of rights became annoying or odious, I would need to be able to terminate its right to correlate. Importantly, the right applies to correlation itself. Thus when the right is revoked, the data must no longer be linkable in any way.
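Revocation that actually makes data unlinkable means deleting the links, not just flipping a permission bit. Here is a minimal sketch, assuming a hypothetical registry that records each delegate's links under the grant that authorized them. It can only remove links it knows about; copies held elsewhere are beyond what a sketch like this can enforce.

```python
class CorrelationRegistry:
    """Hypothetical store of delegated correlation rights and the links
    each delegate created while it held the right."""

    def __init__(self):
        self.grants = set()  # (subject, delegate) pairs
        self.links = {}      # (subject, delegate) -> list of (record_a, record_b)

    def delegate(self, subject, delegate):
        self.grants.add((subject, delegate))
        self.links.setdefault((subject, delegate), [])

    def link(self, subject, delegate, record_a, record_b):
        if (subject, delegate) not in self.grants:
            raise PermissionError(f"{delegate} has no right to correlate {subject}'s data")
        self.links[(subject, delegate)].append((record_a, record_b))

    def revoke(self, subject, delegate):
        # Revocation removes the right AND every link made under it,
        # so the records can no longer be joined through this registry.
        self.grants.discard((subject, delegate))
        self.links.pop((subject, delegate), None)


reg = CorrelationRegistry()
reg.delegate("alice", "recommender")
reg.link("alice", "recommender", "bookstore:order-1", "airline:booking-7")
reg.revoke("alice", "recommender")
print(reg.links)  # {}
```

After revocation, any further call to `link` for that pair raises `PermissionError`.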

Forensics

There are cases, such as when criminal activity is being investigated or proven, where it is necessary for law enforcement to be able to correlate without the consent of the individual. This is already the case in western society, and it seems likely that new mechanisms would not be required in a world respecting the Right to Correlate.

Defining contexts

Respecting the Right to Correlate would not by itself solve the Canadian Tire Problem that started this thread. The thing that made the Canadian Tire human experiments most odious is that they correlated buying habits at the level of individual purchases (our relations to Canadian Tire as a store) with probable behavior in paying off credit cards (Canadian Tire as a credit card issuer). Paradoxically, someone's loyalty to the store could actually be used to deny her credit. People who get Canadian Tire credit cards do know that the company is in a position to correlate all this information, but are unlikely to predict this counter-intuitive outcome.

Those of us preferring mainstream credit card companies presumably don't have the same issues at this point in time. They know where we buy but not what we buy (although there may be data sharing relationships with merchants that I am not aware of… Let me know…).

So we have come to the most important long-term problem: The Internet changes the rules of the game by making data correlation so very easy.

It potentially turns every credit card company into a data-correlating Canadian Tire.  Are we looking at the Canadian Tirization of the Internet?

But do people care?

Some will say that none of this matters because people just don't care about what is correlated.  I'll discuss that briefly in my next post.

Kim Cameron: secret RIAA agent?

Dave Kearns cuts me to the polemical quick by tarring me with the smelly brush of the RIAA:

‘Kim has an interesting post today, referencing an article (“What Does Your Credit-Card Company Know About You?” by Charles Duhigg in last week’s New York Times).

‘Kim correctly points out the major fallacies in the thinking of J. P. Martin, a “math-loving executive at Canadian Tire”, who, in 2002, decided to analyze the information his company had collected from credit-card transactions the previous year. For example, Martin notes that “2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months.” But that's barely 2% of the total, as Kim points out, and hardly conclusive evidence of anything.

‘I'm right with Cameron for most of his essay, up til the end when he notes:

When we talk about the need to prevent correlation handles and assembly of information across contexts (for example, in the Laws of Identity and our discussions of anonymity and minimal disclosure technology), we are talking about ways to begin to throw a monkey wrench into an emerging Martinist machine. Mr. Duhigg’s story describes early prototypes of the machinations we see as inevitable should we fail in our bid to create a privacy enhancing identity infrastructure for the digital epoch.

‘Change “privacy enhancing” to “intellectual property protecting” and it could be a quote from an RIAA press release!

‘We should never confuse tools with the bad behavior that can be helped by those tools. Data correlation tools, for example, are vitally necessary for automated personalization services and can be a big help to future services such as Vendor Relationship Management (VRM) . After all, it's not Napster that's bad but people who use it to get around copyright laws who are bad. It isn't a cup of coffee that's evil, just people who try to carry one thru airport security. 🙂

‘It is easier to forbid the tool rather than to police the behavior but in a democratic society, it's the way we should act.’

I agree that we must influence behaviors as well as develop tools.  And I'm as positive about Vendor Relationship Management as anyone.  But getting concrete, there's a huge gap between the kind of data correlation done at a person's request as part of a relationship (VRM), and the data correlation I described in my post that is done without a person's consent or knowledge.  As VRM's Saint Searls has said, “Sometimes, I don't want a deep relationship, I just want a cup of coffee”.  

I'll come clean with an example.  Not a month ago, I was visiting friends in Canada, and since I had an “extra car”, was nominated to go pick up some new barbells for the kids. 

So, off to Canadian Tire to buy a barbell.  Who knows what category they put me in when 100% of my annual consumption consists of barbells?  It had to be right up there with low-grade oil or even a Mega Thruster Exhaust System.  In this case, Dave, there was no R and certainly no VRM: I didn't ask to be profiled by Mr. Martin's reputation machines.

There is nothing about minimal disclosure that says profiles cannot be constructed when people want that. It simply means that information should only be collected in light of a specific usage, and that usage should be clear to the parties involved (NOT the case with Canadian Tire!). When there is no legitimate reason for collecting information, people should be able to avoid it.
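Collecting “only in light of a specific usage” can be expressed as purpose-bound collection: each declared purpose names the fields it needs, and everything else is dropped at the door. This is a minimal sketch; the purposes, field names and the `PURPOSE_FIELDS` table are hypothetical.

```python
# Hypothetical mapping from a declared purpose to the fields that purpose needs.
PURPOSE_FIELDS = {
    "checkout": {"item", "price", "payment_token"},
    "loyalty_profile": {"item", "price", "customer_id", "store"},
}


def collect(submitted, purpose):
    """Retain only the fields needed for the declared purpose; drop the rest."""
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in submitted.items() if k in allowed}


sale = {"item": "barbell", "price": 49.99, "payment_token": "tok_123",
        "customer_id": "c-88", "store": "store-12"}

# Buying a barbell for someone else: no profile is wanted, so none is kept.
print(collect(sale, "checkout"))
# {'item': 'barbell', 'price': 49.99, 'payment_token': 'tok_123'}
```

A person who does want a profile would opt into `loyalty_profile`; the choice of purpose, not the store's appetite, determines what is retained.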

It all boils down to the matter of people being “in control” of their digital interactions, and of developing technology that makes this both possible and likely.  How can you compare an automated profiling service you can turn on and off with one such as Mr. Martin thinks should rule the world of credit?  The difference between the two is a bit like the difference between a consensual sexual relationship and one based on force.

Returning to the RIAA, in my view Dave is barking up the wrong metaphor.  RIAA is NOT producing tools that put people in control of their relationships or property – quite the contrary.  And they'll pay for that. 

The brands we buy are “the windows into our souls”

You should read this fascinating piece by Charles Duhigg in last week’s New York Times. A few tidbits to whet the appetite:

‘The exploration into cardholders’ minds hit a breakthrough in 2002, when J. P. Martin, a math-loving executive at Canadian Tire, decided to analyze almost every piece of information his company had collected from credit-card transactions the previous year. Canadian Tire’s stores sold electronics, sporting equipment, kitchen supplies and automotive goods and issued a credit card that could be used almost anywhere. Martin could often see precisely what cardholders were purchasing, and he discovered that the brands we buy are the windows into our souls — or at least into our willingness to make good on our debts…

‘His data indicated, for instance, that people who bought cheap, generic automotive oil were much more likely to miss a credit-card payment than someone who got the expensive, name-brand stuff. People who bought carbon-monoxide monitors for their homes or those little felt pads that stop chair legs from scratching the floor almost never missed payments. Anyone who purchased a chrome-skull car accessory or a “Mega Thruster Exhaust System” was pretty likely to miss paying his bill eventually.

‘Martin’s measurements were so precise that he could tell you the “riskiest” drinking establishment in Canada — Sharx Pool Bar in Montreal, where 47 percent of the patrons who used their Canadian Tire card missed four payments over 12 months. He could also tell you the “safest” products — premium birdseed and a device called a “snow roof rake” that homeowners use to remove high-up snowdrifts so they don’t fall on pedestrians…

‘Why were felt-pad buyers so upstanding? Because they wanted to protect their belongings, be they hardwood floors or credit scores. Why did chrome-skull owners skip out on their debts? “The person who buys a skull for their car, they are like people who go to a bar named Sharx,” Martin told me. “Would you give them a loan?”’

So what if there are errors?

Now perhaps I’ve had too much training in science and mathematics, but this type of thinking seems totally neanderthal to me. It belongs in the same category of things we should be protected from as “guilt by association” and “racial profiling”.

For example, the article cites one of Martin’s concrete statistics:

‘A 2002 study of how customers of Canadian Tire were using the company's credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.’

We can rephrase the statement to say that 98% of the people who used their credit cards in drinking places did NOT miss the requisite four payments.

Drawing the conclusion that “use of the credit card in a drinking establishment predicts default” is thus an error 98 times out of 100.
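The arithmetic behind the 98% figure, using only the numbers quoted from the article:

```python
missed = 2_220         # drinking-place cardholders who missed four payments
cardholders = 100_000  # drinking-place cardholders in the 2002 study

default_rate = missed / cardholders
error_rate = 1 - default_rate  # share for whom "drinking predicts default" is wrong

print(f"{default_rate:.2%} missed, {error_rate:.2%} did not")
# 2.22% missed, 97.78% did not
```

Rounded, that is the 98 wrong calls out of every 100 described above.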

Denying people credit on a premise which is wrong 98% of the time seems like one of those things regulators should rush to address, even if the premise reduces risk to the credit card company.

But there won’t be enough regulators to go around, since there are thousands of other examples given that are similarly idiotic from the point of view of a society fair to its members. For the article continues,

‘Are cardholders suddenly logging in at 1 in the morning? It might signal sleeplessness due to anxiety. Are they using their cards for groceries? It might mean they are trying to conserve their cash. Have they started using their cards for therapy sessions? Do they call the card company in the middle of the day, when they should be at work? What do they say when a customer-service representative asks how they’re feeling? Are their sighs long or short? Do they respond better to a comforting or bullying tone?’

Hmmm.

  • Logging in at 1 in the morning. That’s me. I guess I’m one of the 98% for whom this thesis is wrong… I like to stay up late. Do you think staying up late could explain why Mr. Martin’s self-consciously erroneous theses irk me?
  • Using card to buy groceries? True, I don’t like cash. Does this put me on the road to ruin? Another stupid thesis for Mr. Martin.
  • Therapy sessions? If I read enough theses like those proposed by Martin, I may one day need therapy. But frankly, I don’t think Mr. Martin should have the slightest visibility into matters like these. Canadian Tire meets Freud?
  • Calling in the middle of the day when I should be at work? Grow up, Mr. Martin. There is this thing called flex schedules for the 98% or 99% or 99.9% of us for whom your theses continually fail.
  • What I would say if a customer-service representative asked how I was feeling? I would point out, with some vigor, that we do not have a personal relationship and that such a question isn't appropriate. And I certainly would not remain on the line.

Apparently Mr. Martin told Charles Duhigg, “If you show us what you buy, we can tell you who you are, maybe even better than you know yourself.” He then lamented that in the past, “everyone was scared that people will resent companies for knowing too much.”

At best, this is no more than a Luciferian version of the Beatles’ “You are what you eat” – but minus the excessive drug use that can explain why everyone thought this was so deep. The truth is, you are not “what you eat”.

Duhigg argues that in the past, companies stuck to “more traditional methods” of managing risk, like raising interest rates when someone was late paying a bill (imagine – a methodology based on actual delinquency rather than hocus pocus), because they worried that customers would revolt if they found out they were being studied so closely. He then says that after “the meltdown”, Mr. Martin’s methods have gained much more currency.

In fact, customers would revolt because the methodology is not reasonable or fair from the point of view of the vast majority of individuals, being wrong tens or hundreds or thousands of times more often than it is right.

If we weren’t working on digital identity, we could just end this discussion by saying Mr. Martin represents one more reason to introduce regulation into the credit card industry. But unfortunately, his thinking is contagious and symptomatic.

Mining of credit card information is just the tip of a vast and dangerous iceberg we are beginning to encounter in cyberspace. The Internet is currently engineered to facilitate the assembly of ever more information of the kind that so thrills Mr. Martin – data accumulated throughout the lives of our young people that will become progressively more important in limiting their opportunities as more “risk reduction” assumptions – of the Martinist kind that apply to almost no one but affect many – take hold.

When we talk about the need to prevent correlation handles and assembly of information across contexts (for example, in the Laws of Identity and our discussions of anonymity and minimal disclosure technology), we are talking about ways to begin to throw a monkey wrench into an emerging Martinist machine.  Mr. Duhigg's story describes early prototypes of the machinations we see as inevitable should we fail in our bid to create a privacy enhancing identity infrastructure for the digital epoch.
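One way technology can withhold correlation handles is to give each party a different identifier for the same person, in the spirit of the directed identifiers described in the Laws of Identity. The sketch below derives such pairwise identifiers with HMAC-SHA256 from a secret held on the user's side; this derivation, the secret and the party names are assumptions for illustration, not how any particular product does it. Note that pairwise identifiers only remove the shared handle: a card number or address reused across parties would still allow a join.

```python
import hashlib
import hmac


def pairwise_id(user_secret: bytes, relying_party: str) -> str:
    """Derive a per-party identifier: stable for one party, but unlinkable
    across parties without the user's secret."""
    digest = hmac.new(user_secret, relying_party.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


secret = b"alice-device-secret"  # hypothetical secret kept only on the user's side

store_id = pairwise_id(secret, "store.example")
card_id = pairwise_id(secret, "card-issuer.example")

# With a shared handle, joining the two datasets is a one-line lookup.
# With pairwise identifiers, the two parties have no common key to join on.
store_records = {store_id: ["barbell"]}
card_records = {card_id: ["payment due"]}
print(store_records.keys() & card_records.keys())  # set()
```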

[Thanks to JC Cannon for pointing me to this article.]

Real business on Geneva

Network World writer John Fontana has turned his tweet volume up to MAX this week covering TechEd. I think it works – I'm enjoying it – though the sheer volume of Fontana tweets makes it pretty hard to get your usual bird's-eye view of who is eating donuts, listening to new bands and staying up till all hours (can I live without that?). John also posted a news piece announcing that Microsoft IT has turned on Geneva for widespread production use internally.

Funny, last week I was at the Kuppinger Cole European ID Conference in Munich (more soon).  Dave Kearns (one of John's colleagues at Network World) hosted a panel where he asked Vittorio and me whether Microsoft was actually using the Geneva technology.  

I waved my arms pathetically and explained that our IT department had strict procedures establishing the point in the ship cycle where they will do production deployments.  Well, now Beta 2 is out the door and it's great that our IT has sufficient confidence to move immediately towards widespread internal usage.   

‘LOS ANGELES – Two days after shipping the second beta of its newest identity platform, Microsoft's internal IT department is rolling out the software corporate wide.

‘Geneva, Microsoft's identity platform for the cloud, will support 59 identity applications that Microsoft maintains with 29 business partners.

‘The federated applications include a payroll services and an online company store.

‘The company's IT department will change DNS records today on its internal network so all its identity federations are handled through its Geneva server environment rather than the current five Active Directory Federation Servers (ADFS) the company runs, according to Brian Puhl, a technology architect for Microsoft IT.

‘Microsoft has nearly 410,000 computers and 165,000 users on its network.

‘Puhl laid out the plan Tuesday during a session at Microsoft's annual TechEd conference. He said the cut over initially moves the company from ADFS 1.0 to ADFS 2.0 in Geneva, but that over time Microsoft will take advantage of streamlined support for its Live ID technology, incorporate CardSpace-based identity and roll-out claims-aware applications that are in development at Microsoft. (See graphic of Microsoft's Geneva architecture.)

‘“Geneva is a lot more than ADFS 2.0,” Puhl said.

‘Geneva was released in public beta for the first time Monday and Microsoft plans to make the software generally available at the end of 2009.

‘The identity platform's foundation is the claims-based access model and Security Token Service (STS) technology that Microsoft has been developing over the past few years as part of its industry effort to create a single identity system based on standard protocols.

‘Geneva is made up of the Geneva Server, formerly called Active Directory Federation Services 2.0; Geneva CardSpace Client, a smaller and faster version of the identity client now available with Vista; and the Geneva Framework, which was formerly code-named Zermatt.

‘Also part of the platform is the Microsoft Service Connector, the Microsoft Federation Gateway and the .Net Access Control Service, which are designed to create a sort of identity backbone and connection to the cloud.

‘Microsoft plans to tap that backbone to link to cloud services, including its Business Productivity Online Suite (BPOS).’

More here.
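For readers new to the claims-based model mentioned in the article, the core idea is that a security token service (STS) issues a signed set of claims, and the application (the relying party) verifies the signature and then decides based on the claims alone. The toy below illustrates only that flow. It is not Geneva's API: real STSs use standardized token formats such as SAML assertions and typically public-key signatures, whereas this sketch assumes JSON claims and a hypothetical shared HMAC key.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"sts-and-rp-shared-key"  # hypothetical key shared by the toy STS and relying party


def issue_token(claims: dict) -> str:
    """Toy STS: serialize the claims and append an HMAC-SHA256 signature."""
    body = json.dumps(claims, sort_keys=True)
    sig = hmac.new(SHARED_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return body + "." + sig


def accept_token(token: str) -> dict:
    """Toy relying party: verify the signature, then trust only the claims."""
    body, sig = token.rsplit(".", 1)  # the hex signature never contains a dot
    expected = hmac.new(SHARED_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("token signature invalid")
    return json.loads(body)


token = issue_token({"role": "employee", "department": "payroll"})
claims = accept_token(token)
print(claims["role"])  # employee
```

A token whose claims were altered after issuance (say, "employee" changed to "admin") fails the signature check and raises `ValueError`.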

FYI: Encryption is “not necessary”

A few weeks ago I spoke at a conference of CIOs, CSOs and IT Mandarins that – of course – also featured a session on Cloud Computing.  

It was an industry panel where we heard from the people responsible for security and compliance matters at a number of leading cloud providers. This was followed by Q and A from the audience.

There was a lot of enthusiasm about the potential of cutting costs.  The discussion wasn't so much about whether cloud services would be helpful, as about what kinds of things the cloud could be used for.  A government architect sitting beside me thought it was a no-brainer that informational web sites could be outsourced.  His enthusiasm for putting confidential information in the cloud was more restrained.

Quite a bit of discussion centered on how “compliance” could be achieved in the cloud. The panel was all over the place on the answer. At one end of the spectrum was a provider who maintained that nothing changed in terms of compliance – it was just a matter of outsourcing. Rather than creating vast multi-tenant databases, this provider argued that virtualization would allow hosted services to be treated as being logically located “in the enterprise”.

At the other end of the spectrum was a vendor who argued that if the cloud followed “normal” practices of data protection, multi-tenancy (in the sense of many customers sharing the same database or other resource) would not be an issue.  According to him, any compliance problems were due to the way requirements were specified in the first place.  It seemed obvious to him that compliance requirements need to be totally reworked to adjust to the realities of the cloud.

Someone from the audience asked whether cloud vendors really wanted to deal with high value data.  In other words, was there a business case for cloud computing once valuable resources were involved?  And did cloud providers want to address this relatively constrained part of the potential market?

The discussion made it crystal clear that questions of security, privacy and compliance in the cloud are going to require really deep thinking if we want to build trustworthy services.

The session also convinced me that those of us who care about trustworthy infrastructure are in for some rough weather.  One of the vendors shook me to the core when he said, “If you have the right physical access controls and the right background checks on employees, then you don't need encryption”.

I have to say I almost choked.  When you build gigantic, hypercentralized, data repositories of valuable private data – honeypots on a scale never before known – you had better take advantage of all the relevant technologies allowing you to build concentric perimeters of protection.  Come on, people – it isn't just a matter of replicating in the cloud the things we do in enterprises that by their very nature benefit from firewalled separation from other enterprises, departmental isolation and separation of duty inside the enterprise, and physical partitioning.  

I hope people look in great detail at what cloud vendors are doing to innovate with respect to the security and privacy measures required to safely offer hypercentralized, co-mingled sensitive and valuable data.