Robots reshaping social networks

In May I was fascinated by a story in the Atlantic on the Web Ecology Project, a group “interested in a question of particular concern to social-media experts and marketers: Is it possible not only to infiltrate social networks, but also to influence them on a large scale?”

The Web Ecology Project was turning the Turing Test on its side, setting up experiments to see how potentially massive networks of “SocialBots” (social robots) could influence human social networks by interacting with their members.

In the first such experiment, it invited teams from around the world to build SocialBots and picked 500 real Twitter users, the core of whom shared “a fondness for cats”. At the end of the two-week experiment, network graphs showed that the teams’ bots had insinuated themselves strikingly into the center of the target network.

The Web Ecology Blog summarized the results this way:

With the stroke of midnight on Sunday, the first Socialbots competition has officially ended. It’s been a crazy last 48 hours. At the last count, the final scores (and how they broke down) were:

  • Team C: 701 Points (107 Mutuals, 198 Responses)
  • Team B: 183 Points (99 Mutuals, 28 Responses)
  • Team A: 170 Points (119 Mutuals, 17 Responses)

This leaves the winner of the first-ever Socialbots Cup as Team C. Congratulations!

You also read those stats right. In under a week, Team C’s bot was able to generate close to 200 responses from the target network, with conversations ranging from a few back and forth tweets to an actual set of lengthy interchanges between the bot and the targets. Interestingly, mutual followbacks, which played so strong as a source for points in Round One, showed less strongly in Round Two, as teams optimized to drive interactions.

In any case, much further from anything having to do with mutual follows or responses, the proof is really in the pudding. The network graph shows the enormous change in the configuration of the target network from when we first got started many moons ago. The bots have increasingly been able to carve out their own independent community — as seen in the clustering of targets away from the established tightly-knit networks and towards the bots themselves.
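The published totals fit a simple weighting, inferred here from the numbers rather than quoted from the competition rules: one point per mutual follow-back and three points per response. A short Python check under that assumption reproduces all three scores:

```python
# Assumed scoring rule, inferred from the published totals (not from the official rules):
# 1 point per mutual follow-back, 3 points per response.
POINTS_PER_MUTUAL = 1
POINTS_PER_RESPONSE = 3


def score(mutuals: int, responses: int) -> int:
    """Return a team's total under the assumed scoring rule."""
    return mutuals * POINTS_PER_MUTUAL + responses * POINTS_PER_RESPONSE


# (mutuals, responses, published total) as reported on the Web Ecology Blog
results = {
    "Team C": (107, 198, 701),
    "Team B": (99, 28, 183),
    "Team A": (119, 17, 170),
}

for team, (mutuals, responses, published) in results.items():
    computed = score(mutuals, responses)
    print(f"{team}: computed {computed}, published {published}")
```

Under this weighting Team A finished last despite having the most mutual follow-backs, which is consistent with the blog's observation that teams optimized to drive interactions.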

The Atlantic story summarized the implications this way:

Can one person controlling an identity, or a group of identities, really shape social architecture? Actually, yes. The Web Ecology Project’s analysis of 2009’s post-election protests in Iran revealed that only a handful of people accounted for most of the Twitter activity there. The attempt to steer large social groups toward a particular behavior or cause has long been the province of lobbyists, whose “astroturfing” seeks to camouflage their campaigns as genuine grassroots efforts, and company employees who pose on Internet message boards as unbiased consumers to tout their products. But social bots introduce new scale: they run off a server at practically no cost, and can reach thousands of people. The details that people reveal about their lives, in freely searchable tweets and blogs, offer bots a trove of personal information to work with. “The data coming off social networks allows for more-targeted social ‘hacks’ than ever before,” says Tim Hwang, the director emeritus of the Web Ecology Project. And these hacks use “not just your interests, but your behavior.”

A week after Hwang’s experiment ended, Anonymous, a notorious hacker group, penetrated the e-mail accounts of the cyber-security firm HBGary Federal and revealed a solicitation of bids by the United States Air Force in June 2010 for “Persona Management Software”—a program that would enable the government to create multiple fake identities that trawl social-networking sites to collect data on real people and then use that data to gain credibility and to circulate propaganda.

“We hadn’t heard of anyone else doing this, but we assumed that it’s got to be happening in a big way,” says Hwang. His group has published the code for its experimental bots online, “to allow people to be aware of the problem and design countermeasures.”

The Web Ecology Project source code is available here. Fascinating. We're talking very basic stuff that nonetheless takes social engineering in an important and disturbingly new direction.

As is the case with the use of robots for social profiling, the use of robots to reshape social networks raises important questions about attribution and identity (the Atlantic story actually described SocialBots as “fake identities”).  

Given that SocialBots will inevitably and quickly evolve, we can see that the ability to demonstrate that you are a natural flesh-and-blood person rather than a robot will increasingly become an essential ingredient of digital reality.  It will be crucial that such a proof can be given without requiring you to identify yourself,  relinquish your anonymity, or spend your whole life completing grueling captcha challenges. 

I am again struck by our deep historical need for minimal disclosure technology like U-Prove, with its amazing ability to enable unlinkable anonymous assertions (like liveness) and yet still reveal the identities of those (like the manufacturers of armies of SocialBots) who abuse them through over-use.

 

New paper on Wi-Fi positioning systems

Regular readers will have come across (or participated in shaping) some of my work over the last year as I looked at the different ways that device identity and personal identity collide in mobile location technology.

In the early days following Google's Street View WiFi snooping escapades, I became increasingly frustrated that public and official attention centered on Google's apparently accidental collection of unencrypted network traffic when there was a much worse problem staring us in the face.

Unfortunately the deeper problem was also immensely harder to grasp since it required both a technical knowledge of networked devices and a willingness to consider totally unpredicted ways of using (or misusing) information.

As became clear from a number of conversations with other bloggers, even many highly technical people didn't understand some pretty basic things – like the fact that personal device identifiers (the hardware MAC addresses carried in every frame header) travel in the clear even on encrypted WiFi networks… Nor was it natural for many in our community to think things through from the perspective of privacy threat analysis.
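To make the point concrete: in an 802.11 frame the MAC header, including the sending device's hardware address, comes before the payload, and WPA/WPA2 encryption covers only the payload. The sketch below assumes a raw frame with no radiotap or other capture header in front of it, and uses made-up addresses; it reads the addresses straight out of a frame whose Protected Frame bit is set:

```python
def format_mac(raw: bytes) -> str:
    """Render six address bytes in the usual colon-separated hex form."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_80211_header(frame: bytes) -> dict:
    """Extract the cleartext fields of a basic 24-byte 802.11 MAC header."""
    if len(frame) < 24:
        raise ValueError("frame too short for an 802.11 MAC header")
    flags = frame[1]  # second Frame Control byte holds the flag bits
    return {
        "protected": bool(flags & 0x40),    # Protected Frame bit: payload is encrypted
        "addr1": format_mac(frame[4:10]),   # receiver address (here the access point)
        "addr2": format_mac(frame[10:16]),  # transmitter address (the device itself)
        "addr3": format_mac(frame[16:22]),  # destination address
    }


# Synthetic data frame sent by a station to its access point with encryption on.
frame = (
    bytes([0x08, 0x41])              # Frame Control: data frame, To DS + Protected
    + bytes(2)                       # Duration/ID
    + bytes.fromhex("020000000a01")  # addr1: access point (hypothetical)
    + bytes.fromhex("020000000001")  # addr2: the phone or laptop (hypothetical)
    + bytes.fromhex("020000000b02")  # addr3: destination (hypothetical)
    + bytes(2)                       # Sequence Control
    + bytes.fromhex("8f3ac1d2e4f5")  # stand-in for the encrypted payload
)

header = parse_80211_header(frame)
print(header["protected"], header["addr2"])
```

No network key is involved: anyone within radio range can log which device is present simply by reading the header.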

This got me to look at the issues even more closely, and I summarized my thinking at PII 2010 in Seattle.

A few months ago I ran into Dr. Ann Cavoukian, the Privacy Commissioner of Ontario, who was working on the same issues.  We decided to collaborate on a very in-depth look at both the technology and policy implications, aiming to produce a document that could be understood by those in the policy community and still serve as a call to the technical community to deal appropriately with the identity issues, seeking what Ann calls “win-win” solutions that favor both privacy and innovation.

Ann's team deserves all the credit for the thorough literature research and clear exposition.  Ann expertly describes the policy issues and urges us as technologists to adopt Privacy By Design principles for our work. I appreciate having had the opportunity to collaborate with such an innovative group.  Their efforts give me confidence that even difficult technical issues with social implications can be debated and decided by the people they affect.

Please read WiFi Positioning Systems: Beware of Unintended Consequences and let us know what you think – I invite you to comment (or tweet or email me) on the technical, policy and privacy-by-design aspects of the paper.

Change of status

My work status has gone through “some changes” recently.

A number of readers have written to me about Mary Jo Foley's report on a “goodbye party” thrown at Microsoft a few weeks ago when I officially gave up my role as Chief Architect of Identity. Others saw Vittorio Bertocci’s kind recollection of the progress we made over the years.

When Tim Cole interviewed me about my plans a few days later at the European Identity Conference, I hadn't made the slightest progress in terms of thinking about my future…  I did say, though, that I hoped to keep my hand in the identity and social computing space to the extent that people found my input useful.

One way to do this was to look for opportunities to participate in interesting efforts on a per-project basis.  It turns out that within a few days I was asked to do this with Microsoft over the summer.  Not exactly a complete change (!) but it still feels liberating and different.

Don't worry – I won't bore you with reports on my gigs going forward, but thought in the interests of full disclosure, you should know how this particular situation is evolving :)

Takeaway:  Life is good, and even more than ever, this blog represents my own views, which can't be blamed on anyone else even when I wish they could.

Google opposing the “Right to be forgotten”

In Europe there has been a lot of discussion about “the Right to be Forgotten” (see, for example, Le droit à l’oubli sur Internet, “the right to be forgotten on the Internet”). The notion is that after some time, information should simply fade away, counteracting digital eternity.

In America, the authors of the Social Network Users’ Bill of Rights have called their variant of this the “Right to Withdraw”.  

Whatever words we use, the right, if recognized, would be a far-reaching game-changer – and as I wrote here, represent a “cure as important as the introduction of antibiotics was in the world of medicine”.

Against this backdrop, the following report by Ciaran Giles of the Associated Press gives us much to think about. It appears Google is fighting head-on against the “Right to be Forgotten”. It seems willing to take on any individual or government that dares to challenge the immutable right of its database and algorithms to define you through something that has been written – forever, and whether it's true or not.

MADRID – Their ranks include a plastic surgeon, a prison guard and a high school principal. All are Spanish, but have little else in common except this: They want old Internet references about them that pop up in Google searches wiped away.

In a case that Google Inc. and privacy experts call a first of its kind, Spain's Data Protection Agency has ordered the search engine giant to remove links to material on about 90 people. The information was published years or even decades ago but is available to anyone via simple searches.

Scores of Spaniards lay claim to a “Right to be Forgotten” because public information once hard to get is now so easy to find on the Internet. Google has decided to challenge the orders and has appealed five cases so far this year to the National Court.

Some of the information is embarrassing, some seems downright banal. A few cases involve lawsuits that found life online through news reports, but whose dismissals were ignored by media and never appeared on the Internet. Others concern administrative decisions published in official regional gazettes.

In all cases, the plaintiffs petitioned the agency individually to get information about them taken down.

And while Spain is backing the individuals suing to get links taken down, experts say a victory for the plaintiffs could create a troubling precedent by restricting access to public information.

The issue isn't a new one for Google, whose search engine has become a widely used tool for learning about the backgrounds about potential mates, neighbors and co-workers. What it shows can affect romantic relationships, friendships and careers.

For that reason, Google regularly receives pleas asking that it remove links to embarrassing information from its search index or at least ensure the material is buried in the back pages of its results. The company, based in Mountain View, Calif., almost always refuses in order to preserve the integrity of its index.

A final decision on Spain's case could take months or even years because appeals can be made to higher courts. Still, the ongoing fight in Spain is likely to gain more prominence because the European Commission this year is expected to craft controversial legislation to give people more power to delete personal information they previously posted online.

“This is just the beginning, this right to be forgotten, but it's going to be much more important in the future,” said Artemi Rallo, director of the Spanish Data Protection Agency. “Google is just 15 years old, the Internet is barely a generation old and they are beginning to detect problems that affect privacy. More and more people are going to see things on the Internet that they don't want to be there.”

Many details about the Spaniards taking on Google via the government are shrouded in secrecy to protect the privacy of the plaintiffs. But the case of plastic surgeon Hugo Guidotti vividly illustrates the debate.

In Google searches, the first link that pops up is his clinic, complete with pictures of a bare-breasted woman and a muscular man as evidence of what plastic surgery can do for clients. But the second link takes readers to a 1991 story in Spain's leading El Pais newspaper about a woman who sued him for the equivalent of €5 million for a breast job that she said went bad.

By the way, if it really is true that nothing should ever interfere with the automated pronouncements of the search engine – even truth – does that mean robots have the right to pronounce any libel they want, even though we don't?

Privacy Bill of Rights establishes device identifiers as PII

In my view the Commercial Privacy Bill of Rights drafted by US Senators McCain and Kerry would significantly strengthen the identity fabric of the Internet through its proposal that “a unique persistent identifier associated with an individual or a networked device used by such an individual” must be treated as personally identifiable information (Section 3 – 4 – vii). This clear and central statement marks a real step forward. Amongst other things, it covers the MAC addresses of wireless devices and the serial numbers and random identifiers of mobile phones and laptops.

From this fact alone the bill could play a key role in limiting a number of the most privacy-invasive practices used today by Internet services – including location-based services.  For example, a company like Apple could no longer glibly claim, as it does in its current iTunes privacy policy, that device identifiers and location information are “not personally identifying”.  Nor could it profess, as iTunes also currently does, that this means it can “collect, use, transfer, and disclose”  the information “for any purpose”.  Putting location information under the firm control of users is a key legislative requirement addressed by the bill.

The bill also contributes both to the security of the Internet and to individual privacy by unambiguously embracing “Minimal Disclosure for a Constrained Use” as set out in Law 2 of the Laws of Identity.  Title III explicitly establishes a “Right to Purpose Specification; Data Minimization; Constraints on Distribution; and Data Integrity.”
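One way a service might apply purpose specification and data minimization is to declare, for each purpose, the fields it actually needs and drop everything else before anything is stored. The purposes and field names below are hypothetical and are not taken from the bill:

```python
from typing import Any

# Hypothetical mapping from a declared purpose to the only fields it may use.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "ship_order": frozenset({"name", "street_address", "postal_code"}),
    "verify_age": frozenset({"over_18"}),
}


def minimize(record: dict[str, Any], purpose: str) -> dict[str, Any]:
    """Keep only the fields declared for `purpose`; refuse undeclared purposes."""
    try:
        allowed = ALLOWED_FIELDS[purpose]
    except KeyError:
        raise ValueError(f"no declared purpose named {purpose!r}") from None
    return {key: value for key, value in record.items() if key in allowed}


record = {"name": "A. User", "date_of_birth": "1990-05-01", "over_18": True, "device_id": "abc123"}
print(minimize(record, "verify_age"))
```

Asking for a yes/no answer to “over 18?” rather than a date of birth, and never passing the device identifier along, is the kind of minimal disclosure for a constrained use that Law 2 describes.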

Despite these real positives, the bill as currently formulated leaves me eager to consult a bevy of lawyers – not a good sign.  This may be because it is still a “working draft”, with numerous provisions that must be clarified. 

For example, how would the population at large ever understand the byzantine interlocking of opt-in and opt-out clauses described in Section 202?  At this point, I don't.

And what does the list of exceptions to Unauthorized Use in Section 3 paragraph 8 imply?  Does it mean such uses can be made without notice and consent?

I'll be looking for comments by legal and policy experts.  Already, EPIC has expressed both support and reservations:

Senators John Kerry (D-MA) and John McCain (R-AZ) have introduced the “Commercial Privacy Bill of Rights Act of 2011,” aimed at protecting consumers’ privacy both online and offline. The Bill endorses several “Fair Information Practices,” gives consumers the ability to opt-out of data disclosures to third-parties, and restricts the sharing of sensitive information.

But the Bill does not allow for a private right of action, preempts better state privacy laws, and includes a “Safe Harbor” arrangement that exempts companies from significant privacy requirements.

EPIC has supported privacy laws that provide meaningful enforcement, limit the ability of companies to exploit loopholes for behavioral targeting, and ensure that the Federal Trade Commission can investigate and prosecute unfair and deceptive trade practices, as it did with Google Buzz. For more information, see EPIC: Online Tracking and Behavioral Profiling and EPIC: Federal Trade Commission.

Kerry McCain bill proposes “minimal disclosure” for transaction

Steve Satterfield at Inside Privacy gives us this overview of the central features of the new Commercial Privacy Bill of Rights proposed by US Senators Kerry and McCain (download it here):

  • The draft envisions a significant role for the FTC and includes provisions requiring the FTC to promulgate rules on a number of important issues, including the appropriate consent mechanism for uses of data.  The FTC would also be tasked with issuing rules obligating businesses to provide reasonable security measures for the consumer data they maintain and to provide transparent notices about data practices.
  • The draft also states that businesses should “seek” to collect only as much “covered information” as is reasonably necessary to provide a transaction or service requested by an individual, to prevent fraud, or to improve the transaction or service
  • “Covered information” is defined broadly and would include not just “personally identifiable information” (such as name, address, telephone number, social security number), but also “unique identifier information,” including a customer number held in a cookie, a user ID, a processor serial number or a device serial number.  Unlike definitions of “covered information” that appear in separate bills authored by Reps. Bobby Rush (D-Ill.) and Jackie Speier (D-Cal.), this definition specifically covers cookies and device IDs.
  • The draft encompasses a data retention principle, providing that businesses should retain covered information only as long as necessary to provide the transaction or service “or for a reasonable period of time if the service is ongoing.”
  • The draft contemplates enforcement by the FTC and state attorneys general. Notably, and in contrast to Rep. Rush's bill, the draft does not provide a private right of action for individuals who are affected by a violation.
  • Nor does the bill specifically address the much-debated “Do Not Track” opt-out mechanism that was recommended in the FTC's recent staff report on consumer privacy.  (You can read our analysis of that report here.) 
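The retention principle in the fourth bullet could be enforced with a periodic purge. The sketch below assumes each stored record carries the last time it was needed and a flag saying whether the service is ongoing; the zero and 30-day windows are placeholders, since the summary leaves “reasonable period of time” undefined:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredRecord:
    record_id: str
    last_needed: datetime   # last time the data was needed for the transaction or service
    service_ongoing: bool   # True while the service relationship continues


# Illustrative windows only: the draft does not define a "reasonable period".
KEEP_AFTER_COMPLETION = timedelta(0)
REASONABLE_PERIOD_ONGOING = timedelta(days=30)


def purge_expired(records: list[StoredRecord], now: datetime) -> list[StoredRecord]:
    """Return only the records still inside their retention window."""
    kept = []
    for record in records:
        window = REASONABLE_PERIOD_ONGOING if record.service_ongoing else KEEP_AFTER_COMPLETION
        if now - record.last_needed <= window:
            kept.append(record)
    return kept


now = datetime(2011, 4, 12, tzinfo=timezone.utc)
records = [
    StoredRecord("order-1", now - timedelta(days=1), service_ongoing=False),
    StoredRecord("account-2", now - timedelta(days=10), service_ongoing=True),
    StoredRecord("account-3", now - timedelta(days=45), service_ongoing=True),
]
print([r.record_id for r in purge_expired(records, now)])
```

Completed transactions are dropped as soon as they are no longer needed, while ongoing accounts survive only within the chosen window.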

As noted above, the draft is reportedly still a work in progress.  Inside Privacy will provide additional commentary on the Kerry legislation and other congressional privacy efforts as they develop.   

Press conference will be held tomorrow at 12:30 pm.  [Emphasis above is mine – Kim]

Readers of Identityblog will understand that I see this development, like so many others, as an inevitable and predictable consequence of many short-sighted industry players breaking the Laws of Identity.

 

WSJ: Federal Prosecutors investigate smartphone apps

If you have kept up with the excellent Wall Street Journal series on smartphone apps that inappropriately collect and release location information, you won't be surprised at its latest chapter: federal prosecutors are now investigating the information-sharing practices of mobile applications, and a grand jury is already issuing subpoenas. The Journal says, in part:

Federal prosecutors in New Jersey are investigating whether numerous smartphone applications illegally obtained or transmitted information about their users without proper disclosures, according to a person familiar with the matter…

The criminal investigation is examining whether the app makers fully described to users the types of data they collected and why they needed the information—such as a user's location or a unique identifier for the phone—the person familiar with the matter said. Collecting information about a user without proper notice or authorization could violate a federal computer-fraud law…

Online music service Pandora Media Inc. said Monday it received a subpoena related to a federal grand-jury investigation of information-sharing practices by smartphone applications…

In December 2010, Scott Thurm wrote Your Apps Are Watching You,  which has now been “liked” by over 13,000 people.  It reported that the Journal had tested 101 apps and found that:

… 56 transmitted the phone's unique device identifier to other companies without users’ awareness or consent.  Forty-seven apps transmitted the phone's location in some way. Five sent a user's age, gender and other personal details to outsiders.  At the time they were tested, 45 apps didn't provide privacy policies on their websites or inside the apps.

In Pandora's case, both the Android and iPhone versions of its app transmitted information about a user's age, gender, and location, as well as unique identifiers for the phone, to various advertising networks. Pandora gathers the age and gender information when a user registers for the service.

Legal experts said the probe is significant because it involves potentially criminal charges that could be applicable to numerous companies. Federal criminal probes of companies for online privacy violations are rare…

The probe centers on whether app makers violated the Computer Fraud and Abuse Act, said the person familiar with the matter. That law, crafted to help prosecute hackers, covers information stored on computers. It could be used to argue that app makers “hacked” into users’ cellphones.

[More here]

The elephant in the room is Apple's own approach to location information, which should certainly be subject to investigation as well. The user is never presented with a dialog in which Apple's use of location information is explained and permission is obtained. Instead, the user's agreement is gained surreptitiously, hidden away on page 37 of a 45-page policy that Apple users must accept in order to use… iTunes. Why iTunes requires location information is never explained. The policy simply states that the user's device identifier and location are non-personal information and that Apple “may collect, use, transfer, and disclose non-personal information for any purpose”.

Any purpose?

Is it reasonable that companies like Apple can proclaim that device identifiers and location are non-personal and then do whatever they want with them? Informed opinion seems not to agree with them. The International Working Group on Data Protection in Telecommunications, for example, asserted precisely the opposite as early as 2004. Membership of the Group included “representatives from Data Protection Authorities and other bodies of national public administrations, international organisations and scientists from all over the world.”

More empirically, I demonstrated in Non-Personal information, like where you live that the combination of device identifier and location is in very many cases (including my own) personally identifying.  This is especially true in North America where many of us live in single-family dwellings.
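The inference is easy to reproduce. Given a log of (device identifier, location, time) records, the place a device most often sits overnight is usually its owner's home, and in a neighborhood of single-family dwellings an address points to a single household. A minimal sketch with made-up coordinates; the 22:00 to 06:00 night window and the rounding to roughly 100 m are assumptions chosen for illustration:

```python
from collections import Counter
from datetime import datetime

# Hypothetical log entry: (device_id, latitude, longitude, timestamp)
Observation = tuple[str, float, float, datetime]


def likely_home(observations: list[Observation], device_id: str, precision: int = 3):
    """Most frequent night-time location of one device, rounded to about 100 m."""
    night_cells = Counter(
        (round(lat, precision), round(lon, precision))
        for dev, lat, lon, ts in observations
        if dev == device_id and (ts.hour >= 22 or ts.hour < 6)
    )
    if not night_cells:
        return None
    return night_cells.most_common(1)[0][0]


log = [
    ("dev-1", 47.6101, -122.2013, datetime(2011, 4, 1, 23, 0)),
    ("dev-1", 47.6102, -122.2012, datetime(2011, 4, 2, 2, 30)),
    ("dev-1", 47.6423, -122.1368, datetime(2011, 4, 2, 14, 0)),
]
print(likely_home(log, "dev-1"))
```

Nothing in the log is a name, yet the result is a street address, which is why the pairing of a persistent device identifier with location is hard to call “non-personal”.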

[BTW, I have not deeply investigated the approach to sharing of location information taken by other smartphone providers – perhaps others can shed light on this.]

Google Indoors featured on German TV

Germans woke up yesterday to a headline story on Das Erste's TV morning show announcing a spiffy new Internet service: Google Indoors.

Das Erste's lead-in and Google Indoors spokesman

A spokesman said Google was extending its Street View offering so Internet users could finally see inside people's homes. Indeed, Google Indoors personnel were already knocking on doors, patiently explaining that if people had not already gone through the opt-out process, they had “opted in”…

Google Indoors greeted by happy customer

… so the technicians needed to get on with their work:

Google Indoors camera-head enters apartment

Google's deep concern about people's privacy had led it to introduce features such as automated blurring of faces…

Automated privacy features and product placements with revenue shared with residents
 
… and the business model of the scheme was devilishly simple: the contents of people's houses served as product placements charged to advertisers, with 1/10 of a cent per automatically recognized brand name going to the residents themselves. As shown below, people can choose to obfuscate products worth more than 5,000 Euros if concerned about attracting thieves – an example of the advanced privacy options and levels the service makes possible.

Google Indoors app experience

Check out the video.  Navigation features within houses are amazing!  From the amount of effort and wit put into it by a major TV show, I'd wager that even if Google's troubles with Germany around Street View are over, its problems with Germans around privacy may not be. 

Frankly, Das Erste (meaning “The First”) has to be congratulated on one of the best-crafted April Fools' jokes you are likely to have witnessed. I don't have the command of the German language or politics (!) to understand all the subtleties, but friends say the piece is teeming with irony. And given Eric Schmidt's policy of getting as close to “creepy” as possible, who wouldn't find the video at least partly believable?

[Thanks to Kai Rannenberg for the heads up.]

Malcolm Crompton on power imbalance and security

Australia's CRN reports that former Australian Privacy Commissioner Malcolm Crompton has called for the establishment of a formal privacy industry to rethink identity management in an increasingly digital world:

Addressing the Cards & Payments Australasia conference in Sydney this week, Crompton said the online environment needed to become “safe to play” from citizens’ perspective.

While the internet was built as a “trusted environment”, Crompton said governments and businesses had emerged as “digital gods” with imbalanced identification requirements.

“Power allocation is where we got it wrong,” he said, warning that organisations’ unwarranted emphasis on identification had created money-making opportunities for criminals.

Malcolm puts this well.  I too have come to see that the imbalance of power between individual users and Internet business is one of the key factors blocking the emergence of a safe Internet. 

CRN continues:

Currently, users were forced to provide personal information to various email providers, social networking sites, and online retailers in what Crompton described as “a patchwork of identity one-offs”.

Not only were login systems “incredibly clumsy and easy to compromise”; centralised stores of personal details and metadata created honeypots of information for identity thieves, he said…

Refuting arguments that metadata – such as login records and search strings – was unidentifiable, Crompton warned that organisations hoarding such information would one day face a user revolt.

He also recommended the use of cloud-based identification management systems such as Azigo, Avoco and OpenID, which tended to give users more control of their information and third-party access rights.

User-centricity was central to Microsoft chief identity architect Kim Cameron’s ‘Laws of Identity’ (pdf), as well as Canadian Privacy Commissioner Ann Cavoukian’s seven principles of ‘Privacy by Design’ (pdf).

Full article here.

Lazy headmasters versus the Laws of Identity

Ray Corrigan routinely combines legal and technological insight at B2fxxx – Random thoughts on law, the Internet and society, and his book on Digital Decision Making is essential.  His work often leaves me feeling uncharacteristically optimistic – living proof that a new kind of legal thinker is emerging with the technological depth needed to be a modern day Solomon.

I hadn't noticed the UK's new Protection of Freedoms Bill until I heard cabinet minister Damian Green talk about it as he pulverized the UK's centralized identity database recently.  Naturally I turned to Ray Corrigan for comment, only to discover that the political housecleaning had also swept away the assumptions behind widespread fingerprinting in Britain's schools, reinstating user control and consent. 

According to TES Connect:

The new Protection of Freedoms Bill gives pupils in schools and colleges the right to refuse to give their biometric data and compels schools to make alternative provision for them.  The several thousand schools that already use the technology will also have to ask permission from parents retrospectively, even if their systems have been established for years…

It turns out that Britain's headmasters, apparently now a lazy bunch, have little stomach for trivialities like civil liberties.  And writing about this, Ray's tone seems that of a judge who has had an impetuous and over-the-top barrister try to bend the rules one too many times.  It is satisfying to see Ray send them home to study the Laws of Identity as scientific laws governing identity systems.   I hope they catch up on their homework…

The Association of School and College Leaders (ASCL) is reportedly opposing the controls on school fingerprinting proposed in the UK coalition government's Protection of Freedoms Bill.

I always understood the reason that unions existed was to protect the rights of individuals. That ASCL should give what they perceive to be their own members’ managerial convenience priority over the civil rights of kids should make them thoroughly ashamed of themselves.  Oh dear – now head teachers are going to have to fill in a few forms before they abuse children's fundamental right to privacy – how terrible.

Although headteachers and governors at schools deploying these systems may be typically ‘happy that this does not contravene the Data Protection Act’, a number of leading barristers have stated that the use of such systems in schools may be illegal on several grounds. As far back as 2006 Stephen Groesz, a partner at Bindmans in London, was advising:

“Absent a specific power allowing schools to fingerprint, I'd say they have no power to do it. The notion you can do it because it's a neat way of keeping track of books doesn't cut it as a justification.”

The recent decisions in the European Court of Human Rights in cases like S. and Marper v UK (2008 – retention of DNA and fingerprints) and Gillan and Quinton v UK (2010 – s44 police stop and search) mean schools have to be increasingly careful about the use of such systems anyway. Not that most schools would know that.

Again the question of whether kids should be fingerprinted to get access to books and school meals is not even a hard one! They completely decimate Kim Cameron's first four laws of identity.

1. User control and consent – many schools don't ask for consent, child or parental, and don't provide simple opt out options

2. Minimum disclosure for constrained use – the information collected, children's unique biometrics, is disproportionate for the stated use

3. Justifiable parties – the information is in control of or at least accessible by parties who have absolutely no right to it

4. Directed identity – a unique, irrevocable, omnidirectional identifier is being used when a simple unidirectional identifier (eg lunch ticket or library card) would more than adequately do the job.
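The unidirectional alternative in point 4 can be made concrete. One common way to build such identifiers, offered here as an illustration rather than anything prescribed by the Laws of Identity or used by any school, is to derive a separate identifier for each relying party (here a library and a canteen) with an HMAC keyed by a secret the issuer holds. All names and values below are hypothetical:

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the office that issues pupil identifiers.
ISSUER_KEY = secrets.token_bytes(32)


def pairwise_id(pupil_ref: str, relying_party: str, key: bytes = ISSUER_KEY) -> str:
    """Derive an identifier for one pupil that is specific to one relying party."""
    message = f"{relying_party}\x00{pupil_ref}".encode()  # NUL byte separates the fields
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


library_id = pairwise_id("pupil-0042", "library")
canteen_id = pairwise_id("pupil-0042", "canteen")
print(library_id != canteen_id)  # the two systems see unrelated identifiers
```

Without the issuer's key, the library and canteen systems cannot link their records by comparing identifiers, and because the identifiers come from a key rather than from a body, a compromised set can be replaced by issuing a new key, which is impossible with a fingerprint.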

It's irrelevant how much schools have invested in such systems or how convenient school administrators find them, or that the Information Commissioner's Office soft-pedalled its advice on the matter (in 2008) in relation to the Data Protection Act. They should all be scrapped, and if the need for schools to wade through a few more forms before they use these systems causes them to be scrapped, then that's a good outcome from my perspective.

In addition just because school fingerprint vendors have conned them into parting with ridiculous sums of money (in school budget terms) to install these systems, with promises that they are not really storing fingerprints and they can't be recreated, there is no doubt it is possible to recreate the image of a fingerprint from data stored on such systems. Ross, A et al ‘From Template to Image: Reconstructing Fingerprints from Minutiae Points’ IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 4, April 2007 is just one example of how university researchers have reverse engineered these systems. The warning caveat emptor applies emphatically to digital technology systems that buyers don't understand especially when it comes to undermining the civil liberties of our younger generation.