Azure Active Directory B2C is now in public preview

For the last several years I’ve been working on a new technology and capability that we are calling “Azure Active Directory B2C.”   I’m delighted that I’m finally able to tell you about it, and share the ideas behind it.

For me it is the next step in the journey to give individual consumers, enterprises and governments the identity systems they need in this period of ever more digital interaction and increasing threats to our security and privacy.

I don’t normally put official Microsoft content on these pages, but given how important the B2C initiative is, how closely I’ve been involved, and how well it has been received, I think it makes sense to show you Microsoft’s announcement about “B2C Basic”.  It appeared on the Azure Active Directory Blog.  Stuart Kwan does a great job of introducing you to the product.

I hope you’ll take a look at his introduction.  I’ll be posting a number of pieces which expand on it – exploring issues we faced, giving you the background on the thinking behind the architecture and implementation, and telling you about the “B2C Premium” offering that is coming soon. I think the combination of Basic’s accessibility and Premium’s feature completeness really offers a new paradigm and amazing opportunities for everyone.

Introducing Microsoft Azure Active Directory B2C

By Stuart Kwan

With Azure Active Directory B2C we’re extending Azure AD to address consumer identity management for YOUR applications:

  • Essential identity management for web, PC, and mobile apps: Support sign in to your application using popular social networks like Facebook or Google, or create accounts with usernames and passwords specifically for your application. Self-service password management and profile management are provided out of the box. Phone-based multi-factor authentication enables an extra measure of protection.
  • Highly customizable and under your control: Sign up and sign in experiences are in the critical path of the most important activities in your applications. B2C gives you a high degree of control over the look and feel of these experiences, while at the same time reducing the attack surface area of your application – you never have to handle a user’s password. Microsoft is completely under the covers and not visible to your end users. Your user data belongs to you, not Microsoft, and is under your control.
  • Proven scalability and availability: Whether you have hundreds of users or hundreds of millions of users, B2C is designed to handle your load, anywhere in the world. Azure AD is deployed in more than two dozen datacenters, and services hundreds of millions of users with billions of authentications per day. Our engineers monitor the service 24/7.
  • Unique user protection features: Microsoft invests deeply in protection technology for our users. We have teams of domain experts that track the threat landscape. We’re constantly monitoring sign up and sign in activity to identify attacks and adapt our protection mechanisms. With B2C we’ll apply these anomaly, anti-fraud, and account compromise detection systems to your users.
  • Pay as you go: Azure Active Directory is a global service benefiting from tremendous economies of scale, allowing us to pass these savings along to you. We offer the B2C service on a consumption basis – you only pay for the resources that you use. Developers can take advantage of the free tier of the service when building their application.

B2C uses the same familiar programming model of Azure Active Directory. You can quickly and easily connect your application to B2C using industry standards OAuth 2.0 and OpenID Connect for authentication, and OData v3 for user management via our Graph API. Web app, web API, mobile and PC app scenarios are fully supported. The same open source libraries that are used with Azure Active Directory can be used with B2C to accelerate development.

If you want, you can get started right now! The rest of this post takes a look at how B2C works in detail.

How it works

The best way to describe B2C is to see it in action. Let’s look at an example. Our heroes, Proseware, have a consumer-facing web site. The site uses B2C for identity management. In this case that means sign in, and user self-service sign up, profile management, and password reset. Here’s the Proseware homepage:

A new user would click sign up to create a new account. They have the choice of creating an account using Google, Facebook, or by creating a Proseware account:

One quick note. The Microsoft button doesn’t work yet, but it will soon. It isn’t available at the start of the preview as we have more work to do in our converged programming model before we enable this.

What’s a Proseware account? As it turns out, there are many people out there who don’t always want to use a social account to sign in. You probably have your own personal decision tree for when you use your Facebook, Google, Microsoft or other social account to sign in, or when you create an account specifically for a site or app. In B2C a Proseware account is what we call a local account. It’s an account that gets created in the B2C tenant using an email address or a flat string as a username, and a password that is stored in the tenant. It’s local because it only works with apps registered in your B2C tenant. It can’t be used to sign in to Office 365, for example.

If a person decides to sign up with a social account, B2C uses information from the social account to pre-fill the user object that will be created in the B2C tenant, and asks the user for any other attributes configured by the developer:

Here we can see the user is also asked to enter a Membership Number and Offering Type. These are custom attributes the Proseware developer has added to the schema of the B2C tenant.

If a person decides to sign up with a Proseware account, B2C gathers the attributes configured by the developer plus information needed to create a local account. In this case the developer has configured local accounts using email as username, so the person signing up is also asked to verify their email address:

B2C takes care of verifying the person signing up has control of that email address before allowing them to proceed. Voila, the user is signed up and signed in to Proseware!

You might ask yourself, how much code did I need to write to make this elaborate sign up screen? Actually, almost none. The sign up page is being rendered by Azure AD B2C, not by the Proseware application. I didn’t have to write any code at all for the logic on that page. I only had to write the HTML and CSS so the page rendered with a Proseware look and feel. The logic for verifying the user’s email address and everything else on the page is B2C code. All I had to do was send an OpenID Connect request to B2C requesting the user sign up flow. I’ll go into more detail on this later when I talk about how I wrote the app and configured the B2C tenant.

Let’s look at a return visit. The user returns and clicks sign-in:

If the user clicks one of the social network providers, B2C will direct the person to the provider to sign in. Upon their return B2C also picks up attributes stored in the directory and returns them to the app, signing the user in.

If the user clicks the Proseware account button, they’ll see the local account sign in page, enter their name and password, and sign in:

That’s it! Now I’ll show you how I built this example.

Configuring Azure AD B2C

Step one was to get an Azure AD B2C tenant. You can do this by going to the Azure AD section of the Azure management portal and creating a B2C tenant (for a shortcut, see the B2C getting started page). B2C tenants are a little different from regular Azure AD tenants. For example, in a regular tenant, by default users can see each other in the address book. That’s what you’d expect in a company or school – people can look each other up. In a B2C tenant, by default users cannot see each other in the address book. That’s what you’d expect – your consumer users shouldn’t be able to browse each other!

Once you have a B2C tenant, you register applications in the tenant and configure policies which drive the behavior of sign in, sign up, and other user experiences. Policies are the secret sauce of Azure AD B2C. To configure these policies, you jump through a link to the new Azure management portal:

This is also the place where you find controls for setting up applications, social network providers, and custom attributes. I’m going to focus on sign up policy for this example. Here’s the list of sign up policies in the tenant. You can create more than one, each driving different behavior:

For the Proseware example I created the B2C_1_StandardSignUp policy. This policy allows a user to sign up using Facebook, Google, or email-named local accounts:

In sign up attributes I indicated what attributes should be gathered from the user during sign up. The list includes custom attributes I created earlier, Membership Number and Offering Type:

When a user completes sign up they are automatically signed in to the application. Using Application Claims I select what attributes I want to send to the application from the directory at that moment:

I’m not using multifactor authentication in this example, but if I were, it would be a simple on/off switch. During sign up the user would be prompted to enter their phone number and we would verify it in that moment.

Finally, I configured user experience customizations. You might have noticed that the sign up and sign-in experiences have a Proseware look and feel, and there isn’t much if any visual evidence of Microsoft or Azure AD. We know that for you to build compelling consumer-facing experiences you have to have as much control as possible over look and feel, so B2C is very customizable even in this initial preview. We do this by enabling you to specify HTML and CSS for the pages rendered by B2C. Here’s what the sign up page would look like with the default look and feel:

But if I configure B2C with a URL to a web page I created with a Proseware-specific look and feel:

Then the sign up experience looks like this:

You can probably imagine a number of different approaches for this kind of customization. We’re partial to this approach, as opposed to say an API-based approach, because it means our servers are responsible for correct handling of things like passwords, and our protection systems can gather the maximum signal from the client for anomaly detection. In an API-based approach, your app would need to gather and handle passwords, and some amount of valuable signal would be lost.

One quick side note. In the initial preview it is possible to do HTML/CSS customization of all the pages except the local account sign in page. That page currently supports Azure AD tenant-branding style customization. We’ll be adding the HTML/CSS customization of the sign in page before GA. Also, we currently block the use of JavaScript for customization, but we expect to enable this later.

That’s a quick look at how I set up a sign up policy. Configuring other policies like sign in and profile management is very similar. As I mentioned earlier, you can create as many policies as you want, so you can trigger different behaviors even within the same app. How to do that? By requesting a specific policy at runtime! Let’s look at the code.

Building an app that uses B2C

The programming model for Azure AD B2C is super simple. Every request you send to B2C is an OAuth 2.0 or OpenID Connect request with one additional parameter, the policy parameter “p=”. This instructs B2C which policy you want to apply to the request. When someone clicks the sign up button on the Proseware web app, the app sends this OpenID Connect sign-in request:

nonce= WzRMD9LC95HeHvDz&
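As a rough illustration of the idea, here is a minimal Python sketch of how an app might assemble such a request. The endpoint URL, tenant name and redirect URI are hypothetical placeholders, not real B2C addresses. The client ID, nonce and policy name are the values that appear elsewhere in this post. The helper name build_sign_up_request is made up for this sketch.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical placeholder endpoint; substitute the authorize URL for your tenant.
AUTHORIZE_ENDPOINT = "https://login.example.com/proseware.example/oauth2/v2.0/authorize"


def build_sign_up_request(client_id: str, redirect_uri: str, policy: str, nonce: str) -> str:
    """Return an OpenID Connect authorization URL that carries the policy parameter."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "id_token",   # ask for an id_token in the response
        "response_mode": "form_post",  # have the response posted back to the app
        "scope": "openid",
        "nonce": nonce,                # a real app generates a fresh random nonce per request
        "p": policy,                   # the one B2C-specific addition: which policy to run
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}"


url = build_sign_up_request(
    client_id="9bdade37-a70b-4eee-ae7a-b38e2c8a1416",
    redirect_uri="https://proseware.example/auth/callback",  # hypothetical
    policy="b2c_1_standardsignup",
    nonce="WzRMD9LC95HeHvDz",
)
print(url)
```

Switching to a different experience, such as sign in or profile editing, would then only mean passing a different policy name to the same function.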

The policy parameter in this example invokes the sign up policy called b2c_1_standardsignup. The OpenID Connect response contains an id_token as usual, carrying the claims I configured in the policy:



Decoding the id_token from the response yields:

typ: “JWT”,
alg: “RS256”,
kid: “IdTokenSigningKeyContainer”
exp: 1442127696,
nbf: 1442124096,
ver: “1.0”,
iss: “”,
acr: “b2c_1_standardsignup”,
sub: “Not supported currently. Use oid claim.”,
aud: “9bdade37-a70b-4eee-ae7a-b38e2c8a1416”,
nonce: “WzRMD9LC95HeHvDz”,
iat: 1442124096,
auth_time: 1442124096,
oid: “2c75d1d5-59af-479b-a9c3-d841ff298216”,
emails: [
idp: “localAccountAuthentication”,
name: “Stuart Kwan”,
extension_MembershipNumber: “1234”,
extension_OfferingType: “1”

Here you can see the usual claims returned by Azure Active Directory and also a few more. The custom attributes I added to the directory and requested of the user during sign up are returned in the token as extension_MembershipNumber and extension_OfferingType. You can also see the name of the policy that generated this token in the acr claim. By the way, we are in the process of taking feedback on claim type names and aligning ourselves better with the standard claim types in the OpenID Connect 1.0 specification. You should expect things to change here during the preview.
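To see claims like these for yourself, the sketch below decodes the header and payload of an id_token in Python. It is for inspection only and deliberately does not verify the signature. A real application must validate the signature, issuer, audience, nonce and expiry before trusting any claim. The sample token is built locally from a few of the claims shown above, with a placeholder where the signature would be. The function names are made up for this sketch.

```python
import base64
import json
from typing import Tuple


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_id_token_unverified(id_token: str) -> Tuple[dict, dict]:
    """Return (header, claims) WITHOUT checking the signature. Inspection only."""
    header_b64, payload_b64, _signature = id_token.split(".")
    header = json.loads(b64url_decode(header_b64))
    claims = json.loads(b64url_decode(payload_b64))
    return header, claims


def _b64url_encode(obj: dict) -> str:
    """Helper used only to build the sample token below."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# A locally built sample token; the third segment is a placeholder, not a real signature.
sample_token = ".".join([
    _b64url_encode({"typ": "JWT", "alg": "RS256"}),
    _b64url_encode({
        "acr": "b2c_1_standardsignup",
        "name": "Stuart Kwan",
        "extension_MembershipNumber": "1234",
        "extension_OfferingType": "1",
    }),
    "placeholder-signature",
])

header, claims = decode_id_token_unverified(sample_token)
print(claims["acr"])                         # the policy that produced the token
print(claims["extension_MembershipNumber"])  # a custom attribute from the directory
```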

Since Azure AD B2C is, in fact, Azure AD, it has the same programming model as Azure AD, which means full support for web app, web API, mobile and PC app scenarios. Data in the directory is managed with the REST Graph API, so you can create, read, update, and delete objects the same way you can in a regular tenant. And this is super important – you can pick and choose what features and policies you want to use. If you want to build the user sign up process entirely yourself and manage users via the Graph API, you can absolutely do so.

B2C conforms to Azure AD’s next generation app model, the v2 app model. To build your application you can make protocol calls directly, or you can use the latest Azure Active Directory libraries that support v2. To find out more visit the B2C section of the Azure AD developer guide – we’ve got quickstart samples, libraries, and reference documentation waiting for you. Just for fun, I built the Proseware example using Node.js on an Ubuntu Linux virtual machine running on Microsoft Azure (shout out to @brandwe for helping me with the code!).

How much will it cost?

B2C will be charged on a consumption basis. You pay only for the resources you use. There will be three meters, billed monthly:

  1. Number of user accounts in the directory
  2. Number of authentications
  3. Number of multi-factor authentications

An authentication is defined as any time an application requests a token for a resource and successfully receives that token (we won’t charge for unsuccessful requests). When you consider the OAuth 2.0 protocol, this counts as when a user signs in with a local account or social account, and also when an application uses a refresh token to get a new access token.
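To make that definition concrete, here is a small Python sketch that counts billable authentications from a list of token requests. The TokenRequest record and its field names are hypothetical, invented for this illustration, and do not reflect any actual B2C log format. The only rule it encodes is the one stated above: successful token requests count, whether they come from a sign in or a refresh token, and unsuccessful ones do not.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TokenRequest:
    """Hypothetical record of one token request made by an application."""
    kind: str         # "sign_in" (local or social account) or "refresh_token"
    succeeded: bool   # True only if the app actually received a token


def count_authentications(requests: Iterable[TokenRequest]) -> int:
    """Count requests that would register on the authentications meter."""
    return sum(1 for request in requests if request.succeeded)


requests = [
    TokenRequest(kind="sign_in", succeeded=True),
    TokenRequest(kind="refresh_token", succeeded=True),
    TokenRequest(kind="sign_in", succeeded=False),   # failed attempt, not charged
    TokenRequest(kind="refresh_token", succeeded=True),
]
print(count_authentications(requests))
```

One consequence of this definition is that token lifetimes affect your consumption: shorter-lived access tokens mean more refresh token redemptions, and each successful one counts.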

You can find the B2C pricing tiers on the pricing page. There will be a free tier for developers who are experimenting with the service. The current B2C preview is free of charge and preview tenants are capped at 50,000 users. We can raise that cap for you on a case by case basis if you contact us. We’ll lift the cap when billing is turned on. Do you have hundreds of millions of users? No problem. Bring ’em on!

What’s next

We’ve already worked with many developers to build apps using Azure AD B2C as part of a private preview program. Along the way we’ve gathered a healthy backlog of features:

  1. Full UX customization: Not just the aforementioned HTML/CSS customization of the local account sign in page, but also the ability to have your URL appear in the browser for every page rendered by B2C. That will remove the last visible remnant of Microsoft from the UX.
  2. Localization: Of course you have users all over the world speaking many languages. Sign in, sign up, and other pages need to render appropriately using strings you provide in the languages you want to support.
  3. Token lifetime control: The ability to control the lifetimes of Access Tokens, ID Tokens and Refresh Tokens is important both for user experience and for you to tune your consumption rate.
  4. A hook at the end of sign up: A number of people have said they want the ability to check a user who is signing up against a record in a different system. A little hook at the end of sign up would allow them to do this, so we’re considering it.
  5. Support for more social networks.
  6. Support for custom identity providers: This would be the ability to, say, add an arbitrary SAML or OpenID Connect identity provider to the tenant.
  7. A variety of predefined reports: So that you can review the activity in your tenant at a glance and without having to write code to call an audit log API.
  8. And more, this is just a fraction of the list…

You can track our progress by following this blog and the What’s New topic in the B2C section of the Azure AD developer guide, which you can find in the documentation pages.

By the way, the proper name of this preview is the Azure Active Directory B2C Basic preview. We’re planning a Premium offering as well, with features that take policies to the next level. But that’s for another blog post!

Please write us

We’re eager to hear your feedback! We monitor Stack Overflow (tag: azure-active-directory) for development questions. If you have a feature suggestion, please post it in the Azure Active Directory User Voice site and put “AADB2C:” in the title of your suggestion.

Stuart Kwan (Twitter: @stuartkwan)
Principal Program Manager
Azure Active Directory


24 year old student lights match: Europe versus Facebook

If you are interested in social networks, don't miss the slick video about Max Schrems’ David and Goliath struggle with Facebook over the way they are treating his personal information.  Click on the red “CC” in the lower right-hand corner to see the English subtitles.

Max is a 24 year old law student from Vienna with a flair for the interview and plenty of smarts about both technology and legal issues.  In Europe there is a requirement that entities with data about individuals make it available to them if they request it.  That's how Max ended up with a personalized CD from Facebook that he printed out on a stack of paper more than a thousand pages thick (see image below). Analysing it, he came to the conclusion that Facebook is engineered to break many of the requirements of European data protection.  He argues that the record Facebook provided him finds them to be in flagrante delicto.  

The logical next step was a series of 22 lucid and well-reasoned complaints that he submitted to the Irish Data Protection Commissioner (Facebook states that European users have a relationship with the Irish Facebook subsidiary).  This was followed by another perfectly executed move:  setting up a web site called Europe versus Facebook that does everything right in terms of using web technology to mount a campaign against a commercial enterprise that depends on its public relations to succeed.

Europe versus Facebook, which seems eventually to have become an organization, then opened its own YouTube channel.  As part of the documentation, they publicised the procedure Max used to get his personal CD.  Somehow this recipe found its way to reddit  where it ended up on a couple of top ten lists.  So many people applied for their own CDs that Facebook had to send out an email indicating it was unable to comply with the requirement that it provide the information within a 40 day period.

If that seems like enough, it's not all.  As Max studied what had been revealed to him, he noticed that important information was missing and asked for the rest of it.  The response ratchets the battle up one more notch: 

Dear Mr. Schrems:

We refer to our previous correspondence and in particular your subject access request dated July 11, 2011 (the Request).

To date, we have disclosed all personal data to which you are entitled pursuant to Section 4 of the Irish Data Protection Acts 1988 and 2003 (the Acts).

Please note that certain categories of personal data are exempted from subject access requests.
Pursuant to Section 4(9) of the Acts, personal data which is impossible to furnish or which can only be furnished after disproportionate effort is exempt from the scope of a subject access request. We have not furnished personal data which cannot be extracted from our platform in the absence of disproportionate effort.

Section 4(12) of the Acts carves out an exception to subject access requests where the disclosures in response would adversely affect trade secrets or intellectual property. We have not provided any information to you which is a trade secret or intellectual property of Facebook Ireland Limited or its licensors.

Please be aware that we have complied with your subject access request, and that we are not required to comply with any future similar requests, unless, in our opinion, a reasonable period of time has elapsed.

Thanks for contacting Facebook,
Facebook User Operations Data Access Request Team

What a spotlight

This throws intense light on some amazingly important issues. 

For example, as I wrote here (and Max describes here), Facebook's “Like” button collects information every time an Internet user views a page containing the button, and a Facebook cookie associates that page with all the other pages with “Like” buttons visited by the user in the last 3 months. 

If you use Facebook, records of all these visits are linked, through cookies, to your Facebook profile – even if you never click the “like” button.  These long lists of pages visited, tied in Facebook's systems to your “Real Name identity”, were not included on Max's CD. 

Is Facebook prepared to argue that it need not reveal this stored information about your personal data because doing so would adversely affect its “intellectual property”? 

It will be absolutely amazing to watch how this issue plays out, and see just what someone with Max's media talent is able to do with the answers once they become public. 

The result may well impact the whole industry for a long time to come.

Meanwhile, students of these matters would do well to look at Max's many complaints:








Pokes are kept even after the user “removes” them.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Shadow Profiles.
Facebook is collecting data about people without their knowledge. This information is used to substitute existing profiles and to create profiles of non-users.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Tags are used without the specific consent of the user. Users have to “untag” themselves (opt-out).
Info: Facebook announced changes.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Facebook is gathering personal data e.g. via its iPhone-App or the “friend finder”. This data is used by Facebook without the consent of the data subjects.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Deleted Postings.
Postings that have been deleted showed up in the set of data that was received from Facebook.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Postings on other Users’ Pages.
Users cannot see the settings under which content is distributed that they post on other’s pages.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Messages (incl. Chat-Messages) are stored by Facebook even after the user “deleted” them. This means that all direct communication on Facebook can never be deleted.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Privacy Policy and Consent.
The privacy policy is vague, unclear and contradictory. If European and Irish standards are applied, the consent to the privacy policy is not valid.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Face Recognition.
The new face recognition feature is a disproportionate violation of the users’ right to privacy. Proper information and an unambiguous consent of the users is missing.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Access Request.
Access Requests have not been answered fully. Many categories of information are missing.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Deleted Tags.
Tags that were “removed” by the user, are only deactivated but saved by Facebook.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Data Security.
In its terms, Facebook says that it does not guarantee any level of data security.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Applications of “friends” can access data of the user. There is no guarantee that these applications are following European privacy standards.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Deleted Friends.
All removed friends are stored by Facebook.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Excessive processing of Data.
Facebook is hosting enormous amounts of personal data and it is processing all data for its own purposes.
It seems Facebook is a prime example of illegal “excessive processing”.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Facebook is running an opt-out system instead of an opt-in system, which is required by European law.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Letter from the Irish DPC.


Letter (PDF)



Letter to the Irish DPC concerning the new privacy policy and new settings on Facebook.


Letter (PDF)



Like Button.
The Like Button is creating extended user data that can be used to track users all over the internet. There is no legitimate purpose for the creation of the data. Users have not consented to the use.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Obligations as Processor.
Facebook has certain obligations as a provider of a “cloud service” (e.g. not using third party data for its own purposes or only processing data when instructed to do so by the user).

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Picture Privacy Settings.
The privacy settings only regulate who can see the link to a picture. The picture itself is “public” on the internet. This makes it easy to circumvent the settings.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Deleted Pictures.
Facebook is only deleting the link to pictures. The pictures are still public on the internet for a certain period of time (more than 32 hours).

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



Users can be added to groups without their consent. Users may end up in groups that lead others to false impressions about a person.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)



New Policies.
The policies are changed very frequently, users do not get properly informed, they are not asked to consent to new policies.

Filed with the Irish DPC

Complaint (PDF)
Attachments (ZIP)


Head over to the Office of Inadequate Security

First of all, I have to refer readers to the Office of Inadequate Security, apparently operated by I suggest heading over there pretty quickly too – the office is undoubtedly going to be so busy you'll have to line up as time goes on.

So far it looks like the go-to place for info on breaches – it even has a twitter feed for breach junkies.

Recently the Office published an account that raises a lot of questions:

I just read a breach disclosure to the New Hampshire Attorney General’s Office with accompanying notification letters to those affected that impressed me favorably. But first, to the breach itself:, a site that allows students to book trips for school vacation breaks, suffered a breach in their system that they learned about on June 9 after they started getting reports of credit card fraud from customers. An FAQ about the breach, posted on explains:

StudentCity first became concerned there could be an issue on June 9, 2011, when we received reports of customers travelling together who had reported issues with their credit and debit cards. Because this seemed to be with 2011 groups, we initially thought it was a hotel or vendor used in conjunction with 2011 tours. We then became aware of an account that was 2012 passengers on the same day who were all impacted. This is when we became highly concerned. Although our processing company could find no issue, we immediately notified customers about the incident via email, contacted federal authorities and immediately began a forensic investigation.

According to the report to New Hampshire, where 266 residents were affected, the compromised data included students’ credit card numbers, passport numbers, and names. The FAQ, however, indicates that dates of birth were also involved.

Frustratingly for StudentCity, the credit card data had been encrypted but their investigation revealed that the encryption had been broken in some cases. In the FAQ, they explain:

The credit card information was encrypted, but the encryption appears to have been decoded by the hackers. It appears they were able to write a script to decode some information for some customers and most or all for others.

The letter to the NH AG’s office, written by their lawyers on July 1, is wonderfully plain and clear in terms of what happened and what steps StudentCity promptly took to address the breach and prevent future breaches, but it was the tailored letters sent to those affected on July 8 that really impressed me for their plain language, recognition of concerns, active encouragement of the recipients to take immediate steps to protect themselves, and for the utterly human tone of the correspondence.

Kudos to and their law firm, Nelson Mullins Riley & Scarborough, LLP, for providing an exemplar of a good notification.

It would be great if StudentCity would bring in some security experts to audit the way encryption was done, and report on what went wrong. I don't say this to be punitive; I agree that StudentCity deserves credit for at least attempting to employ encryption. But the outcome points to the fact that we need programming frameworks that make it easy to get truly robust encryption and key protection – and to deploy it in a minimal disclosure architecture that keeps secrets off-line. If StudentCity goes the extra mile in helping others learn from their unfortunate experience, I'll certainly be a supporter.

Change of status

My work status has gone through “some changes” recently.

A number of readers have written to me about Mary Jo Foley's report on a “goodbye party” thrown at Microsoft a few weeks ago when I officially gave up my role as Chief Architect of Identity.  Others saw Vittorio Bertocci‘s kind recollection of the progress we made over the years.

When Tim Cole interviewed me about my plans a few days later at the European Identity Conference, I hadn't made the slightest progress in terms of thinking about my future…  I did say, though, that I hoped to keep my hand in the identity and social computing space to the extent that people found my input useful.

One way to do this was to look for opportunities to participate in interesting efforts on a per-project basis.  It turns out that within a few days I was asked to do this with Microsoft over the summer.  Not exactly a complete change (!) but it still feels liberating and different.

Don't worry – I won't bore you with reports on my gigs going forward, but thought in the interests of full disclosure, you should know how this particular situation is evolving :)

Takeaway:  Life is good, and even more than ever, this blog represents my own views, which can't be blamed on anyone else even when I wish they could.

What Could Google Do With the Data It's Collected?

Niraj Chokshi has published a piece in The Atlantic where he grapples admirably with the issues related to Google's collection and use of device fingerprints (technically called MAC Addresses).  It is important and encouraging to have journalists like Niraj taking the time to explore these complex issues.  

But I have to say that such an exploration is really hard right now. 

Whether on purpose or by accident, the Google PR machine is still handing out contradictory messages.  In particular, the description in Google's Refresher FAQ titled “How does this location database work?” is currently completely different from (read: the opposite of) what its public relations people are telling journalists like Niraj.  I think reestablishing credibility around location services requires the messages to be made consistent so they can be verified by data protection authorities.

Here are some excerpts from the piece – annotated with some comments by me.  [Read the whole article here.] 

The Wi-Fi data Google collected in over 30 countries could be more revealing than initially thought…

Google's CEO Eric Schmidt has said the information was hardly useful and that the company had done nothing with it. The search giant has also been ordered (or sought) to destroy the data. According to their own blog post, Google logged three things from wireless networks within range of their vans: snippets of unencrypted data; the names of available wireless networks; and a unique identifier associated with devices like wireless routers. Google blamed the collection on a rogue bit of code that was never removed after it had been inserted by an engineer during testing.

[The statement about rogue code is an example of the PR ambiguity Niraj and other journalists must deal with.  Google blogs don't actually blame the collection of unique identifiers on rogue code, although they seem crafted to leave people with that impression.  Spokesmen only blame rogue code for the collection of unencrypted data content (e.g. email messages). – Kim]

Each of the three types of data Google recorded has its uses, but it's that last one, the unique identifier, that could be valuable to a company of Google's scale. That ID is known as the media access control (MAC) address and it is included — unencrypted, by design — in any transfer, blogger Joe Mansfield explains.

Google says it only downloaded unencrypted data packets, which could contain information about the sites users visited. Those packets also include the MAC address of both the sending and receiving devices — the laptop and router, for example.

[Another contradiction: Google PR says it “only” collected unencrypted data packets, but Google's GStumbler report  says its cars did collect and record the MAC addresses from encrypted data frames as well. – Kim]

A company as large as Google could develop profiles of individuals based on their mobile device MAC addresses, argues Mansfield:

Get enough data points over a couple of months or years and the database will certainly contain many repeat detections of mobile MAC addresses at many different locations, with a decent chance of being able to identify a home or work address to go with it.

Now, to be fair, we don't know whether Google actually scrubbed the packets it collected for MAC addresses and the company's statements indicate they did not. [Yet the GStumbler report says ALL MAC addresses were recorded – Kim].  The search giant even said it “cannot identify an individual from the location data Google collects via its Street View cars.”  Add a step, however, and Google could deduce an individual from the location data, argues Avi Bar-Zeev, an employee of Microsoft, a Google competitor.

[Google] could (opposite of cannot) yield your identity if you've used Google's services or otherwise revealed it to them in association with your IP address (which would be the public IP of your router in most cases, visible to web servers during routine queries like HTTP GET). If Google remembered that connection (and why not, if they remember your search history?), they now have your likely home address and identity at the same time. Whether they actually do this or not is unclear to me, since they say they can't do A but surely they could do B if they wanted to.

Theoretically, Google could use the MAC address for a mobile device — an iPod, a laptop, etc. — to build profiles of an individual's activity. (It's unclear whether they did and Google has indicated that they have not.) But there's also value in the MAC addresses of wireless routers.

Once a router has been associated with a real-world location, it becomes useful as a reference point. The Boston company Skyhook Wireless, for example, has long maintained a database of MAC addresses, collected in a (slightly) less-intrusive way. Skyhook is the primary wireless positioning system used by Apple's iPhone and iPod Touch. (See a map of their U.S. coverage here.) When your iPod Touch wants to retrieve the current location, it shares the MAC addresses of nearby routers with Skyhook which pings its database to figure out where you are.

Google Latitude, which lets users share their current location, has at least 3 million active users and works in a similar way. When a user decides to share his location with any Google service on a non-GPS device, he sends all visible MAC addresses in the vicinity to the search giant, according to the company's own description of how its location services works.

[Update: Google's own “refresher FAQ” states that a user of its geo-location services, such as Latitude, sends all MAC addresses “currently visible to the device” to Google, but a spokesman said the service only collects the MAC addresses of routers. That FAQ statement is the basis of the following argument.]

This is disturbing, argues blogger Kim Cameron (also a Microsoft employee), because it could mean the company is getting not only router addresses, but also the MAC addresses of devices such as laptops and iPods. If you are sitting next to a Google Latitude user who shares his location, Google could know the address and location of your device even though you didn't opt in. That could then be compared with all other logged instances of your MAC address to develop a profile of where the device is and has been.
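To make the concern concrete, here is a minimal sketch of how little it takes to turn logged sightings into a per-device profile. It is an illustration only, not a description of anything Google is known to have built: the `build_profiles` function, the place labels and the MAC addresses are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Each sighting is (mac_address, place_label); all values here are made up.
Sighting = Tuple[str, str]


def build_profiles(sightings: Iterable[Sighting]) -> Dict[str, List[Tuple[str, int]]]:
    """Group sightings by MAC address and rank places by how often each was seen."""
    places_by_mac = defaultdict(Counter)
    for mac, place in sightings:
        places_by_mac[mac][place] += 1
    # most_common() sorts each device's places from most to least frequent.
    return {mac: counts.most_common() for mac, counts in places_by_mac.items()}


sightings = [
    ("aa:bb:cc:dd:ee:01", "Elm Street"),
    ("aa:bb:cc:dd:ee:01", "Elm Street"),
    ("aa:bb:cc:dd:ee:01", "Downtown office"),
    ("aa:bb:cc:dd:ee:02", "Airport"),
]
profiles = build_profiles(sightings)
print(profiles["aa:bb:cc:dd:ee:01"])  # [('Elm Street', 2), ('Downtown office', 1)]
```

The most frequently seen place for a device is a plausible guess at a home or work address, which is why repeat detections matter more than any single one.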

Google denies using the information it collected and, if the company is telling the truth, then only data from unencrypted networks was intercepted anyway, so you have less to worry about if your home wireless network is password-protected. (It's still not totally clear whether only router MAC addresses were collected. Google said it collected the information for devices “like a WiFi router.”) Whether it did or did not collect or use this information isn't clear, but Google, like many of its competitors, has a strong incentive to get this kind of location data.

[Again, and I really do feel for Niraj, the PR leaves the impression that if you have passwords and encryption turned on you have nothing to worry about, but Google's GStumbler report says that passwords and encryption did not prevent the collection of the MAC addresses of phones and laptops from homes and businesses. – Kim]

I really tuned in to these contradictory messages when a reader first alerted me to Niraj's article.   It looked like this:

My comments earned their strike-throughs when a Google spokesman assured the Atlantic “the Service only collects the MAC addresses of routers.”  I pointed out that my statement was actually based on Google's own FAQ, and it was their FAQ (“How does this location database work?”) – rather than my comments – that deserved to be corrected.  After verifying that this was true, Niraj agreed to remove the strikethrough.

How can anyone be expected to get this story right given the contradictions in what Google says it has done?

In light of this, I would like to see Google issue a revision to its “Refresher FAQ” that currently reads:

The “list of MAC addresses which are currently visible to the device” would include the addresses of nearby phones and laptops.  Since Google PR has assured Niraj that “the service only collects the MAC addresses of routers”, the right thing to do would be to correct the FAQ so it reads:

  • “The user’s device sends a request to the Google location server with the list of MAC addresses found in Beacon Frames announcing a Network Access Point SSID and excluding the addresses of end user devices like WiFi enabled phones and laptops.”

This would at least reassure us that Google has not delivered software with the ability to track non-subscribers and this could be verified by data protection authorities.  We could then limit our concerns to what we need to do to ensure that no such software is ever deployed in the future.
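The corrected wording amounts to a simple filter on the client, sketched below. This is my own illustration under stated assumptions, not Google's code: `ObservedFrame` and `access_point_macs` are hypothetical names, and the frame is reduced to the few fields the argument needs. The only outside facts relied on are the IEEE 802.11 frame-control values (management frames are type 0, beacons are management subtype 8).

```python
from dataclasses import dataclass
from typing import Iterable, List

# IEEE 802.11 frame control values: management frames are type 0,
# and beacon frames are management subtype 8.
MANAGEMENT_TYPE = 0
BEACON_SUBTYPE = 8


@dataclass(frozen=True)
class ObservedFrame:
    """Hypothetical, simplified view of one Wi-Fi frame seen by a device."""
    frame_type: int
    subtype: int
    source_mac: str
    ssid: str = ""  # only meaningful for beacon frames


def access_point_macs(frames: Iterable[ObservedFrame]) -> List[str]:
    """Return MAC addresses taken only from beacons that announce an SSID."""
    seen: List[str] = []
    for frame in frames:
        is_beacon = (frame.frame_type == MANAGEMENT_TYPE
                     and frame.subtype == BEACON_SUBTYPE)
        # Skip everything else, and skip duplicates of an address already kept.
        if is_beacon and frame.ssid and frame.source_mac not in seen:
            seen.append(frame.source_mac)
    return seen


frames = [
    ObservedFrame(0, 8, "00:11:22:33:44:55", "HomeNet"),  # router beacon
    ObservedFrame(2, 0, "aa:bb:cc:dd:ee:01"),             # laptop data frame
    ObservedFrame(0, 4, "aa:bb:cc:dd:ee:02"),             # phone probe request
    ObservedFrame(0, 8, "00:11:22:33:44:55", "HomeNet"),  # repeated beacon
]
print(access_point_macs(frames))  # ['00:11:22:33:44:55']
```

Only the address from the beacon leaves the device; the laptop and phone addresses are dropped before any request is made. A phone acting as a hotspot would still send beacons, so a filter like this narrows the problem rather than eliminating it.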


Cloud computing: an unsatisfied customer?

Gunnar Peterson has written many good things about architecture and identity over the last few years. Now he lays down the gauntlet and challenges cloud advocates with a great video that throws all the fundamental issues into sardonic relief. Everyone involved with the cloud should watch this video repeatedly and come back with really good answers to all that is implied and questioned… albeit through humor.

Not Invented Here

There's a new comic strip about software with the, um, mysterious title, Not Invented Here (I just caught the preposterous domain name:…).   The strip deals with issues like security, and comments posted by readers say things like, “I DEMAND you take the bug out of my company's conference room immediately!” and “Wow, it is as if you have a mole in our office!”.   So, with the authors’ permission, here's a taste.

It all starts off innocently enough:

Wait.  I think I've met these people.

Yikes.  Maybe I am these people!

And if you're in the business, you can't miss this one, which will take you over to the NIH site.

If you're wondering where this can possibly come from, the strip is by Bill Barnes and Paul Southworth.  I don't know Paul yet, but readers may know Bill's work from Unshelved, which has been making librarians guffaw for years (an easy task?).  The truth is, Bill knows a lot about what goes on with software – in fact one of his gigs was herding cats during the first version of CardSpace.  Now he's totally dedicated to his strips – should be a lot of fun – and enlightening too. 

Identity Roadmap Presentation at PDC09

Earlier this week I presented the Identity Keynote at the Microsoft Professional Developers Conference (PDC) in LA.  The slide deck is here, and the video is here.

After announcing the release of the Windows Identity Foundation (WIF) as an Extension to .NET, I brought forward three architect/engineers to discuss how claims had helped them solve their development problems.   I chose these particular guests because I wanted the developer audience to be able to benefit from the insights they had previously shared with me about the advantages – and challenges – of adopting the claims based model.  Each guest talks about the approach he took and the lessons learned.

Andrew Bybee, Principal Program Manager from Microsoft Dynamics CRM, talked about the role of identity in delivering “the Power of Choice” – the ability for his customers to run his software wherever they want, on premises or in the cloud or in combination, and to offer access to anyone they choose.

Venky Veeraraghavan, the Program Manager in charge of identity for SharePoint, talks about what it was like to completely rethink the way identity works in SharePoint so it takes advantage of the claims based architecture to solve problems that previously had been impossibly difficult.  He explores the problems of “Multi-hop” systems and web farms, especially the “Dreaded Second Hop” – which he admits “really, really scares us…”  I find his explanation riveting and think any developer of large scale systems will agree.

Dmitry Sotnikov, who is Manager of New Product Research at Quest Software, presents a remarkable Azure-based version of a product Quest has previously offered only “on premise”.  The service is a backup system for Active Directory, and involved solving a whole set of hard identity problems involving devices and data as well as people.

Later in the presentation, while discussing future directions, I announce the Community Technical Preview of our new work on REST-based authorization (a profile of OAuth), and then show the prototype of the multi-protocol identity selector Mike Jones unveiled at the recent IIW.   And finally, I talk for the first time about “System.Identity”, work on a user-centric next-generation directory that I wanted to take to the community for feedback.  I'll be blogging about this a lot and hopefully others from the blogosphere will find time to discuss it with me.


Kim Cameron's excellent adventure

I need to correct a few of the factual errors in recent posts by James Governor and Jon Udell.  James begins by describing our recent get-together:

We talked about Project Geneva, a new claims based access platform which supersedes Active Directory Federation Services, adding support for SAML 2.0 and even the open source web authentication protocol OpenID.

Geneva is big news for OpenID. As David Recordon, one of the prime movers behind the standard said on Twitter yesterday:

Microsoft’s Live ID is adding support for OpenID. Goodbye proprietary identity technologies for the web! Good work MSFT

TechCrunch took the story forward, calling out de facto standardization:

Login standard OpenID has gotten a huge boost today from Microsoft, as the company has announced that users will soon be able to login to any OpenID site using their Windows Live IDs. With over 400 million Windows Live accounts (many of which see frequent use on the Live’s Mail and Messenger services), the announcement is a massive win for OpenID. And Microsoft isn’t just supporting OpenID – the announcement goes as far as to call it the de facto login standard [the announcement actually calls it “an emerging, de facto standard” – Kim] 

But that’s not what this post is supposed to be about. No I am talking about the fact [that] later yesterday evening Kim hacked his way into a party at the standard using someone else’s token!  [Now this is where I think some “small tweaks” start to be called for… – Kim]

It happened like this. I was talking to Mary Branscombe, Simon Bisson and Jon Udell when suddenly Mary jumped up with a big smile on her face. Kim, who has a kind of friendly bear look about him, had arrived. She ran over and then I noticed that a bouncer had his arm across Kim’s chest (“if your name’s not down you’re not coming in”). Kim had apparently wandered upstairs without getting his wristband first. Kim disappeared off downstairs, and I figured he might not even come back. A few minutes later though and there he was. I assumed he had found an organizer downstairs to give him a wristband… When he said that he actually had taken the wristband from someone leaving the party, and hooked it onto his wrist me and Jon practically pissed our pants laughing. As Jon explains (in Kim Cameron's Excellent Adventure):

If you don’t know who Kim is, what’s cosmically funny here is that he’s the architect for Microsoft’s identity system and one of the planet’s leading authorities on identity tokens and access control.

We stood around for a while, laughing and wondering if Kim would reappear or just call it a night. Then he emerged from the elevator, wearing a wristband which — wait for it — belonged to John Fontana.  Kim hacked his way into the party with a forged credential! You can’t make this stuff up!

While there is certainly some cosmic truth to this description, and while I did in fact back away slightly from the raucous party at the precise moment James says he and Jon “pissed their pants”, John Fontana did NOT actually give me his wristband.  You see, he didn't have a wristband either. 

So let's go through this step by step.  It all began with the invite that brought me to the party in the first place:

As a spokesperson for PDC2008, we’re looking forward to having you join us at the Rooftop Bar of the Standard Hotel for the Media/Analyst party on October 27th at 7:00pm

This invite came directly from the corporate Department of Parties.

I point this out just to ward off any unfair accusations that I just wanted to raid the party's immense Martini bar. Those who know me also know nothing could be further from the truth. You have to force a Martini into my hands.  My attendance represented nothing but Duty.  But I digress.

Protocol Violation

The truth of the matter is that I ran into John Fontana in the cafe of the Standard and we arrived at the party together.  He had been invited because this was, ummm, a Press party and he was, ummm, Press. 

However, it didn’t take more than a few seconds for us to see that the protocol for party access control had not been implemented correctly.   We just assumed this was a bug due to the fact that the party was celebrating a Beta, and that we would have to work our way past it as all beta participants do. 

Let’s just say the token-issuing part of the party infrastructure had crashed, whereas the access control point was operating in an out-of-control fashion.

Looking at it from an architectural point of view, the admission system was based on what is technically called “bearer” tokens (wristbands). Such tokens are NOT actually personalized in any way, or tied to the identity of the person they are given to through some kind of proof. If you “have” the token, you ARE the bearer of the token.
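For those who prefer code to wristbands, here is a minimal sketch of a bearer token check. It is an analogy under my own assumptions, not a description of any real admission system: `ISSUER_KEY`, `issue_wristband` and `door_admits` are hypothetical names, and the token is simply a serial number with a keyed hash (HMAC) from the issuer attached.

```python
import hashlib
import hmac

# Hypothetical secret known only to the party's token issuer and the door.
ISSUER_KEY = b"demo-key-for-illustration-only"


def issue_wristband(serial: str) -> str:
    """Issue a bearer token: a serial number plus the issuer's keyed hash over it."""
    tag = hmac.new(ISSUER_KEY, serial.encode(), hashlib.sha256).hexdigest()
    return f"{serial}.{tag}"


def door_admits(token: str) -> bool:
    """Check only that the token is genuine and intact, not who presents it."""
    serial, _, tag = token.partition(".")
    expected = hmac.new(ISSUER_KEY, serial.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)


wristband = issue_wristband("0042")   # issued to one guest...
print(door_admits(wristband))         # True: it admits whoever carries it

# Flip the last character to simulate a token damaged in handling.
damaged = wristband[:-1] + ("0" if wristband[-1] != "0" else "1")
print(door_admits(damaged))           # False: integrity check fails
```

Nothing in `door_admits` asks who is holding the token. Binding the token to a person would need an extra proof at the door, such as demonstrating possession of a key named in the token.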

So one of those big ideas slowly began to take root in our minds.  Why not become bearers of the requisite tokens, thereby compensating for the inoperative token-issuing system?

Well, at that point, since not a few of the people leaving the party knew us,  John and I explained our “aha”, and pointed out the moribund token-issuing component.  As is typical of people seeing those in need of help, we were showered with offers of assistance.

I happened to be rescued by an unknown bystander with incredibly nimble and strong fingers and deep expertise with wristband technology.  She was able to easily dislodge her wristband and put it on me in such a way that its integrity was totally intact. 

There was no forged token.  There was no stolen token.  It was a real token.  I just became the bearer.

When we got back upstairs, the access control point evaluated my token – and presto – let me in to join a certain set of regaling hedonists basking in the moonlight.  

But sadly – and unfairly –  John’s token was rejected since its donor, lacking the great skill of mine, had damaged it during the token transplant.

Despite the Martini now in my hand, I was overcome by that special sadness you feel when escaping ill fate wrongly allotted to one more deserving of good fortune than you.  John slipped silently out of the queue and slinked off to a completely different party.

So that's it, folks.  Yet the next morning, I had to wake up, and confront again my humdrum life.  But I do so inspired by the kindness of both strangers and friends (have I gone too far?)