More on distributed query

Dave Kearns responded to my post on the Identity Bus with Getting More Violent All the Time (note to the Rating Board: he's talking about violent agreement… which is really rough):

What Kim fails to note… is that a well designed virtual directory (see Radiant Logic's offering, for example) will allow you to do a SQL query to the virtual tables! You get the best of both: up to date data (today's new hires and purchases included) with the speed of an SQL join. And all without having to replicate or synchronize the data. I'm happy, the application is happy – and Kim should be happy too. We are in violent agreement about what the process should look like at the 40,000 foot level and only disagree about the size and shape of the paths – or, more likely, whether they should be concrete or asphalt.

Neil Macehiter answers by making an important distinction that I didn't emphasize enough:

But the issue is not with the language you use to perform the query: it's where the data is located. If you have data in separate physical databases then it's necessary to pull the data from the separate sources and join them locally. So, in Kim's example, if you have 5000 employees and have sold 10000 computers then you need to pull down the 15000 records over the network and perform the join locally (unless you have an incredibly smart distributed query optimiser which works across heterogeneous data stores). This is going to be more expensive than if the computer order and employee data are colocated.
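
To make Neil's cost point concrete, here is a minimal sketch (the service objects, record sets and field names are all invented for illustration) of what an application is forced to do when the two data sets live in separate stores: pull both result sets across the network and compute the intersection locally.

```python
# Hypothetical illustration: joining employee and purchase data client-side
# because the two result sets live in separate physical databases.

def fetch_recent_hires(hr_service):
    """Pulls ~5,000 employee records over the network from the HR system."""
    return hr_service.query("employees hired in the last two years")

def fetch_computer_purchases(purchasing_service):
    """Pulls ~10,000 purchase records over the network from purchasing."""
    return purchasing_service.query("purchase orders for computers")

def join_locally(hires, purchases):
    # Index one set on the join key, then intersect. All 15,000 records
    # cross the wire even if only a handful of pairs actually match.
    hires_by_id = {h["employee_id"]: h for h in hires}
    return [
        (hires_by_id[p["employee_id"]], p)
        for p in purchases
        if p["employee_id"] in hires_by_id
    ]
```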

Clayton Donley, who is the Senior Director of Development for Oracle Identity Management, understands exactly what I'm trying to get at and puts it well in this piece:

Dave Kearns has followed up on Kim Cameron's posting from Friday.

  1. Kim says that sometimes you need to copy data in order to join it with other data
  2. Dave says the same thing, except indicates that you wouldn't copy the data but just use “certain virtual directory functionality”

Actually, in #2, that functionality would likely be persistent cache, which if you look under the covers is exactly the same as a meta-directory in that it will copy data locally. In fact, the data may even be stored (again!) in a relational database (SQLServer in the Radiant Logic example he provides).

Let's use laser focus and only look at Kim's example of joining purchase orders with user identity.

Let's face it. Most applications aren't designed to go to one database when you're dealing solely with transactional data and another database when you're dealing with a combination of transactional data and identities.

If we model this through the virtual directory and indicate that every time an application joins purchase orders and identities it does so (even via SQL instead of LDAP) through the virtual directory, you've now said the following:

  1. You're okay with re-modelling all of these data relationships in a virtual directory — even those representing purchase order information.
  2. You're okay with moving a lot of identity AND transactional information into a virtual directory's local database.
  3. You're okay with making this environment scalable and available for those applications.

Unfortunately, this doesn't really hold up. There are a lot more issues, but even after just these first three (or even the first one) you begin to realize that while virtual directory makes sense for identity, it may not make sense as the ONLY way to get identity. I think the same thing goes for an identity hub that ONLY thinks in terms of virtualization.

The real solution here is a combination of virtualization with more standardized publish/subscribe for delivery of changes. This gets us away from this ad-hoc change discovery that makes meta-directories miserable, while ensuring that the data gets where it needs to go for transactions within an application.

I discourage people from thinking that metadirectory implies “ad-hoc change discovery”.  That's a defect of various metadirectory implementations, not a characteristic of the technology or architecture.  As soon as applications understand they are PART OF a wider distributed fabric, they could propagate changes using a publication pattern that retains the closed-loop verification of self-converging metadirectory.  
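
As a rough sketch of what I mean (the classes and names below are mine for illustration, not any product's interface), a publication pattern with closed-loop verification would have the authoritative source publish each change and then read back each subscriber's view to confirm the change actually converged:

```python
# Sketch of publish/subscribe change propagation with closed-loop
# verification, in the spirit of a self-converging metadirectory.
# All names here are illustrative, not from any actual product.

class Subscriber:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, object_id, attribute, value):
        self.store.setdefault(object_id, {})[attribute] = value

    def read(self, object_id, attribute):
        return self.store.get(object_id, {}).get(attribute)

class Publisher:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def publish(self, object_id, attribute, value):
        for sub in self.subscribers:
            sub.apply(object_id, attribute, value)
            # Closed loop: verify the change actually converged, rather
            # than assuming delivery implies application.
            assert sub.read(object_id, attribute) == value, (
                f"{sub.name} failed to converge on {attribute}")

hr = Publisher()
hr.subscribe(Subscriber("email-system"))
hr.subscribe(Subscriber("badge-system"))
hr.publish("employee-235", "surname", "Cameron")
```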

Internet as extension of mind

Ryan Janssen at drstarcat.com  published an interview recently that led me to think back over the various phases of my work on identity.  I generally fear boring people with the details, but Ryan explored some things that are very important to me, and I appreciate it. 

After talking about some of the identity problems of the enterprise, he puts forward a description of metadirectory that I found interesting because it starts from current concepts like claims rather than the vocabulary of X.500: 

…. Kim and the ZOOMIT team came up with the concept of a “metadirectory”. Metadirectory software essentially tries to find correlation handles (like a name or email) across the many heterogeneous software environments in an enterprise, so network admins can determine who has access to what. Once this is done, it then takes the heterogeneous claims and transforms them into a kind of claim the metadirectory can understand. The network admin can then use the metadirectory to assign and remove access from a single place. 

Zoomit released their commercial metadirectory software (called “VIA”) in 1996 and proceeded to clean the clock of larger competitors like IBM for the next few years until Microsoft acquired the company in the summer of 1999. Now anyone who is currently involved in the modern identity movement and the issues of “data portability” that surround it has to be feeling a sense of deja vu because these are EXACTLY the same problems that we are now trying to solve on the internet—only THIS time we are trying to take control of our OWN claims that are spread across innumerable heterogeneous systems that have no way to communicate with each other. Kim’s been working on this problem for SIXTEEN years—take note!

Yikes.  Time flies when you're having fun.

When I asked Kim what his single biggest realization about Identity in the 16 years since he started working on it was, he was slow to answer, but definitive when he did—privacy. You see, Kim is a philosopher as well as a technologist. He sees information technology (and the Internet in particular) as a social extension of the human mind. He also understands that the decisions we make as technologists have unintended as well as intended consequences. Now creating technology that enables a network administrator to understand who we are across all of a company’s systems is one thing, but creating technology that allows someone to understand who we are across the internet, particularly as more and more of who we are as humans is stored there, and particularly if that someone isn’t US or someone we WANT to have that complete view, is an entirely different problem.

Kim has consistently been one of the strongest advocates for obscuring ANY correlation handles that would allow ANY Identity Provider or Relying Party to have a more complete view of us than we explicitly give them. Some have criticized his concerns as overly cautious in a world where “privacy is dead”. When you think of your virtual self as an extension of your personal self though, and you realize that the line between the two is becoming increasingly obscured, you realize that if we lose privacy on the internet, we, in a very real sense, lose something that is essentially human. I’m not talking about the ability to hide our pasts or to pretend to be something we’re not (though we certainly will lose that). What we lose is that private space that makes each of us unique. It’s the space where we create. It’s the space that continues to ensure that we don’t all collapse into one.

Yes, it is the space on which and through which Civilization has been built.

Talking about the Identity Bus

During the Second European Identity Conference, Kuppinger-Cole did a number of interviews with conference speakers. You can see these on the Kuppingercole channel at YouTube.

Dave Kearns, Jackson Shaw, Dale Olds and I had a good old time talking with Felix Gaehtgens about the “identity bus”.  I had a real “aha” during the interview while I was talking with Dave about why synchronization and replication are an important part of the bus.  I realized part of the disconnect we've been having derives from the differing “big problems” each of us faces.

As infrastructure people one of our main goals is to get over our “information chaos” headaches…  These have become even worse as the requirements of audit and compliance have matured.  Storing information in one authoritative place (and one only) seems to be a way to get around these problems.  We can then retrieve the information through web service queries and drastically reduce complexity…

What does this worldview make of application developers who don't want to make their queries across the network?   Well, there must be something wrong with them…  They aren't hip to good computing practices…  Eventually they will understand the error of their ways and “come around”…

But the truth is that the world of query looks different from the point of view of an application developer. 

Let's suppose an application wants to know the name corresponding to an email address.  It can issue a query to a remote web service or LDAP directory and get an answer back immediately.  All is well and accords with our ideal view.
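
For example, using the open-source Python ldap3 library (the server name, base DN and address below are placeholders), the lookup is a single round trip:

```python
# Simple remote lookup: resolve an email address to a display name.
# Server name, base DN and email address are hypothetical.
from ldap3 import Server, Connection

conn = Connection(Server("ldap.example.com"), auto_bind=True)
conn.search(
    "dc=example,dc=com",          # search base
    "(mail=alice@example.com)",   # filter on the email address
    attributes=["cn"],            # ask only for the common name
)
if conn.entries:
    print(conn.entries[0].cn)     # the name corresponding to the email
```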

But the questions application developers want to answer aren't always of the simple “do a remote search in one place” variety.

Sometimes an application needs to do complex searches involving information “mastered” in multiple locations.   I'll make up a very simple “two location” example to demonstrate the issue:  

“What purchases of computers were made by employees who have been at the company for less than two years?”

Here we have to query “all the purchases of computers” from the purchasing system, and “all employees hired within the last two years” from the HR system, and find the intersection.

Although the intersection might only represent a few records,  performing this query remotely and bringing down each result set is very expensive.   No doubt many computers have been purchased in a large company, and a lot of people are likely to have been hired in the last two years.  If an application has to perform this type of  query with great efficiency and within a controlled response time,  the remote query approach of retrieving all the information from many systems and working out the intersection may be totally impractical.   

Compare this to what happens if all the information necessary to respond to a query is present locally in a single database.  I just do a “join” across the tables, and the SQL engine understands exactly how to optimize the query so the result involves little computing power and “even less time”.  Indexes are used and distributions of values well understood: many thousands of really smart people have been working on these optimizations in many companies for the last 40 years.
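
Here is the same computers-and-new-hires question expressed as a single local join, with an in-memory SQLite database standing in for the application store (the schema and sample data are invented for illustration). The engine, not the application, chooses how to use the index:

```python
# The two-source question as one local join: the SQL optimizer picks the
# access path. Schema and sample data are invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, hired DATE);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, employee_id INTEGER,
                            item TEXT, ordered DATE);
    CREATE INDEX idx_purchases_emp ON purchases(employee_id);
""")
db.execute("INSERT INTO employees VALUES (235, 'Alice', date('now', '-1 year'))")
db.execute("INSERT INTO purchases VALUES (1, 235, 'computer', date('now', '-3 months'))")

rows = db.execute("""
    SELECT e.name, p.item, p.ordered
    FROM employees e
    JOIN purchases p ON p.employee_id = e.id
    WHERE p.item = 'computer'
      AND e.hired >= date('now', '-2 years')
""").fetchall()
print(rows)  # only the intersection comes back, e.g. [('Alice', 'computer', ...)]
```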

So, to summarize, distributed databases (or queries done through distributed services) are not appropriate for all purposes. Doing certain queries in a distributed fashion works, while in other cases it leads to unacceptable performance.

The result is that many application developers “don't want to go there” – at least some of the time.  Yet their applications must be part of the identity fabric.  That is why the identity metasystem has to include application databases populated through synchronization and business rules.

On another note, I recommend the interview with Dave Kearns on the importance of context to identity. 

Converging Metadirectory and Virtual Directory

Phil Hunt, now at Oracle, is the visionary responsible for a lot of the innovation in Virtual Directory. From his recent response to my ideas on second generation metadirectory, it looks like we are actually thinking about things in similar ways, where meta and virtual work together.

As you may know, there has been an ongoing discussion on what does the next generation of meta-directory look like. Kim Cameron’s latest post elaborates on what he thinks is needed for the next generation of “metadirectory”.

  • By “next generation application” I mean applications based on web service protocols. Our directories need to integrate completely into the web services fabric, and application developers must be able to interact with them without knowing LDAP.
  • Developers and users need places they can go to query for “core attributes”. They must be able to use those attributes to “locate” object metadata. Having done so, applications need to be able to understand what the known information content of the object is, and how they can reach it.
  • Applications need to be able to register the information fields they can serve up.

These are actually some of the key reasons I have been advocating for a new approach to developing identity services APIs for developers. We are actually very close in our thinking. Here are my thoughts:

  • There should be a new generation of APIs that de-couple developers from dependence on particular vendor implementations, protocols, and potentially even data schemas when it comes to accessing identity information. Applications should be able to define their requirements for data and simply let the infrastructure deal with how to deliver it.
  • Instead of thinking of core attributes as those attributes that are used in common (e.g. surname is likely the same everywhere), I would like to propose we alter the definition slightly in terms of “authoritativeness”. Application developers should think about what data is core to their application. What data is the application authoritative for? If an application isn’t authoritative over an attribute, it probably shouldn’t be storing or managing that attribute. Instead, this “non-core” attribute should be obtained from the “identity network” (or metaverse as Kim calls it). An application’s “core” data should only be the data for which the application is authoritative. In that sense, I guess I may be saying the opposite of Kim. But the idea is the same: an application should have a sense of what is core and not core.
  • Applications need to register the identity data they consume, use, and update. Additionally, applications need to register the transactions they intend to perform with that data. This enables identity services to be built around an application that can be performant to the application’s requirements.

What I have just described was actually part of the original inspiration behind CARML (Client Attributes Requirements Markup Language) put forward by Oracle that the Liberty Alliance is working on now. It was our belief that in order to enable applications to connect to diverse identity service infrastructures, something like CARML was needed to make the identity network possible, adaptive, and intelligent.

But, while CARML was cool in itself, the business benefit to CARML was that knowing how an application consumes and uses identity data would not only help the identity network but it would also greatly improve the ability of auditors to perform privacy impact assessments.

We’ve recently begun an open source project at OpenLiberty called the IGF Attribute Services API that does exactly what Kim is talking about (by the way, I’m looking for nominations for a cool project name – let me know your thoughts). The Attribute Services API is still in early development stages – we are only at milestone 0.3. But that said, now is a great time for broader input. I think we are beginning to show that a fully de-coupled API that meets the requirements above is possible and dramatically easier to use and yet at the same time, much more privacy centric in its approach.

The key to all of this is to get as many applications as possible in the future to support CARML as a standard form of declaration. CARML makes it possible for identity infrastructure product vendors and service providers to build the identity network or next generation of metadirectory as described by Kim.

I haven’t seen CARML – perhaps it is still a private proposal? [UPDATE: I’ve been advised that CARML and the IGF Attribute Services API are the same thing.] I think having a richer common representation for people will be the most important ingredient for success. I’m a little bit skeptical about confining developers to a single API – is this likely to fly in a world where people want to innovate? But details aside, it sounds like CARML will be a helpful input to an important industry discussion. Above all, this needs to be a wide-ranging and inclusive discussion, where we take lots of input. To get “as many applications as possible” involved we need to win the participation and support of application developers – this is not just an “infrastructure” problem.

Now for something completely different.

It looks like Dave Kearns might be (?) mad at me… His recent post was entitled Your mother was a hamster and your father smelt of elderberries! Of course I would have taken that as a compliment except that I recognized it from The Holy Grail, Scene 8, where the “French Guard” precedes it with, “I don’t wanna talk to you no more, you empty-headed animal food trough wiper! I fart in your direction.”

The olive branch (or was it a birch rod?) to which Dave refers is this:

Kim has now responded (“Through the looking glass“) to my Humpty Dumpty post, and we’re beginning to sound like a couple of old philosophes arguing about whether or not to include “le weekend” and “hamburguer” and other Franglais in the French dictionary.

We really aren’t that far apart.

In his post, Kim recalls launching the name “metadirectory” back in ‘95 with Craig Burton and I certainly don’t dispute that. In fact, up until 1999, I even agreed somewhat with his definition:

“In my world, a metadirectory is one that holds metadata – not actual objects, but descriptions of objects and their locations in other physical directories.”

But as I continued in that Network World column:

“Unfortunately, vendors such as Zoomit took the term ‘metadirectory’ and redefined it so it could be used to describe what I’d call an überdirectory – a directory that gathers and holds all the data from all your other directories.”

Since no one took up my use of “uberdirectory,” we started using “metadirectory” to describe the situations which required a new identity store and “virtual directory” for those that didn’t.

So perhaps we’re just another couple of blind men trying to describe an elephant.

Gee – have we been having this discussion ever since 1999? Look – I agree that we are both dealing with legitimate aspects of the elephant. Olive branch accepted.

Now that that’s out of the way, maybe I can call upon Dave to lay down his birch rod too. He keeps saying I propose “a directory that gathers and holds ALL the data from ALL your other directories.” Dave, this is just untrue and unhelpful. “ALL” was never the goal – or the practice – of metadirectory, and you know it. The goal was to represent the “object core” – the attributes shared across many applications and that need therefore to be kept consistent and synchronized if stored in multiple places. Our other goal was to maintain the knowledge about what objects “were called” in different directories and databases (thus the existence of “connector space”).

Basically, the “ALL” argument is a red herring (and if you want, you can say hareng rouge instead…)

More on the second generation of metadirectory

Oracle’s Clayton Donley has joined the Metadirectory discussion and maybe his participation will help clarify things.

He writes:

I was reading this posting from my friend and colleague, Phil Hunt, in which he talks about the ongoing discussion between Dave Kearns and Kim Cameron about the death of meta-directories.

Not only is he correct in pointing out that Kim’s definition of Meta 2.0 is exactly what virtual directory has been since 1.0, but it’s interesting to see that some virtual directory vendors continue to push something that looks very much like meta-directory 1.0.

Before we go further, I want to peek at how Clayton's virtual directory works:

… If the request satisfies the in-bound security requirements, the next step is to invoke any global level mappings and plug-ins. Mappings and plug-ins have the ability to modify the operation, such as changing the name or value of attributes. The next step after global plug-ins is to determine which adapter(s) can handle the request. This determination is made based on the information provided in the operation.

The primary information used is the DN of the operation – the search base in a search, or the DN of the entry in all other LDAP operations such as a bind or add. OVD will look at the DN and determine which adapters could potentially support an operation for that DN. This is possible because each adapter declares in its configuration what LDAP namespace it’s responsible for.

In the case where multiple adapters can support the incoming DN namespace (for example, a search whose base is the root of the directory namespace, such as dc=oracle,dc=com), OVD will perform the operation on each adapter. The order of precedence is configurable based on priority, attributes or supported LDAP search filters.
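
A bare-bones sketch of that routing step (adapter names and namespaces are invented; this is my paraphrase of the pattern Clayton describes, not Oracle's code) might look like this:

```python
# Sketch of DN-based routing: each adapter declares the LDAP namespace
# it is responsible for; an operation is dispatched to every adapter
# whose namespace is related to the operation's DN. Names are invented.

ADAPTERS = {
    "hr-database":   "ou=employees,dc=oracle,dc=com",
    "ad-forest":     "ou=contractors,dc=oracle,dc=com",
    "purchasing-db": "ou=orders,dc=oracle,dc=com",
}

def normalize(dn):
    return [part.strip().lower() for part in dn.split(",")]

def adapters_for(operation_dn):
    """Return adapters whose namespace sits at or under the operation DN."""
    op = normalize(operation_dn)
    matches = []
    for name, namespace in ADAPTERS.items():
        ns = normalize(namespace)
        # The adapter can handle the request if one DN is a suffix of the
        # other (the namespaces overlap somewhere in the tree).
        shorter, longer = sorted((op, ns), key=len)
        if longer[len(longer) - len(shorter):] == shorter:
            matches.append(name)
    return matches

# A search based at the root fans out to every adapter:
print(adapters_for("dc=oracle,dc=com"))
# A search under one branch goes to a single adapter:
print(adapters_for("ou=employees,dc=oracle,dc=com"))
```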

Pretty cool. But let's do a historical reality check. The first metadirectory, which shipped twelve years ago, included the ability to do real-time queries that were dispatched to multiple LDAP systems depending on the query (and to several at once). The metadirectory provided the “glue” to know which directory service agents could answer which queries. The system performed the assembly of results across originating directory service agents – in other words, multiple LDAP services produced by multiple vendors.

And guess what? The distributed queries were accessed as part of “the metaverse”. The metaverse was in no way limited to “a local store”.

The metaverse was the joined information field comprising all the objects in the metadirectory. Only the smallest set of “core” attributes was stored in the local database or synchronized throughout the system. This set of attributes composed the “object root” – the things that MUST BE THE SAME in each of the applications and stores in a management continent. There actually aren’t that many of them. For example, in normal circumstances, my surname should be the same in all the systems within my enterprise. So it makes sense to synchronize surname between systems so that it actually stays the same over time.

As metadirectories started to compete in the marketplace, the problem of provisioning and managing core attributes came to predominate over that of connecting to application-specific ones. Basically, I think it was just early. That doesn’t mean one should counterpose metadirectory and virtual directory, or congratulate oneself too much for “owning” distributed query. The problem of distributed information is complex and needs multiple tools – even the dreaded “caching”.

Let me return to what I said would be the focus of “second generation metadirectory”:

Providing the framework by which next-generation applications can become part of the distributed data infrastructure. This includes publishing and subscription. But that isn’t enough. Other applications need ways to find it, name it, and so on.

If Clayton and Phil think virtual directories already do this, I can see that I wasn’t clear enough. So here are a few precisions:

  • By “next generation application” I mean applications based on web service protocols. Our directories need to integrate completely into the web services fabric, and application developers must be able to interact with them without knowing LDAP.
  • Developers and users need places they can go to query for “core attributes”. They must be able to use those attributes to “locate” object metadata. Having done so, applications need to be able to understand what the known information content of the object is, and how they can reach it.
  • Applications need to be able to register the information fields they can serve up.

Today’s virtual directories just don’t do this any better or any worse than metadirectories do. Virtual directories expose some of the fabric, just as today’s metadirectories do, but they don’t get at the big prize. It’s what I have called the unified field of information. Back in the 90’s more than one analyst friend made fun of me for thinking this was possible. But today it is not only possible, it is necessary.

Identity bus and administrative domain

Novell's Dale Olds, who will be on Dave Kearns’ panel at the upcoming European Identity Conference, has added the “identity bus” to the metadirectory / virtual directory mashup.  He says in part:

Meta directories synchronize the identity data from multiple sources via push or pull protocols, configuration files, etc. They are useful for synchronizing, reconciling, and cleaning data from multiple applications, particularly systems that have their own identity store or do not use a common access mechanism to get their identity data. Many of those applications will not change, so synchronizing with a metadirectory works well.

Virtual directories are useful to pull identity data through the hub from various sources dynamically when an application requests it. This is needed in highly connected environments with dynamic data, and where the application uses a protocol which can be connected to the virtual directory service. I am also well aware that virtual directory fans will want to point out that the authoritative data source is not the service itself, but my point here is that, if the owners shut down the central service, applications can’t access the data. It’s still a political hub.

Personally, I think all this meta and virtual stuff are useful additions to THE key identity hub technology — directory services. When it comes to good old-fashioned, solid scalable, secure directory services, I even have a personal favorite. But I digress.

The key point here as I see it is ‘hub’ vs. ‘bus’ — a central hub service vs. passing identity data between services along the bus.

The meta/virtual/directory administration and configuration is the limiting problem. In directory-speak, the meta/virtual/directory must support the union of all schema of all applications that use it. That means it’s not the mass of data, or speed of synchronization that’s the problem — it’s the political mass of control of the hub that becomes immovable as more and more applications rendezvous on it.

A hub is like the proverbial silo. In the case of meta/virtual/directories the problem goes beyond the inflexibility of large identity silos like Yahoo and Google — those silos support a limited set of very tightly coupled applications. In enterprise deployments, many more applications access the same meta/virtual/directory service. As those applications come and go, new versions are added, some departments are unwilling to move, the central service must support the union of all identity data types needed by all those applications over time. It’s not whether the service can technically achieve this feat, it’s more an issue of whether the application administrators are willing to wait for delays caused by the political bottleneck that the central service inevitably becomes.

Dale makes other related points that are well worth thinking about.  But let me zoom in on the relation between metadirectory and the identity bus.

As Dale points out in his piece, I think of the “bus” as being a “backplane” loosely connecting distributed services.  The bus extends forever in all directions, since ultimately distributed computing doesn't have a boundary.

In spite of this, the fabric of distributed services isn't an undifferentiated slate.  Services and systems are grouped into continents by the people and organizations running and using them.  Let's call these “administrative domains”.  Such domains may be defined at any scale – and often overlap.

The magic of the backplane or “bus”, as Stuart Kwan called it, is that we can pass identity claims across loosely coupled systems living in multiple discontinuous administrative domains. 

But let's be clear.  The administrative domains still continue to exist, and we need to manage and rationalize them as much tomorrow as we did yesterday.

I see metadirectories (meaning directories of directories) as the glue for stitching up these administrative continents so digital objects can be managed and co-ordinated within them. 

That is the precondition for hoisting the layer of loosely coupled systems that exists above administrative domains.  And I don't think it matters one bit whether a given digital object is accessed by a remote protocol, synchronization, or stapling a set of claims to a message – each has its place.

Complex and interesting issues.  And my main concern here is not terminology, but making sure the things we have learned about metadirectory (or whatever you want to call it) are properly integrated into the evolving distributed computing architecture.  A lot of us are going to be at the European Identity Conference in Munich later this month, so I look forward to the sessions and discussions that will take place there.

Through the looking glass

You have to like the way, in his latest piece on metadirectory, Dave Kearns summons Lewis Carroll to chide me for using the word “metadirectory” to mean whatever I want:

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that's all.”

Dave continues:

Kim talks about a “second generation” metadirectory. Metadirectory 2.0 if you will. First time I've heard about it. First time anyone has heard about it, for that matter. There is no such animal. Every metadirectory on the market meets the definition which Kim provides as “first generation”.

It's time to move away from the huge silo that sucks up data, disk space, RAM and bandwidth, and move on to a more lithe, agile, ubiquitous and pervasive identity layer – organized as an identity hub which sees all of the authoritative sources and delivers, via the developer's chosen protocol, the data the application needs when and where it's needed.

It's funny.  I remember sitting around in Craig Burton's office in 1995 while he, Jamie Lewis and I tried to figure out what we should call the new kind of multi-centered logical directory that each of us had come to understand was needed for distributed computing. 

After a while, Craig threw out the word “metadirectory”.  I was completely amazed.  My colleagues and I had also come up with the word “metadirectory”, but we figured the name would be way too “intellectual” – even though the idea of a “directory of directories” was exactly right.

Craig just laughed the way he always does when you say something naive.  Anyone who knows Craig will be able to hear him saying, “Kim, we can call it whatever we want.  If we call it what it really is, how can that be wrong?”

So guess what?  The thing we were calling a metadirectory was a logical directory, not a physical one.  We figured that the output of one instance was the input to the next – there was no center.  The metadirectory would speak all protocols, support different technologies and schemas, support referral to specific application directories, and preserve the application-related characteristics of the constituent data stores.   I'll come back to these ideas going forward because I think they are still super important.

My message to Dave is that I haven't changed what I mean by metadirectory one iota since the term was first introduced in 1995.  I've always seen what is now called virtual directory as an aspect of metadirectory.  In fact, I shipped a product that included virtual directory in 1996.  It not only synchronized, but it did what we used to call “chaining” and “referral” in order to create composite views across multiple physical directories.  It did this not only at the server, but optionally on the client.

Of course, there were implementations of metadirectory that were “a bit more focussed”.  Customers put specific things at the top of their list of “must-haves”, and that is what everyone in the industry tried to build.

But though certain features predominated in the early days of metadirectory, that doesn't mean that those features ARE metadirectory.   We still live in the age of the logical directory, and ALL the aspects of the metadirectory that address that fact will continue to be important.

[Read the rest of Dave's post here.]

Joined like heads and tails

Dave Kearns has expanded further on his view of distributed data, metadirectory and virtual directory.  It seems like some of our disagreement is a matter of terminology.  Dave grudgingly admits (poor Linus and his blanket!) that application developers should be permitted to use databases:

The application database (for those who cling to it like Linus and his blanket) now can serve two purposes – one to subscribe to virtual directory data and one to publish!

The question becomes whether we need more than publish / subscribe relationships between services.   I think we do.  It is this higher level (meta level) of service and information that I call metadirectory. 

Let's make it clear that I see metadirectory as an evolving thing. 

  • First generation metadirectory dealt exclusively with managing applications that had been conceived without reference to each other – or to any common framework (in truth, this is still an issue – see Jeff Bohren's recent posting called “Which is better, Phillips or Flat-head?”). 
  • Second generation metadirectory has an additional focus:  providing the framework by which next-generation applications can become part of the distributed data infrastructure.  This includes publishing and subscription.  But that isn't enough.  Other applications need ways to find it, name it, and so on. 

A real distributed information architecture requires services that join objects across contexts, arbitrate truth, advertise schema possibilities and provide the grid through which virtual directory queries can be dispatched.  

These services are what I call metadirectory – the framework for distributed storage.  One may choose to call the queries in this framework “virtual directory”.  But such “virtual directory” requires a “real” framework. 

Dave suggests we read a piece called “The second wave:  Linking identities to contexts” by Michel Prompt (CEO of Radiant Logic).  It is good and I recommend it to everyone.  It raises many issues that are worth thinking about:

If for each application, we can find the unique identifier associated with a person, and we can speak the application-specific protocol (LDAP, RDBMS, API, Web services, etc.), then we can retrieve a specific identity profile associated with that person when we need it. Knowing an identifier and its associated protocol is sufficient to access a specific definition of an identity.

Common access alone, however, is not correlation. It will not tell us that UserId A is in fact EmployeeId 235, and that both underlying profiles are aspects of the identity of Person Y.

Some correlation mechanism thus needs to be deployed, based possibly on matching some common attributes for each profile. If no rules can be produced, then the matching must be done manually, a painstaking process but in many cases unavoidable for at least a subset of the identity data.

Michel has started to talk about the metadata needed to create a framework for distributed query.  Some service needs to know that “UserId A is in fact EmployeeId 235”.  That is clearly glue that creates a “directory of directories”.  Michel might call it a “directory of contexts”, but I don't think the difference is substantive.
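
A toy version of that joining glue, borrowing Michel's “UserId A is EmployeeId 235” example (the source names are invented), is little more than a map from a global identifier to each source's local identifier:

```python
# Toy correlation hub: the "directory of directories" glue that knows
# UserId A in one system and EmployeeId 235 in another are aspects of
# the same person. Identifiers and source names are illustrative.

CORRELATIONS = {
    "person-Y": {
        "web-portal":  "UserId A",
        "hr-database": "EmployeeId 235",
    },
}

def local_id(global_id, source):
    """Resolve a person's local identifier in a given source system."""
    return CORRELATIONS[global_id][source]

def who_is(source, local_identifier):
    """Reverse lookup: which person does a local identifier belong to?"""
    for gid, sources in CORRELATIONS.items():
        if sources.get(source) == local_identifier:
            return gid
    return None

print(local_id("person-Y", "hr-database"))  # EmployeeId 235
print(who_is("web-portal", "UserId A"))     # person-Y
```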

A directory of directories: metadirectory

Michel continues:

By defining such a process we can create a “hub” where each person has a “global identifier” associated with the corresponding “local” source identifiers (e.g. UserId A, EmployeeId 235, etc.) If this virtual hub has the capability to write back to each source, we can use it to manage the account/identity life-cycle for each source. And when we need any specific aspect of an identity, we can retrieve it dynamically using the Identity Hub pointer.

Hmmm.  Michel calls it a “hub”, not a metadirectory.  But it is the same thing. 

Since our Identity Hub is stripped down to the minimum information required, the amount of synchronization and data transformation (complex tasks by definition) is reduced to the strict minimum. Only the different (local) references for components of a given identity are stored or synchronized. When we need a specific aspect of identity, we can retrieve it dynamically using the Identity Hub pointer, and the common virtual access layer.

Hmmm.

If data transformation is a complex task, it is because there are different ways of representing data in the distributed system.  If that's the case, the problem doesn't go away with a virtual directory – it gets worse!  The application that calls into a first data source gets its representation, and if it then calls into a second data source, it gets a second representation.  The application is now on its own to figure out what is what.  Far from simplifying matters, complex transforms now need to be done in more locations.

A continuum

In terms of synchronization, the proposal made by Michel and Dave is good for some use cases but not right for others.  Again, we need to support a spectrum of choices. 

You don't always want to synchronize a common identifier.  Especially when working with identity data that is in danger of breach and insider attack, it is a better strategy to use different identifiers in different systems, so knowledge of the “joining glue” is required in order to assemble information across contexts (for example, personal information and financial information). 

And sometimes, you want to synchronize more than just an identifier.  

Real examples

A conversation like this needs real examples.  In most enterprises, the Human Resources Database is the authoritative source for information on employees.  We want our email address books and mail stores and message transfer agents to be up to date with the latest HR information. 

According to the argument being made by Dave and Michel, all our address books and all our mail switches and mail boxes should be sending each query directly into the “authoritative” human resources database.

But everyone with any experience in the enterprise knows the people who run the HR databases WILL NOT go for this.  They don't want all the technical systems of the enterprise hitting on their systems in real time with every possible query.

My point here is that it will be necessary to offload information from the HR system to other systems.  No one can look seriously at these issues without admitting that SOME synchronization is required (which admittedly should be real time).  On the other hand, we don't want parallel unrelated architectures.

So we are led to the conclusion that we need a spectrum of synchronization and remote access capabilities. We should be able to use policy to define what information is stored where, and how to get to information that is not stored locally – e.g., combine metadirectory and virtual directory functionality.
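
Expressed as a sketch (the attribute names and services below are invented), such a policy can be as simple as a per-attribute routing table: attributes marked as synchronized are read from the local copy that synchronization keeps fresh, and everything else is fetched from the authoritative source in real time.

```python
# Sketch of a policy that mixes metadirectory and virtual directory
# behaviour per attribute. All names are invented for illustration.

POLICY = {
    "surname":     "synchronized",  # kept locally via synchronization
    "email":       "synchronized",
    "salary":      "remote",        # always fetched from HR on demand
    "badge_photo": "remote",
}

def get_attribute(person, attribute, local_store, hr_service):
    if POLICY.get(attribute) == "synchronized":
        # Served from the local copy that the sync pipeline keeps fresh.
        return local_store[person][attribute]
    # Virtual-directory style: a real-time query to the authority.
    return hr_service.fetch(person, attribute)
```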

Attention application developers: Obey Dave Kearns!

Dave Kearns, knife freshly sharpened, responded to my recent post on metadirectory with the polemic, “Killing the Metadirectory“:

… My interpretation is that the metadirectory has finally given way to the virtual directory as the synchronization engine for identity data. Kim interprets it differently. He talks about the “Identity Bus” and says that “…you still need identity providers. Isn’t that what directories do? You still need to transform and arbitrate claims, and distribute metadata. Isn’t metadirectory the most advanced technology for that?” And I have to answer, “no.” The metadirectory is last century's technology and its day is past.

The Virtual Directory, the “Directory as a Service”, is the model for today and tomorrow. Data that is fresh, always available and available anywhere is what we need. The behemoth metadirectory with its huge datastore and intricate synchronization schedule (yet never quite up to date) is just not the right model for the nimble, agile world of today's service-driven computing. But the “bus” Kim mentions could be a good analogy here – the metadirectory is a lumbering, diesel-spewing bus. The virtual directory? It's a zippy little Prius…  [Full article here]

Who would want to get in the way of Dave's metaphors?  He's on a streak.  But he's making a fundamental mistake, taking an extreme position that is uncharacteristically naive.  I hope he'll rethink it.

Applications drive infrastructure

Here's the problem.  Infrastructure people cannot dictate how application developers should build their applications.  Applications – providing human and business value – drive infrastructure, not the other way around.  Infrastructure people who don't get this are doomed. 

Dave's neat little story about web service query needs to be put in the crucible of application development.  We need to get real.

Telling application developers how to live 

Real-time query across web services solves some identity problems very well.  In these cases, application developers will be happy to use them.  But it doesn't solve all their identity needs, or even most of them.  When Dave Kearns starts to tell real live application developers they shouldn't put identity information in their databases, they'll tell him to take his zippy Prius and shove off. 

Application developers like to use databases and tables.  They have become expert at doing joins across tables and objects to produce quite magical results.  As people and things become truly first class objects in our applications, developers will want even more to include them in their databases. 

Think for a minute about the kinds of queries you need to do when you start building enterprise social networks.  “Show me all the friends of friends who work in a class of projects similar to the ones I work in…”  You need to do joins, eh?  So it's not just existing enterprise applications that have the need to support distributed storage – it's the emerging ones too.
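
A simplified sketch of the friends-of-friends core of that query (invented schema, with SQLite standing in for the application database) shows why developers reach for local joins here:

```python
# Friends-of-friends via a self-join: the kind of query a social
# application wants its local database engine to optimize.
# Schema and sample data are invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE friends (person TEXT, friend TEXT);
    INSERT INTO friends VALUES ('me', 'alice'), ('alice', 'bob'),
                               ('me', 'carol'), ('carol', 'dan');
""")
rows = db.execute("""
    SELECT DISTINCT f2.friend
    FROM friends f1
    JOIN friends f2 ON f2.person = f1.friend  -- friends of my friends
    WHERE f1.person = 'me'
      AND f2.friend NOT IN (SELECT friend FROM friends WHERE person = 'me')
      AND f2.friend <> 'me'
""").fetchall()
print(rows)  # [('bob',), ('dan',)]
```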

Even thinking for a moment just about Microsoft applications – SharePoint provides a good example  – the developers ran into the need to maintain local tables so they can get the kind of performance and complex query they need.  Virtual directory doesn't help them one iota in solving this kind of problem.  Nor do web service queries.

Betting big time against the house 

I admire many aspects of Dave's thinking about identity.  But I pity anyone who follows his really ideological argument that virtual directory solves everything and distributed storage just isn't needed.  We need both.

He's asking readers to bet against databases.  He's asking them to bet against the programming model used by application developers.  He's asking them to forget about performance.  He's asking them to take all the use cases in the world and stuff them into his Prius – which is actually more like a hobby horse than a car.

Once you have identity data distributed across stores you either have chaos or you have metadirectory.  I'll explore this more in upcoming posts.

Meanwhile, if anyone wants to bet against the future of databases and integration of identity information into them, drop me a note and I'll set up a page to take your money.  And at the same time, I recommend that you start training for a second career.

This said, I'm as strong a believer in using web services to query for claims in real time as Dave is.  So on that we very much agree.

Metadirectory and claims

My friend and long-time collaborator Jackson Shaw seems to have intrigued both Dave Kearns and Eric Norlin with an amusing (if wicked) post called You won't have me to kick around anymore:

You won't have me to kick around anymore!

No, not me. Hewlett-Packard.

I heard about a month ago that HP was going to bow out of the IDM business. I didn't want to post anything because I felt it would compromise the person that told me. But, now that it has made the news:

Check out Burton Group's blog entry on this very topic

Burton Group has been contacted by HP customers who report that HP is no longer going to seek new customers for its Identity Center product. We have contacted HP and the company confirms that HP Software has decided to focus its investment in identity management products exclusively on existing customers and not on pursuing additional customers or market share. HP is in the process of reaching out to each customer regarding the change.

Seriously – you thought HP was a contender in this space???!!! No, no, Nanette. Thanks for playing. Mission failure…

Let's be honest. The meta-directory is dead. Approaches that look like a meta-directory are dead. We talk about Identity 2.0 in the context of Web services and the evolution of digital identity but our infrastructure, enterprise identity “stuff” is decrepit and falling apart. I have visions of identity leprosy with this bit and that bit simply falling off because it was never built with Web services in mind…

There is going to be a big bang in this area. HP getting sucked into the black hole is just a step towards that…

As graphic as the notion of identity leprosy might be, it was the bit on metadirectory that prompted Dave Kearns to write,

That’s a quote from Quest’s Jackson Shaw. Formerly Microsoft’s Jackson Shaw. Formerly Zoomit’s Jackson Shaw. This is a guy who was deeply involved in metadirectory technology for more than a dozen years. I can only hope that Microsoft is listening.

Back at Jackson's blog we find out that he was largely responding to a session he liked very much given by Neil MacDonald at a recent Gartner Conference.  It was called “Everything You Know About Identity Management Is Wrong.”  Observing that customers are dissatisfied with the cost of hand tailoring their identity and access management, Jackson says,

Neil also introduced the concept of “Identity as a service” to the audience. At the Directory Experts Conference, John Fontana wrote “Is Microsoft’s directory, identity management a service of the future?”   What I am stating is quite simple: I believe a big-bang around identity is coming and it will primarily be centered around web services. I hope the resultant bright star that evolves from this will simplify identity for both web and enterprise-based identity infrastructure.

Active Directory, other directories and metadirectory “engines” will hopefully become dial tone on the network and won't be something that has to be managed – at least not to the level it has to be today.

Without getting overly philosophical, there is a big difference between being, metaphorically, a “dial tone” and being “dead”.   I buy Jackson's argument about dial tone, but not about “dead”. 

Web services allow solutions to be hooked together on an identity bus (I called it a backplane in the Laws of Identity).  Claims are the electrons that flow on that bus.  This is as important to information technology as the development of printed circuit boards and ICs were to electronics.  Basically, if we were still hand-wiring our electronic systems, personal computers would be the size of shopping centers and would cost billions of dollars.  An identity bus offers us the possibility to mix and match services in a dynamic way with potential efficiencies and innovations of the same magnitude.

In that sense, claims-based identity drastically changes the identity landscape.

But you still need identity providers.  Isn't that what directories do?  You still need to transform and arbitrate claims, and distribute metadata.  Isn't metadirectory the most advanced technology for that?  In fact, I think directory / metadirectory is integral to the claims based model.  From the beginning, directory allowed claims to be pulled.  Metadirectory allowed them to be pulled, pushed, synchronized, arbitrated and integrated.  The more we move toward claims, the more these capabilities will become important. 
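
As a small illustration of the arbitration piece (the precedence rules, sources and claims below are all invented), a metadirectory-style service decides which source's claim wins for each attribute:

```python
# Sketch of claim arbitration: several sources assert a value for the
# same attribute; a precedence rule decides which claim wins.
# Source names, precedence and claims are invented for illustration.

PRECEDENCE = ["hr-database", "active-directory", "self-asserted"]

def arbitrate(claims):
    """claims: list of (source, attribute, value). Returns winning values."""
    result = {}
    for source in reversed(PRECEDENCE):    # lowest authority first...
        for src, attribute, value in claims:
            if src == source:
                result[attribute] = value  # ...so higher ones overwrite
    return result

claims = [
    ("self-asserted",    "surname", "Cameron-Smith"),
    ("hr-database",      "surname", "Cameron"),
    ("active-directory", "phone",   "+1 555 0100"),
]
print(arbitrate(claims))  # {'surname': 'Cameron', 'phone': '+1 555 0100'}
```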

The difference is that as we move towards a common, bus-based architecture, these capabilities can be simplified and automated.   That's one of the most interesting current areas of innovation. 

Part of this process will involve moving directory onto web services protocols.  As that happens, the ability to dispatch and assemble queries in a distributed fashion will become a base functionality of the system – that's what web services are good at.  So by definition, what we now call “virtual directory” will definitely be a base capability of emerging identity systems.