What Kim fails to note… is that a well designed virtual directory (see Radiant Logic's offering, for example) will allow you to do a SQL query to the virtual tables! You get the best of both: up to date data (today's new hires and purchases included) with the speed of an SQL join. And all without having to replicate or synchronize the data. I'm happy, the application is happy – and Kim should be happy too. We are in violent agreement about what the process should look like at the 40,000 foot level and only disagree about the size and shape of the paths – or, more likely, whether they should be concrete or asphalt.
Neil Macehiter answers by making an important distinction that I didn't emphasize enough:
But the issue is not with the language you use to perform the query: it's where the data is located. If you have data in separate physical databases then it's necessary to pull the data from the separate sources and join them locally. So, in Kim's example, if you have 5000 employees and have sold 10000 computers then you need to pull down the 15000 records over the network and perform the join locally (unless you have an incredibly smart distributed query optimiser which works across heterogeneous data stores). This is going to be more expensive than if the computer order and employee data are colocated.
- Kim says that sometimes you need to copy data in order to join it with other data
- Dave says the same thing, except indicates that you wouldn't copy the data but just use “certain virtual directory functionality”
Actually, in #2, that functionality would likely be persistent cache, which if you look under the covers is exactly the same as a meta-directory in that it will copy data locally. In fact, the data may even be stored (again!) in a relational database (SQLServer in the Radiant Logic example he provides).
Let's use laser focus and only look at Kim's example of joining purchase orders with user identity.
Let's face it. Most applications aren't designed to go to one database when you're dealing solely with transactional data and another database when you're dealing with a combination of transactional data and identities.
If we model this through the virtual directory and indicate that every time an application joins purchase orders and identities that it does so (even via SQL instead of LDAP) through the virtual directory, you've now said the following:
- You're okay with re-modelling all of these data relationships in a virtual directory — even those representing purchase order information.
- You're okay with moving a lot of identity AND transactional information into a virtual directory's local database.
- You're okay with making this environment scalable and available for those applications.
Unfortunately, this doesn't really hold up. There are a lot more issues, but even after just these first three (or even the first one) you begin to realize that while virtual directory makes sense for identity, it may not make sense as the ONLY way to get identity. I think the same thing goes for an identity hub that ONLY thinks in terms of virtualization.
The real solution here is a combination of virtualization with more standardized publish/subscribe for delivery of changes. This gets us away from this ad-hoc change discovery that makes meta-directories miserable, while ensuring that the data gets where it needs to go for transactions within an application.
I discourage people from thinking that metadirectory implies “ad-hoc change discovery”. That's a defect of various metadirectory implementations, not a characteristic of the technology or architecture. As soon as applications understand they are PART OF a wider distributed fabric, they could propagate changes using a publication pattern that retains the closed-loop verification of self-converging metadirectory.