{"id":943,"date":"2008-03-24T22:25:48","date_gmt":"2008-03-25T06:25:48","guid":{"rendered":"\/?p=943"},"modified":"2008-03-24T23:22:09","modified_gmt":"2008-03-25T07:22:09","slug":"joined-like-heads-and-tails","status":"publish","type":"post","link":"https:\/\/www.identityblog.com\/?p=943","title":{"rendered":"Joined like heads and tails"},"content":{"rendered":"<p>Dave Kearns\u00a0has <a href=\"http:\/\/www.vquill.com\/2008\/03\/its-unsanitary-kim.html\" class=\"broken_link\">expanded further<\/a> on his view of\u00a0distributed data, metadirectory and virtual directory.\u00a0 It seems like some of our <a href=\"\/?p=942\">disagreement<\/a> is\u00a0a matter of terminology.\u00a0 Dave\u00a0grudgingly admits (poor Linus and his blanket!) that application developers should be\u00a0permitted to\u00a0use databases:<\/p>\n<blockquote><p>The application database (for those who cling to it like Linus and his blanket) now can serve two purposes &#8211; one to subscribe to virtual directory data and one to publish!<\/p><\/blockquote>\n<p>The question\u00a0becomes whether we need\u00a0<strong>more<\/strong> than\u00a0publish \/ subscribe relationships between services.\u00a0\u00a0\u00a0I think we do.\u00a0 It is this higher level (meta level) of service and information that I call metadirectory.\u00a0<\/p>\n<p>Let&#39;s\u00a0make it clear\u00a0that I\u00a0see metadirectory\u00a0as\u00a0an evolving thing.\u00a0<\/p>\n<ul>\n<li>First-generation metadirectory dealt exclusively with\u00a0managing\u00a0applications\u00a0that had been conceived without reference to each other &#8211; or to any common framework\u00a0(in truth,\u00a0this is still an issue &#8211; see Jeff Bohren&#39;s recent posting called &#8220;<a href=\"http:\/\/idlogger.wordpress.com\/2008\/03\/22\/which-is-better-phillips-or-flat-head\/\">Which is better, Phillips or Flat-head?<\/a>&#8221;).\u00a0<\/li>\n<li>Second-generation metadirectory has an additional focus:\u00a0\u00a0providing the framework 
by which next-generation applications can become part of the distributed data\u00a0infrastructure.\u00a0 This includes publishing and subscription.\u00a0\u00a0But that\u00a0isn&#39;t enough.\u00a0 Other applications need ways to find that data, name it, and so on.\u00a0<\/li>\n<\/ul>\n<p>A real distributed information architecture requires\u00a0services that join objects across contexts, arbitrate truth, advertise schema possibilities and provide the grid through which virtual directory queries can be dispatched.\u00a0\u00a0<\/p>\n<p>These services are what I call metadirectory &#8211; the\u00a0framework\u00a0for distributed storage.\u00a0\u00a0One\u00a0may choose to call the <em>queries<\/em> in this framework\u00a0&#8220;virtual directory&#8221;.\u00a0\u00a0But such a &#8220;virtual directory&#8221; requires a &#8220;real&#8221; framework.\u00a0<\/p>\n<p>Dave suggests we read a piece called &#8220;<a href=\"http:\/\/www.radiantlogic.com\/main\/pdf\/Page74.pdf\" class=\"broken_link\">The second wave:\u00a0 Linking identities to contexts<\/a>&#8221; by Michel Prompt (CEO of <a href=\"http:\/\/www.radiantlogic.com\/main\/\" class=\"broken_link\">Radiant Logic<\/a>).\u00a0 It is good and I recommend it to everyone.\u00a0\u00a0It raises many issues that are worth thinking about:<\/p>\n<blockquote><p>If for each application, we can find the unique identifier associated with a person, and we can speak the application-specific protocol (LDAP, RDBMS, API, Web services, etc.,) then we can retrieve a specific identity profile associated with that person when we need it. Knowing an identifier and its associated protocol is sufficient to access a specific definition of an identity.<\/p>\n<p>Common access alone, however, is not correlation. 
It will not tell us that UserId A is in fact EmployeeId 235, and that both underlying profiles are aspects of the identity of Person Y.<\/p>\n<p><em>Some correlation mechanism thus needs to be deployed, based possibly on matching some common attributes for each profile. If no rules can be produced, then the matching must be done manually, a painstaking process but in many cases unavoidable for at least a subset of the identity data.<\/em><\/p><\/blockquote>\n<p>Michel has started to talk about the metadata needed to create a framework for distributed query.\u00a0 Some service needs to know that &#8220;UserId A is in fact EmployeeId 235&#8221;.\u00a0 That is\u00a0clearly glue that creates a &#8220;directory of directories&#8221;.\u00a0 Michel might call it a &#8220;directory of contexts&#8221;, but I don&#39;t think the difference is substantive.<\/p>\n<p><strong>A\u00a0directory of directories:\u00a0metadirectory<\/strong><\/p>\n<p>Michel continues:<\/p>\n<blockquote><p>By defining such a process we can create a \u201chub\u201d where each person has a \u201cglobal identifier\u201d associated with the corresponding \u201clocal\u201d source identifiers (e.g. UserId A, EmployeeId 235, etc.) If this virtual hub has the capability to write back to each source, we can use it to manage the account\/identity life-cycle for each source. And when we need any specific aspect of an identity, we can retrieve it dynamically using the Identity Hub pointer.<\/p><\/blockquote>\n<p>Hmmm.\u00a0 Michel calls it a &#8220;hub&#8221;, not a metadirectory.\u00a0\u00a0But it is the same thing.\u00a0<\/p>\n<blockquote><p>Since our Identity Hub is stripped down to the minimum information required, the amount of synchronization and data transformation (complex tasks by definition) is reduced to the strict minimum. Only the different (local) references for components of a given identity are stored or synchronized. 
When we need a specific aspect of identity, we can retrieve it dynamically using the Identity Hub pointer, and the common virtual access layer.<\/p><\/blockquote>\n<p>Hmmm.<\/p>\n<p>If\u00a0data transformation is a complex task, it is because there are different ways of representing data in the distributed system.\u00a0 If that&#39;s the case,\u00a0the problem\u00a0doesn&#39;t go away with a virtual directory &#8211; it gets worse!\u00a0 An application that calls into\u00a0one data source gets that source&#39;s representation, and if it then calls into a\u00a0second data source, it gets a\u00a0second representation.\u00a0 The application is now on its own to figure out what is what.\u00a0\u00a0Far from simplifying things, this means complex transforms need to be done in more locations.<\/p>\n<p><strong>A continuum<\/strong><\/p>\n<p>In terms of synchronization, the proposal made by Michel and Dave is good for some use cases but not right for others.\u00a0 Again, we need to support a spectrum of choices.\u00a0<\/p>\n<p>You don&#39;t always want to synchronize a common identifier.\u00a0 Especially when working with identity data that is at risk of breach or insider attack, it\u00a0is a better strategy\u00a0to use different identifiers in different systems, so knowledge of the &#8220;joining glue&#8221; is required in order to\u00a0assemble information across contexts (for example, personal information and financial information).\u00a0<\/p>\n<p>And sometimes, you want to synchronize more than just an identifier.\u00a0\u00a0<\/p>\n<p><strong>Real examples<\/strong><\/p>\n<p>A conversation like this needs real examples.\u00a0 In most enterprises, the Human Resources Database is the authoritative source for information on employees.\u00a0 We want our email address books and mail stores and message transfer agents to be up to date with the latest HR information.\u00a0<\/p>\n<p>According to the argument being made by Dave and Michel, all our address books and all our mail 
switches and mailboxes should be sending each query directly into the &#8220;authoritative&#8221; human resources database.<\/p>\n<p>But everyone with any experience in the enterprise knows that the people who run the HR databases\u00a0WILL NOT go\u00a0for this.\u00a0 They don&#39;t want all the technical systems of the enterprise hitting their systems in real time with every possible query.<\/p>\n<p>My point here is that it will be necessary to offload information from the HR system to other systems.\u00a0\u00a0No one can look seriously at these issues without\u00a0admitting that SOME synchronization is required (ideally in real time).\u00a0 On the other hand, we don&#39;t want parallel unrelated architectures.<\/p>\n<p>So we are led to the conclusion that we need a spectrum of synchronization and remote access\u00a0capabilities.\u00a0We should be able\u00a0to use policy to define what information is stored where, and how to get to information that is not stored locally &#8211; i.e., combine metadirectory and virtual directory functionality.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We need a spectrum of synchronization and remote access 
capabilities<\/p>\n","protected":false},"author":68,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[58],"tags":[],"_links":{"self":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts\/943"}],"collection":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/users\/68"}],"replies":[{"embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=943"}],"version-history":[{"count":0,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=\/wp\/v2\/posts\/943\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=943"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.identityblog.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}