In Part 1 of this series, I showed that enterprise-wide application integration projects based on SOA (service-oriented architecture) bear an uncanny resemblance to the enterprise data warehouse projects that business intelligence (BI) departments have been undertaking for years. The staged roll-out approach to SOA described in Part 1 and repeated in Figure 1 here is based closely on the experience of data warehouse projects from as far back as the mid-1990s. This observation led me to examine the skills and techniques BI developers can bring to SOA. Part 2 discussed one of the key crossover areas, data modeling, while Part 3 dealt with user interface issues. In this final installment, I’ll focus on metadata.
Figure 1: Rolling Out an Integration Infrastructure
I’m sure most BI experts would admit that metadata has not been one of BI’s greatest success stories. So it may seem a little strange that I’m suggesting that BI developers have much to offer SOA in this area. However, the sad truth is that metadata is even less well understood in the operational world from which SOA springs. At the very least, then, BI developers can share their experience of the challenges. But, as we’ll see, they have more to contribute than that.
First, let’s briefly review the metadata experience in BI over the past twenty years or more. Metadata was identified as a key component of data warehouse architecture from the outset. The first paper describing a data warehouse, written by Paul Murphy and me, was published in the IBM Systems Journal in 1988. It contained a component called a “business data directory,” which was clearly a precursor of the metadata store. The same article described the contents of the business data directory and its sources. It’s important to note that the directory was seen primarily as a way to help end users understand the contents of the warehouse (business metadata) and only secondarily as a means of driving some of the warehouse infrastructure (technical metadata).
As warehousing evolved through the ’90s, the term metadata began to be used extensively. Of the two categories previously described, technical metadata received the greater focus, despite the greater importance ascribed to business metadata in an architectural sense.
Software vendors focused on technical metadata for the simple reason that their tools needed it. This was particularly the case for extract, transform and load (ETL) tools, where developers could create metadata as part of defining the ETL processes, and where obvious sources for technical definitions of the source and target data already existed in database catalogues and data dictionaries. Technical metadata is used almost exclusively by warehouse designers and developers and is usually stable, or at most slowly changing, over time. However, there is a subset of technical metadata called operational metadata, which could be described as the real-time component of technical metadata. It describes the ongoing status of various aspects of the system and, again, is of interest to the IT community. A wide variety of tools create, support, control and manage technical metadata, and most BI projects today could claim a reasonable degree of success in this area, albeit usually with a fairly limited subset of all the metadata that might be considered.
The same cannot be said for business metadata, and it is in this area that I see most BI implementations struggle. The lack of success can be attributed to three main factors. First, from a purely technology-focused viewpoint, you can build a warehouse without automating any business metadata. Business analysts describe business terminology, meanings and relationships to IT, who use this information to build databases and ETL processes. The technical metadata becomes a proxy for the original business metadata, of course, but the original business content is lost or, at best, stored in unstructured text documents or spreadsheets. Second, you can roll out a warehouse to business users without business metadata, because the first users are the ones who defined the warehouse (or, more likely, data mart) content and structure, and thus understand it implicitly. By the time warehouse usage extends to other parts of the business, development funding has dried up, and these new users have to make do with whatever tacit knowledge exists in the organization about the meaning and proper usage of the data. Third, there is a lack of tools. Software vendors have had little incentive to create business metadata tools because IT attaches little value to the topic, and then IT complains that there are no tools … a vicious circle.
Our lack of success in providing extensive and useful business metadata is, in my view, one of the key reasons why long-term return on investment in BI is much lower than it should be.
Even a cursory examination of how SOA works, or is supposed to work, will be sufficient to convince you that SOA is highly dependent on metadata. Often, the word metadata is not used; but metadata, both technical and business, is at the heart of SOA.
At a technical level, web services and their interfaces are described by metadata. In SOA, this metadata is stored in XML formats such as XML Schema Definition (XSD) and Web Services Description Language (WSDL) files. As was the case for business intelligence, technical metadata in SOA is of critical importance to software vendors, so tools for creating and managing it proliferate.
It’s worth noting a difference in approach here between the BI and SOA worlds. BI tool vendors often explicitly define their metadata storage as a database of some sort. In the SOA world, files still seem to be the preferred approach. While one can understand this in terms of the differing histories of the vendors – BI tool vendors come largely from a data-centric world; SOA tool vendors come largely from an application-centric world – it’s difficult to see the wisdom of trying to manage any enterprise-wide data resource, as metadata certainly is, in anything other than a database. I’d argue that the first lesson any SOA tool vendor or implementer can learn from BI is that SOA metadata should be stored and managed in a (meta)database, sometimes known as a repository.
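To make the repository idea a little more concrete, here is a minimal sketch in Python using the standard library’s sqlite3 module. The schema (tables service, business_term and service_term), the service names and the example.com WSDL URLs are all hypothetical. The point is only that once technical metadata (where each interface is described) and business metadata (what each term means) sit in the same database, a simple query can answer a question such as “which services use customer data, and how?”

```python
import sqlite3

# Hypothetical minimal schema for an SOA metadata repository.
# Table and column names are illustrative assumptions, not a standard.
SCHEMA = """
CREATE TABLE service (
    name     TEXT PRIMARY KEY,
    wsdl_url TEXT NOT NULL,        -- where the technical interface is described
    version  TEXT NOT NULL
);
CREATE TABLE business_term (
    term       TEXT PRIMARY KEY,
    definition TEXT NOT NULL       -- business metadata, written for end users
);
CREATE TABLE service_term (
    service TEXT NOT NULL REFERENCES service(name),
    term    TEXT NOT NULL REFERENCES business_term(term),
    role    TEXT NOT NULL CHECK (role IN ('consumes', 'produces')),
    PRIMARY KEY (service, term, role)
);
"""


def build_repository() -> sqlite3.Connection:
    """Create an in-memory repository and load a few hypothetical entries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO service VALUES (?, ?, ?)", [
        ("CreditCheck", "http://example.com/credit?wsdl", "1.2"),
        ("OrderEntry", "http://example.com/orders?wsdl", "3.0"),
    ])
    conn.executemany("INSERT INTO business_term VALUES (?, ?)", [
        ("Customer", "A party that has placed at least one order."),
        ("Credit Limit", "Maximum unpaid balance approved for a customer."),
    ])
    conn.executemany("INSERT INTO service_term VALUES (?, ?, ?)", [
        ("CreditCheck", "Customer", "consumes"),
        ("CreditCheck", "Credit Limit", "produces"),
        ("OrderEntry", "Customer", "consumes"),
    ])
    return conn


def services_using(conn: sqlite3.Connection, term: str) -> list:
    """Return (service, role) pairs for every service linked to a business term."""
    rows = conn.execute(
        "SELECT service, role FROM service_term WHERE term = ? "
        "ORDER BY service, role",
        (term,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    repo = build_repository()
    print(services_using(repo, "Customer"))
```

A real repository would need versioning, access control and far richer models; what the sketch illustrates is simply the choice of a relational store over loose files, so that technical and business metadata can be queried together.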
A second noteworthy point relates to the temporal and usage characteristics of SOA technical metadata, and here, too, BI expertise could help. In contrast to BI technical metadata, the SOA variety is both more changeable and more active. In business intelligence, technical metadata (with the exception of the relatively small operational subset) changes rather slowly, because it is used and modified only by the IT-driven development cycle. Furthermore, it is seldom used actively during operation of the warehouse infrastructure; rather, it is “compiled and loaded” into tools, such as database and ETL engines, that use it. In SOA, the opposite is the case. Technical metadata, especially at the workflow level, is potentially highly changeable, because the vision is that business users will be able to change processes on the fly. It is also active; that is, it is used directly by the infrastructure to discover services and understand their interfaces. SOA technical metadata is thus significantly more challenging technologically than its BI counterpart. However, BI vendors and implementers have substantially more experience in dealing with metadata and distributed data, and this knowledge could be of value in SOA.
The case for business metadata in SOA is even more compelling than in BI. The longer-term expectation of SOA is that it will empower business users to take more control of business processes. The goal is to enable them to apply their business expertise and knowledge to modify workflows as the business changes, with substantially less recourse to IT than at present. A clear prerequisite for such behavior is a solid understanding of what the current process does, what information it consumes and produces, and the implications of any change to any of these. Such understanding is, of course, nothing less than complete and comprehensive business metadata.
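As a rough illustration of what understanding “the implications of any change” might involve, the following Python sketch assumes business metadata that records, for each step of a workflow, the information items the step consumes and produces. The process, step names and item names are invented for illustration. Given such metadata, a simple transitive search finds every step that could be affected, directly or indirectly, if the definition of one item changes.

```python
from collections import deque

# Hypothetical business metadata for one workflow: each step lists the
# information items it consumes and produces. Names are illustrative only.
PROCESS = {
    "Capture Order": {"consumes": {"Customer"}, "produces": {"Order"}},
    "Check Credit": {"consumes": {"Customer", "Order"}, "produces": {"Credit Decision"}},
    "Reserve Stock": {"consumes": {"Order"}, "produces": {"Reservation"}},
    "Invoice": {"consumes": {"Order", "Credit Decision"}, "produces": {"Invoice"}},
}


def impacted_steps(process: dict, changed_item: str) -> list:
    """Return, sorted, the steps affected directly or indirectly by a change.

    A step is affected if it consumes the changed item, or consumes anything
    produced by an already affected step (a breadth-first transitive closure).
    """
    affected = set()
    seen_items = {changed_item}
    queue = deque([changed_item])
    while queue:
        item = queue.popleft()
        for step, meta in process.items():
            if item in meta["consumes"] and step not in affected:
                affected.add(step)
                # Whatever this step produces may now carry the change onward.
                for produced in meta["produces"]:
                    if produced not in seen_items:
                        seen_items.add(produced)
                        queue.append(produced)
    return sorted(affected)
```

For example, impacted_steps(PROCESS, "Order") returns ["Check Credit", "Invoice", "Reserve Stock"]. A business user could only rely on such an answer if the consumes/produces metadata were complete and current, which is exactly the point about business metadata being a prerequisite.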
The BI experience with business metadata, while patchy, is certainly far in advance of anything that has been considered so far in the operational environment.
At a project level, implementing metadata, whether business or technical, is closely analogous to a data warehouse implementation. Metadata comes from a wide variety of disparate sources. It needs to be modeled in order to understand how it can be correctly combined. Extraction, transformation and cleansing are needed to obtain clean metadata for loading into the destination repository or files. So, once again, we can immediately see how basic BI skills can be of great assistance to an SOA implementation.
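The following sketch, again in Python with hypothetical sources and names, applies the familiar warehouse steps to metadata itself. It extracts column names from a catalogue listing and business definitions from a spreadsheet export, cleanses the names (trimming, case-folding, removing duplicates), pairs each column with its definition and loads the result into a repository table, where gaps can then be queried. It is not meant to reflect any particular tool.

```python
import csv
import io
import sqlite3

# Two hypothetical metadata sources with inconsistent naming conventions.
# 1) Technical metadata: column names harvested from a database catalogue.
CATALOGUE_COLUMNS = ["CUST_ID", "cust_name ", "CRED_LIMIT", "cust_name"]

# 2) Business metadata: definitions kept in a spreadsheet, exported as CSV.
DEFINITIONS_CSV = """column,definition
cust_id,Unique identifier assigned to a customer
Cust_Name,Legal name of the customer
"""


def extract():
    """Read both sources into plain Python structures."""
    rows = list(csv.DictReader(io.StringIO(DEFINITIONS_CSV)))
    return CATALOGUE_COLUMNS, rows


def transform(columns, definition_rows):
    """Cleanse names and pair each column with its business definition."""
    definitions = {r["column"].strip().lower(): r["definition"].strip()
                   for r in definition_rows}
    # Cleanse: trim whitespace, case-fold and remove duplicates.
    names = sorted({c.strip().lower() for c in columns})
    # None marks a column for which no business definition exists.
    return [(name, definitions.get(name)) for name in names]


def load(records):
    """Load cleansed metadata into an in-memory repository table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE column_metadata (name TEXT PRIMARY KEY, definition TEXT)")
    conn.executemany("INSERT INTO column_metadata VALUES (?, ?)", records)
    return conn


def missing_definitions(conn):
    """List columns that still lack business metadata."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM column_metadata WHERE definition IS NULL ORDER BY name")]


if __name__ == "__main__":
    repo = load(transform(*extract()))
    print(missing_definitions(repo))
```

Here the cleansing step collapses “cust_name ” and “cust_name” into a single entry, and missing_definitions reports cred_limit as the one column that still lacks a business definition.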
As we reach the end of this series, it should be clear that the type of enterprise integration envisaged by SOA has many parallels to what data warehousing has been attempting for many years now. Indeed, it can be argued that SOA integration is an immensely more challenging process than data warehousing, because it affects the operational infrastructure and procedures directly. It has an immediate impact on the company’s dealings with customers and suppliers, and thus an immediate influence on the bottom line, especially if something goes wrong. One might compare it to changing the engines of a 747… in flight! With such an analogy, you may wonder whether you want to get involved in the SOA experience, but there’s certainly no shortage of areas where your BI experience and skills can bring much-needed knowledge and value. Have a good flight.
Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid-'80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book, Data Warehouse: From Architecture to Implementation, published by Addison-Wesley in 1997.
Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT.
Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.