Intelligent Integration Depends on Business Intelligence, Part 2
by Barry Devlin
Published: 7 May 2008
Building on Part 1 of this series, this article covers the skills, techniques and knowledge that business intelligence professionals can bring to the current and ongoing SOA craze.

In Part 1 of this series, I showed that enterprise-wide application integration projects based on service-oriented architecture (SOA) bear an uncanny resemblance to the enterprise data warehouse projects that business intelligence (BI) departments have been undertaking for years. The staged roll-out approach to SOA described in the previous article and repeated in Figure 1 here is based closely on the experience of data warehouse projects from as far back as the mid-'90s, as can be seen if you compare it with Figures 15-2 and 15-3 in my 1997 book, Data Warehouse: From Architecture to Implementation (Addison-Wesley). This observation raises the question: What other skills, techniques and knowledge can BI professionals bring to the current and ongoing SOA craze? This article discusses one of the key crossover areas: data modeling.


Figure 1: Rolling Out an Integration Infrastructure

Data Modeling for (SOA) Dummies

From the earliest days of data warehousing, data modeling has been a core skill in the BI implementation team. It has two very different but complementary aspects. The first aspect might be called “pure” data modeling, as its purpose is to understand the data needs of the organization in breadth and depth, unsullied by implementation restrictions or shortcuts. The second aspect might just as validly be termed “data archaeology” as data modeling, as it involves digging into the data implementation in the operational environment and attempting to exhume a valid and viable data model from all the implementation compromises and “temporary” fixes that now make up the operational databases.

Both of these sets of data modeling skills, particularly the latter, are of immense benefit to an SOA implementation. With a focus still strongly planted in the operational world, most SOA implementers are more at home with process modeling than with data modeling. The data is perceived to be subservient to the process, because it is the process and the results of the process that provide directly visible business value in the operational world. Furthermore, with their eyes firmly fixed on performance and response times of the individual applications, operational systems data modelers and database designers seldom ask the question: Can my data be easily and validly used outside the context of my own application?

Clearly, it is with this precise question that data warehouse designers have struggled for years. The high-level model of the warehouse must, by definition, be an application- and usage-neutral representation of the business’ data needs. When we turn to SOA, this type of high-level, enterprise-wide data model is what is needed to underpin the messages that traverse the enterprise service bus if they are to be commonly understood and correctly and consistently used by all the services on the bus. The enterprise data model is a lingua franca of the warehousing world; an equivalent is required in SOA, and who better to undertake its definition than the modelers who honed their skills in the warehouse? And while the SOA enterprise data model will never be implemented in a database, it will certainly be made manifest in the service definitions and interfaces exposed by the web services on the bus.
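
As a rough illustration of how an enterprise data model might surface in a service interface, the sketch below defines one application-neutral "customer" message and maps a single application's record into it. Everything named here is an assumption made for illustration: the CustomerMessage fields, the billing column names (CUST_NO, FNAME, LNAME, STAT, OPEN_DT) and the status codes are hypothetical, and JSON merely stands in for whatever message format the bus actually uses.

```python
from dataclasses import dataclass, asdict
from datetime import date
import json


@dataclass(frozen=True)
class CustomerMessage:
    """Hypothetical application-neutral 'customer' entity shared by all services."""
    customer_id: str
    full_name: str
    status: str       # spelled-out business value, e.g. "active" or "closed"
    opened_on: date


# Hypothetical decoding of the terse status codes used by one application.
BILLING_STATUS_CODES = {"A": "active", "C": "closed", "S": "suspended"}


def from_billing_record(record: dict) -> CustomerMessage:
    """Map an application-specific billing row to the shared message."""
    return CustomerMessage(
        customer_id=str(record["CUST_NO"]),
        full_name=f'{record["FNAME"].strip()} {record["LNAME"].strip()}',
        status=BILLING_STATUS_CODES[record["STAT"]],
        opened_on=date.fromisoformat(record["OPEN_DT"]),
    )


def to_wire(message: CustomerMessage) -> str:
    """Serialize the shared message for transport between services."""
    payload = asdict(message)
    payload["opened_on"] = message.opened_on.isoformat()
    return json.dumps(payload, sort_keys=True)
```

The point of the sketch is that each application needs only one mapping, to and from the shared definition, rather than one mapping per partner application.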

In practice, if a sufficiently broad and generic enterprise data model has been developed (or, more likely these days, bought and customized) for the data warehouse, this same model will become the basis for the model required by SOA. In the data warehousing environment, industry data models have become increasingly accepted as a better basis for defining an enterprise data model than starting from scratch with a blank sheet of paper. Numerous vendors, such as Teradata, ADRM Software, Universal Data Models, and Adastra Corporation, to name but a few, have developed generic data models by industry or business function that can be customized to individual needs. While generally sufficient for a data warehouse project, industry data models on their own will not meet the needs of an SOA project. Industry models that cover data and process, such as those pioneered by IBM, are required.

At the top of such an industry model, illustrated in Figure 2, is the business model, which, by definition, covers both process and data, since both are required for a viable business. Beneath it are the process and data models that ultimately must be joined together to provide a working service-oriented model for the SOA implementation. Such models can tie together the knowledge and skills of the previously disparate data and process modeling teams and provide a common framework against which the SOA rollout can proceed.


Figure 2: Industry Models

And what of the “data archaeology” skill mentioned earlier? While defining an agreed and usable enterprise data and process model is certainly no small task, there is an even bigger task that all SOA projects must face: sourcing the services that hang off the bus. This is the part of the process that SOA vendors and consultants like to talk about the least, because for most businesses the primary source of services is their old operational systems. These overburdened systems, originally designed as monolithic applications, built by staff long since departed in languages no longer in favor and perhaps repurposed several times already as business demands changed, present a substantial set of challenges.

On the one hand, they encode key elements of the business process and are thus vital components of any worthwhile service-oriented architecture. On the other hand, they were never designed to be componentized or to work with one another. Significant and often poorly documented dependencies exist between different parts of the application. Business meaning is often deeply embedded in non-intuitive single-byte codes in the databases. Maintenance over many years has disrupted prior logic flows, leaving dead code in some places and code that behaves very differently from the original specifications in other parts of the system. My discussions with SOA consultants and vendors have not uncovered any magic solution to these issues. There is hope, however.

The data archaeologists of the BI implementation team have been up to their elbows in one aspect of this mucky problem since we started extracting data into data warehouses in the eighties. Understanding and documenting the real structure and content, however dysfunctional, of the operational databases is a key first step in the process of breaking these monoliths apart. As the data structures are teased out, we begin to see the functional boundaries of the original application. We also begin to imagine the possible service interface definitions that a componentized application could provide.
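
One simple way to picture how teased-out data structures hint at functional boundaries is to group tables by the references observed between them. The sketch below is only that, a picture: the table names and the list of references are hypothetical, and a real monolith is usually so cross-linked that weak or incidental references would have to be pruned before any useful clusters appeared.

```python
from collections import defaultdict


def candidate_components(
    references: list[tuple[str, str]], tables: set[str]
) -> list[set[str]]:
    """Group tables into connected clusters based on observed references."""
    neighbors = defaultdict(set)
    for a, b in references:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, clusters = set(), []
    for table in sorted(tables):
        if table in seen:
            continue
        # Depth-first walk over everything reachable from this table.
        stack, cluster = [table], set()
        while stack:
            current = stack.pop()
            if current in cluster:
                continue
            cluster.add(current)
            stack.extend(neighbors[current] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical tables and references recovered from an operational database.
tables = {"CUSTOMER", "ADDRESS", "ORDER_HDR", "ORDER_LINE", "TAX_RATE"}
references = [("ADDRESS", "CUSTOMER"), ("ORDER_LINE", "ORDER_HDR")]
clusters = candidate_components(references, tables)
```

Each resulting cluster is no more than a starting point for discussion about where a service interface might sit.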

The data archaeologists probably already know more than many on the operational IT team about how the data of the operational systems really works: how the systems maintain internal consistency, where they have been patched over the years to overcome problems or add new function, and where the natural fault lines in the data exist that might allow componentization. Data profiling tools used in support of the data warehouse population can be reused in the SOA context. These tools, from vendors such as Informatica, Trillium, DataFlux and others, automate the process of finding tacit meaning embedded in the data. For example, implicit data relationships between different fields (field A has value x if and only if field B is greater than value y) may indicate a piece of undocumented business process that needs to be preserved when the application is componentized.
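
To make the "if and only if" example concrete, here is a minimal sketch of how such a candidate rule could be tested against a sample of rows. It does not reproduce how any of the profiling products named above work; the field names (DISCOUNT_FLAG, ORDER_TOTAL), the threshold and the sample rows are all invented for illustration.

```python
from typing import Callable, Iterable


def check_iff_rule(
    rows: Iterable[dict],
    left: Callable[[dict], bool],
    right: Callable[[dict], bool],
) -> dict:
    """Count rows that support or violate 'left holds if and only if right holds'."""
    supporting, violations = 0, []
    for row in rows:
        if left(row) == right(row):
            supporting += 1
        else:
            violations.append(row)
    return {"supporting": supporting, "violations": violations}


# Hypothetical sample: is DISCOUNT_FLAG "Y" exactly when ORDER_TOTAL exceeds 1000?
orders = [
    {"DISCOUNT_FLAG": "Y", "ORDER_TOTAL": 1500},
    {"DISCOUNT_FLAG": "N", "ORDER_TOTAL": 200},
    {"DISCOUNT_FLAG": "Y", "ORDER_TOTAL": 1200},
    {"DISCOUNT_FLAG": "N", "ORDER_TOTAL": 1800},  # breaks the candidate rule
]

result = check_iff_rule(
    orders,
    left=lambda r: r["DISCOUNT_FLAG"] == "Y",
    right=lambda r: r["ORDER_TOTAL"] > 1000,
)
```

A rule that holds for nearly every row is a candidate piece of undocumented business logic; the few violating rows are worth inspecting by hand, since they may be data errors or genuine exceptions.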

Perhaps the hardest thing for these BI modelers to come to terms with will be the realization that it will now be possible, and even necessary, to modify the operational components to create clean and reliable interfaces. When they were modeling for the warehouse, changes to the operational environment were proposed only as a very last resort. And even then, they were often rejected. The task of cleansing and reconstituting the operational data was pushed up into the extract, transform and load (ETL) layer of the warehouse. Now, as the SOA implementation proceeds, the operational systems themselves will have to clean up their own act. And a necessary corollary is that the ETL process of the warehouse will have to respond. It will become simpler and cleaner. But making it so will require additional effort.

Coming Next

A look at some further knowledge, skills and techniques that BI developers can bring across to SOA, with particular emphasis on what happens at the user interface.



Barry Devlin

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid-'80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book, Data Warehouse: From Architecture to Implementation, published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

 
