BeyeNETWORK: Global coverage of the business intelligence ecosystem
Business Intelligence Problems and the Abstraction-Translation Paradigm
by Malcolm Chisholm
Published: 4 June 2008

(Article URL: http://www.b-eye-network.co.uk/view-articles/7566)

Enterprise information architecture is complex and consists of many different perspectives, each of which is valid in its own right. Unfortunately, no single perspective corresponds to the complete reality of architecture.

Business intelligence (BI) is a difficult area. It involves collecting and integrating data produced by transactional applications and presenting it in a form suited to specific reporting or analysis. Fortunately, many BI technologies and methodologies are quite mature and provide enormous help in building BI applications. Even so, many BI applications have significant problems, and at least some of these seem to stem from the interaction of data and architecture.
 
One difficulty with enterprise information architecture is thinking about it clearly, precisely, and completely. It is complex and consists of many different perspectives, each of which is a valid concept in its own right. Unfortunately, no single perspective corresponds to the complete reality of the architecture, so we have to build our understanding of it piece by piece.

Figure 1: The Abstraction-Translation Paradigm

Figure 1 shows one such perspective, which I have called the Abstraction-Translation Paradigm of enterprise information architecture. It presents the creation of transactional applications as a passage through successive layers of abstraction and translation that ultimately enable business information to be stored as a binary representation of facts in implemented databases. This data can then be moved into data marts for use in BI applications. However, it is still a binary representation of facts. To be used by a BI application, it has to be transformed and translated back through layers equivalent to those required in creating the operational application that produced it. Finally, it emerges as business information again. Given the complexity of this round trip for information, we should be amazed that BI applications ever produce anything useful, rather than disappointed that they have shortcomings.

The Abstraction-Translation Paradigm

The leftmost stack in Figure 1 represents the layers of analysis and design needed to build an operational application, and the roles played by the individuals who are responsible for what happens in each layer. It is divided up into a column for data and another column for business rules. These end up as physically implemented databases and application logic respectively.

The process of implementing an operational application begins in the business, where a business analyst gathers business requirements, describes the current business processes, and identifies the data used in these processes. The artifacts that are produced ideally include use cases, workflows, and a conceptual data model with a glossary of business terms. These represent the business in what is termed the Conceptual Layer in Figure 1. Unfortunately, “conceptual” has all kinds of definitions, but here it is taken to mean a direct representation of the business.

Next, on the data side, a data analyst produces a data model. This contains an element of design. Even when modeling the current state, data modelers will always say they are producing a picture of how the business truly “sees” the data. Whatever truth there is in this assertion, we see business concepts like vendors, customers, and employees abstracted into “party” entities, surrogate keys invented, association entities awkwardly named, and so on. We have definitely left the Conceptual Layer and entered the Logical Layer here.
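As a rough illustration of this kind of logical abstraction, the following Python sketch folds vendors, customers, and employees into a single “party” structure with an invented surrogate key. The class, field names, role codes, and sample names are all hypothetical and chosen only for the example; they are not taken from Figure 1 or from any particular modeling standard.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Party:
    party_id: int    # surrogate key: meaningless to the business, invented by the modeler
    party_role: str  # the business concept survives only as a role code
    name: str


_next_id = count(1)  # simple surrogate key generator for the sketch


def to_party(role: str, name: str) -> Party:
    """Abstract a business concept (customer, vendor, employee) into a Party."""
    return Party(next(_next_id), role, name)


parties = [
    to_party("CUSTOMER", "Acme Ltd"),
    to_party("VENDOR", "Globex"),
    to_party("EMPLOYEE", "J. Smith"),
]


def customers(all_parties: list[Party]) -> list[str]:
    """Translate back: recover the business notion of 'customer' from the abstraction."""
    return [p.name for p in all_parties if p.party_role == "CUSTOMER"]
```

Anyone who wants “customers” back out of this structure has to know that the role code exists and what its values mean, which is exactly the knowledge that tends not to travel with the data.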

Continuing with data, the next layer we meet with is the Physical Level. This is the true realm of designers whose objective is to produce a physical data model. If a vendor-supplied package is being purchased, it is commonly thought that all of this has been done in advance. However, such packages normally have to be “configured” to make them work in any given client environment, and this too is a form of physical data modeling.

Eventually, the physical database and application are produced and handed over to the DBAs and production control in the Implementation Layer. These technicians can influence the architecture by deciding where to locate the database and application, how queries will physically be executed, and so on. They can be expected to have little detailed understanding of the business environment that uses the solution they look after. However, that solution works to combine the database with the application to move services and data back through all the layers of abstraction used to develop them, and deliver usable functionality to the business users, usually in the form of automation.

Eventually, the data from the application can be transported to other places in the enterprise for reuse. Data marts are an obvious example. In this data transport layer, there is heavy emphasis on transport concepts, like XML and middleware. The technicians involved are usually indifferent to the semantics of the data, and have little understanding of it. Like truck drivers delivering containers, they concern themselves with their vehicles, the roads they must travel, and how to offload at their destination. They have little idea of the significance of what is in the containers.

Abstraction and Translation

Each layer in Figure 1 represents a different level of abstraction. The term “abstraction” usually means the separating of ideas from objects in philosophy. In Figure 1, it means extracting the concepts from the prior layer that map to the components which have to be manipulated in the current layer. For instance, a data analyst will have to take the concepts presented in the conceptual data model (which may be a text document) and put them into a data model in a CASE tool. Each layer in Figure 1 has its own set of components and concepts. Thus, the individuals who function at each layer tend to speak a different language than the individuals who function in the other layers. The process of abstraction, mapping concepts in the prior layer to components in the current layer, is matched by a process of translation, in which the jargon used to express ideas in the prior layer is translated into the jargon used to express ideas in the current layer. Hence the business “information” becomes the conceptual “data element”, then the logical “attribute”, then the physical “column”, and finally a “data value” in a database, and maybe an XML “document”. This chain illustrates how the same thing gets mapped to different concepts that are described in different technical languages.
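The chain of terms can be made concrete with a small Python sketch. It assumes a hypothetical piece of business information (a customer credit limit) and invents a name for it at each layer; the layer labels follow the text, but every name and value below is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerTerm:
    layer: str   # layer of the paradigm
    jargon: str  # what practitioners at that layer call the thing
    name: str    # hypothetical name the thing carries at that layer


# One piece of business information traced through the layers (all names invented).
CREDIT_LIMIT_CHAIN = [
    LayerTerm("Business", "information", "how much credit we extend to a customer"),
    LayerTerm("Conceptual", "data element", "Customer Credit Limit"),
    LayerTerm("Logical", "attribute", "Party.credit_limit_amount"),
    LayerTerm("Physical", "column", "PARTY.CR_LMT_AMT"),
    LayerTerm("Implementation", "data value", "5000.00"),
    LayerTerm("Transport", "document", "<crLmtAmt>5000.00</crLmtAmt>"),
]


def translate(chain, from_layer, to_layer):
    """Return the terms passed through when moving from one layer to another."""
    layers = [t.layer for t in chain]
    i, j = layers.index(from_layer), layers.index(to_layer)
    if i <= j:
        return chain[i:j + 1]
    # Moving back up the stack: same terms, reverse order.
    return list(reversed(chain[j:i + 1]))


# The round trip a BI application needs: from stored value back to business meaning.
for term in translate(CREDIT_LIMIT_CHAIN, "Implementation", "Business"):
    print(f"{term.layer:<15} {term.jargon:<13} {term.name}")
```

In this toy form the mapping is explicit and lossless. In practice it is rarely written down in one place, so each step back up the stack depends on someone remembering what the layer above meant.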

A major problem is that the individuals who work in each layer focus heavily on the idiosyncrasies of the components they have to deal with and the tools that help them do their work. Hence a logical data modeler will be concerned about things like optionality and cardinality, what notation to express them in, and deciding upon which CASE tool to use – or even making sure the lines on their models do not cross. Such issues are of no concern to, say, a DBA trying to implement a database. There is some overlap between the layers, but there is a lot more that is distinct and unique about each one of them. Realization of this, however, tends to reinforce the technical specialist’s allegiance to his or her own technical sphere of competence. Preservation of understanding of the business data and rules tends to get crowded out.

The Implications for BI

There seems to be very little acceptance of the reality of the Abstraction-Translation Paradigm, and a general expectation that BI can simply be added into enterprise architecture. The right-hand stack in Figure 1 shows a parallel process to that described previously for developing a BI application based on a data mart. Here, it is the business's information requirements that are driving the development process. The technical actors tend to be specialists in the realm of data marts, data warehouses, BI tools, and so on.

The problem is that in the business intelligence application, the captured data has to be transformed again back through all the layers until it makes business sense, just as it had to be in the operational system. There are two difficulties:

  1. The BI environment tends to be constructed based on an understanding of the information needs of the BI users, with little or no knowledge of the abstractions and transformations that have taken place in the development of the operational solution.
  2. The data in the operational system’s database derives part of its semantic properties from the abstractions and transformations that occurred in the levels above it, and part from the constraints inherent in the application logic, which has traveled along a parallel path. These properties are thus not inherent in the data itself and cannot be transported with it. That is, the data is tightly coupled to the code of transactional applications, and loses meaning when removed from the code.
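The second difficulty can be illustrated with a minimal Python sketch. It assumes a hypothetical order table in which the meaning of two coded columns is defined only in application code; the column names, codes, and rule are invented for the example and do not describe any real system.

```python
# Hypothetical operational rows as they might sit in an ORDERS table.
orders = [
    {"order_id": 1001, "status_cd": "H", "amount_type": 2, "amount": 250.0},
    {"order_id": 1002, "status_cd": "S", "amount_type": 1, "amount": 100.0},
]


def is_billable_revenue(row):
    """Rule that lives only in the (hypothetical) application code.

    In this invented application, status "H" means "on hold" and
    amount_type 1 means the amount is an invoiced value rather than a quote.
    Nothing in the stored data itself says so.
    """
    return row["status_cd"] != "H" and row["amount_type"] == 1


def naive_mart_revenue(rows):
    """A data mart load that copies the values but not the application's rules."""
    return sum(r["amount"] for r in rows)


def rule_aware_revenue(rows):
    """The same measure with the application's semantics reapplied."""
    return sum(r["amount"] for r in rows if is_billable_revenue(r))


print(naive_mart_revenue(orders))   # 350.0
print(rule_aware_revenue(orders))   # 100.0
```

Both functions read exactly the same data. The difference in the result comes entirely from knowledge held in the application logic, which is what is lost when the data is moved without it.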

Until these problems can be effectively addressed, the technologies and methodologies available for building BI solutions are not of themselves going to guarantee success.
 
 


Malcolm Chisholm, Ph.D., has over 25 years of experience in enterprise information management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management and business rules. Malcolm has authored two books: Managing Reference Data in Enterprise Databases (Morgan Kaufmann, 2000) and How to Build a Business Rules Engine (Morgan Kaufmann, 2003).  He can be contacted at mchisholm@refdataportal.com.
