BeyeNETWORK: Global coverage of the business intelligence ecosystem
Enterprise Data Management, Part 2

by Mike Ferguson
Published: 4 October 2006

(Article URL: http://www.b-eye-network.co.uk/view-articles/3419)

This installment continues to look at federated query enterprise information integration and at extract, transform and load tools that have been extended to support EII.

Read Part 1 of this series here.

Federated Query Enterprise Information Integration (EII) Deployment
Now that we understand how federated query enterprise information integration (EII) works, the next question is: what kinds of applications is this form of EII best suited to? Figure 1 shows four examples of different enterprise information integration deployments.


Figure 1

 

These are:

  • Real-time data warehouse
  • Virtual operational data store (ODS)
  • Business intelligence integration
  • Federated search 

Please note that these are just examples and not an exhaustive list. Nevertheless, they show that enterprise information integration can be used to integrate data for reporting and to integrate business intelligence (BI) from multiple systems for performance management scorecards and dashboards. They also show on-demand data integration for consumption by operational applications; although not shown, enterprise portals could also use enterprise information integration to integrate data on demand. In addition, EII can even be used as a full-blown search engine if an EII product has a search interface, or as a virtual content management system that sits above disparate underlying document management and content management systems and exposes the content as if it were all one system via a standard API such as JSR 170. These are two areas into which IBM is taking EII software with its WebSphere Information Integrator product, which has an OmniFind Edition (search) and a Content Edition (virtual content).

Figure 1 also shows that ETL and EII can be used together. Several ETL vendors now also own EII products (e.g., Business Objects and IBM) while others try to double up on a single product. It is also fair to say that the examples in Figure 1 suggest that EII is for read-only use. Today I believe that this is what the vast majority of companies are using EII for. That does not mean that EII products are incapable of offering update processing, and we will discuss update processing of virtual views later. However, sticking with read-only for a moment, a very popular use of EII today is for reporting – especially regulatory and operational reporting (Figure 2) – when the data needed for such reports is not all in one place, and the budget (or appetite!) is not there to build yet another reporting database or data mart. Note that in the case of reporting, a BI system can be a data source in a virtual view.

 


Figure 2
 

Archrivals in the business intelligence market, Business Objects and Cognos, are both selling an EII solution with their reporting products. Business Objects offers its Data Federator EII product with Crystal Reports (note that BusinessObjects Universes can be used on top of Data Federator), while Cognos offers Composite Software’s Information Server with Cognos ReportNet.

EII Data Integration Web Services
As model-driven federated query products have gained traction, it has also become important to fit on-demand data integration into the wider world of a service-oriented architecture (SOA), so that this technology can be used across a wide range of applications and portals in an industry-standard way. Therefore, it makes a lot of sense if each federated query that integrates data and renders a virtual view can be saved and published as a Web service. This turns federated queries into on-demand data integration services. The idea here is that federated queries can be published to a UDDI registry or service directory so that they can be discovered and invoked by portals, applications, reporting tools and business processes using the industry-standard Simple Object Access Protocol (SOAP). One of the first companies to offer this was BEA with its AquaLogic Data Services Platform product. Figure 3 shows an example of how portlets in a portal can make use of EII data integration services.

 


Figure 3
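To make the idea of invoking a published federated query more concrete, here is a minimal sketch of what a SOAP 1.1 request to such a data integration service might look like. The operation name `getCustomerView`, its `customerId` parameter and the `urn:example:customer-view` namespace are all hypothetical and are not taken from any product mentioned here. Nothing is sent over the network: the sketch only builds the envelope and reads it back the way a service endpoint might.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:customer-view"  # hypothetical namespace for the data service


def build_request(operation, params):
    """Wrap a single operation call in a SOAP 1.1 envelope and return it as bytes."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_request(payload):
    """Service side of the sketch: recover the operation name and its arguments."""
    body = ET.fromstring(payload).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    args = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, args


# A portlet asking the (hypothetical) virtual view for one customer.
request = build_request("getCustomerView", {"customerId": 42})
operation, args = read_request(request)
```

In a real deployment the consumer would normally generate this call from the service's published description rather than build the XML by hand; the point is only that the federated query sits behind an ordinary service operation.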
 

EII Data Integration Services for Unstructured Content
There are also specialist vendors in the world of unstructured content that can integrate content on demand from disparate unstructured sources such as Web pages, document management systems and Web content management systems. These unstructured EII data integration Web services can also be used by portals and applications. Vendors in this area include Kapow and Vamosa. Re-purposing unstructured content on demand is also a key piece of the world of enterprise data management and something that is increasingly required to facilitate dynamic content in portals and other applications. It is a fair bet that it is only a matter of time before specialist vendors such as these two get sucked into the enterprise data management suites being put together by the larger software giants.

Update Processing and Federated Query EII
I touched earlier on the subject of update processing with EII. Not all products support the ability to do update, insert and delete via EII virtual views. Update via EII virtual views is still in its infancy in most companies today. EII products that do support transaction processing include:

  • BEA AquaLogic Data Services
  • Composite Information Server
  • Ipedo XIP 

Also, IBM will deliver this in Q4 this year.

Update processing in EII requires support for distributed transaction processing. This is bandied about as if it were a new phenomenon, but it takes me back 17 years to my old bosses Dr. Ted Codd and Chris Date, who pioneered distributed databases. Many of the requirements needed to solve this problem were documented then, so it is not new. Technologies such as Tuxedo and the XA standard for distributed transaction processing appeared many years ago to solve it. Nevertheless, restrictions may apply to update processing via EII virtual views. For example, only updates to a single source may be allowed. Figure 4 shows a very simple example of a very basic problem, whereby two different tables from two different DBMSs are included in a virtual view but only some columns are included from the underlying tables. What happens if columns in the underlying data structures that are NOT included in the EII view have an integrity constraint such as NOT NULL?

 


Figure 4
 

Of course, in this case, the underlying DBMS would reject the insert unless the EII server knew that it had to provide a value and supplied one automatically on the basis of a user-defined rule held in the EII server. Alternatively (and more likely today), you may find that the EII server rejects the insert because part of the distributed transaction failed in an underlying data source. Given that each product varies here, testing of products in this area is strongly recommended to understand their capabilities in distributed transaction processing. Vendors with a strong track record of distributed transaction management are certainly most likely to solve these problems more rapidly. One thing that can be said is that if EII completely managed distributed transaction processing across heterogeneous data sources (a non-trivial problem), then it would pave the way to solving a business problem with minimal disruption. This is the ability to consolidate data without changing applications.
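The NOT NULL problem described above can be reproduced in a few lines. The following is a minimal sketch, assuming two in-memory SQLite connections stand in for two different DBMSs; the table names, the hidden `currency` column and the `defaults` rule are all invented for illustration. It is not two-phase commit: the two final commits are not atomic, which is exactly the gap that XA-style distributed transaction management exists to close.

```python
import sqlite3

# Two independent connections stand in for two different DBMSs.
crm = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
billing.execute(
    "CREATE TABLE account (cust_id INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL, currency TEXT NOT NULL)"  # currency is NOT in the view
)
crm.commit()
billing.commit()


def insert_via_view(row, defaults=None):
    """Insert one row of the virtual view (id, name, balance) into both sources.

    `defaults` plays the role of a user-defined rule that supplies values for
    columns hidden from the view. If either source rejects its insert, both
    pending inserts are rolled back and False is returned.
    """
    values = {**(defaults or {}), **row}
    try:
        crm.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (values["id"], values["name"]),
        )
        billing.execute(
            "INSERT INTO account (cust_id, balance, currency) VALUES (?, ?, ?)",
            (values["id"], values["balance"], values.get("currency")),
        )
    except sqlite3.IntegrityError:
        crm.rollback()
        billing.rollback()
        return False
    crm.commit()      # not atomic with the next commit; a real EII server
    billing.commit()  # would need a proper distributed transaction protocol
    return True


view_row = {"id": 1, "name": "Acme", "balance": 10.0}
rejected = insert_via_view(view_row)                                # no rule for currency
accepted = insert_via_view(view_row, defaults={"currency": "GBP"})  # rule supplies it
```

Without the rule, the second source rejects the row and the first insert is undone; with the rule, both sources accept it.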

I have clients who would die for this capability – consolidating data without changing applications. This is because they have too many copies of data but no budget to rip existing applications apart and point them at shared operational data. Consider Figures 5 and 6 below.  

 


Figure 5

 
Figure 6 

These figures show how EII with update processing can help a company transition from disparate data in application-specific data stores (Figure 5) to consolidated operational data shared among applications (Figure 6). In other words, the figures show the before and after of a transition facilitated by EII. In Figure 6, the common integrated view in EII is a 1-to-1 mapping with the underlying shared data store. You might argue that this is ambitious and I accept that comment, but one thing is for sure – you can’t just hide the mess under an EII layer. Or, as one customer put it to me, there is no point putting lipstick on a pig! People want “their mess” cleaned up and simplified, and this is one possible option for doing it that might not break the bank. Of course, some companies have looked at this problem and decided to buy their way out by going with a single enterprise packaged application suite (e.g., SAP, Oracle, etc.), but not all can afford that luxury.

Strengths and Weaknesses of Federated Query EII

The pros and cons of federated query EII are as follows:

Pros

  • Particularly suited for operational and regulatory reporting and BI integration
  • On-demand integrated views of master data (possibly)
  • EII tools with full support for distributed transaction management could provide a way to consolidate data without changing applications 

Cons

  • Concurrent access? (workload management is missing from many tools)
  • Not designed for complex transformations, fuzzy matching and integrating high volumes of data
  • Semantics
  • Not a replacement for data warehousing
  • Performance will be an issue as more data sources are added
  • Update processing via EII is still in its infancy and subject to product specific restrictions
  • Underlying referential integrity and column constraints in data sources could make update processing complex or cause it to fail 

Performance is a question I am asked about a lot. In general, the more data sources an EII server must retrieve from, the higher the chances are that performance will be impacted. Many vendors have added or are adding features such as caching, materialized views and push-down optimization to address performance – and, undoubtedly, more will follow.
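As a rough illustration of why push-down optimization matters, the sketch below compares filtering in the integration layer with pushing the filter down to the source. An in-memory SQLite database stands in for a remote source, and the `orders` table and its contents are invented for the example. Real EII optimizers do far more than this, so treat it only as a picture of the basic idea.

```python
import sqlite3

# One in-memory database stands in for a remote data source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("UK" if i % 10 == 0 else "US", float(i)) for i in range(1000)],
)
source.commit()


def without_pushdown(region):
    """Fetch every row, then filter in the integration layer."""
    rows = source.execute("SELECT id, region, amount FROM orders").fetchall()
    return len(rows), [r for r in rows if r[1] == region]


def with_pushdown(region):
    """Send the predicate to the source so only matching rows come back."""
    rows = source.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return len(rows), rows


rows_moved_all, uk_filtered_locally = without_pushdown("UK")
rows_moved_few, uk_filtered_at_source = with_pushdown("UK")
```

Both calls produce the same answer, but the first moves all 1,000 rows out of the source while the second moves only the 100 that match.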

However, in my opinion, it is not just a performance issue. What is more important to me is semantics. In other words, is it valid (does it make business sense) to integrate specific data from disparate underlying business systems? Are we just joining data like stitching together pieces of meat? If there is one criticism to be laid at the feet of most vendors supporting EII real-time data integration, it is the lack of support for defining what the underlying data means. This takes me back to my relational roots when I worked for the late Dr. Codd – the inventor of the relational model. In the relational model, there is the concept of domains. Domains allow you to create user-defined data types with data integrity constraints and then to define data attributes of a specific domain. In this way we absolutely know the data being joined is semantically valid. The same applies in EII – we should know without doubt that it makes sense to integrate specific data and that the system should prevent us from joining data that does not make business sense. If relational DBMS vendors had led the industry by implementing domains in relational DBMS products, the world of understanding data would be a much better place. Instead, they ignored a fundamental foundation stone in the relational model and we are where we are today. If the large RDBMS vendors didn’t see the need, then it doesn’t set a great example for other vendors to follow.  
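The role that domains could play in an EII layer can be sketched as follows. This is an illustrative model only: the `Domain` and `Column` classes, the `join_is_valid` check and the column names are all hypothetical and do not reflect how any EII product or DBMS implements domains.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Domain:
    """A named, constrained data type in the spirit of a relational domain."""
    name: str
    is_valid: Callable[[Any], bool]  # the domain's integrity constraint


@dataclass(frozen=True)
class Column:
    """A column in some underlying source, declared over a specific domain."""
    source: str
    name: str
    domain: Domain


def positive_int(value):
    return isinstance(value, int) and value > 0


# Two domains with the same physical representation but different meanings.
CUSTOMER_ID = Domain("customer_id", positive_int)
ORDER_ID = Domain("order_id", positive_int)

crm_customer_id = Column("crm", "customer.id", CUSTOMER_ID)
billing_cust_id = Column("billing", "account.cust_id", CUSTOMER_ID)
orders_order_id = Column("orders", "order.id", ORDER_ID)


def join_is_valid(left, right):
    """Allow a join only when both columns are defined over the same domain."""
    return left.domain.name == right.domain.name


same_meaning = join_is_valid(crm_customer_id, billing_cust_id)
different_meaning = join_is_valid(crm_customer_id, orders_order_id)
```

Both identifiers are positive integers, so a plain type check would allow either join; only the domain tells the integration layer that joining a customer number to an order number makes no business sense.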

Note also that I question the ability of an EII tool to integrate master data on demand for one main reason: to make this happen, an EII tool would have to support global identifiers and map application-specific identifiers to each global identifier. Readers should look closely at whether or not EII vendors provide this capability and not just take it for granted.
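To show what mapping application-specific identifiers to global identifiers involves, here is a small sketch. The cross-reference table, the source names `crm` and `billing`, and all identifiers and values are made up for the example; how (or whether) a given EII product maintains such a mapping is something to check with the vendor.

```python
# (source system, application-specific identifier) -> global identifier
XREF = {
    ("crm", "C-1001"): "G-1",
    ("billing", "88213"): "G-1",  # same customer, different local key
    ("crm", "C-1002"): "G-2",
}

crm_rows = [
    {"id": "C-1001", "name": "Acme Ltd"},
    {"id": "C-1002", "name": "Bolt plc"},
]
billing_rows = [{"acct": "88213", "balance": 250.0}]


def master_view(crm_rows, billing_rows, xref):
    """Build one integrated record per global identifier.

    Rows whose local identifier has no entry in the cross-reference are skipped,
    because there is no safe way to know which customer they belong to.
    """
    view = {}
    for row in crm_rows:
        gid = xref.get(("crm", row["id"]))
        if gid is not None:
            view.setdefault(gid, {"global_id": gid})["name"] = row["name"]
    for row in billing_rows:
        gid = xref.get(("billing", row["acct"]))
        if gid is not None:
            view.setdefault(gid, {"global_id": gid})["balance"] = row["balance"]
    return view


customers = master_view(crm_rows, billing_rows, XREF)
```

Without the cross-reference there is nothing to join on: `C-1001` and `88213` share no value, so no federated query could tell that they describe the same customer.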

Extending ETL to Cater for EII
The alternative to federated query EII is to extend ETL tools to do this. Many vendors have opted for this route including:

In addition, Business Objects Data Integrator and IBM WebSphere DataStage SOA Edition are also capable of this, but these two vendors are now supporting EII using their respective Data Federator and WebSphere Information Integrator products referred to earlier.  

ETL tools that have been extended to support EII do this by publishing each ETL workflow as a Web service.

 


Figure 7
 

Applications and processes can then invoke these ETL jobs (data integration services) on demand using standard SOAP requests. When this happens, the ETL job runs, but in this case, there is no target database as per the classic use of ETL. Instead, the data is acquired, integrated and simply returned in XML form to the requestor from the data integration service. Therefore, an ETL tool needs to support XML output to make this happen.  
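The flow just described, an ETL job that runs on request and returns XML instead of loading a target database, might look something like the following sketch. The in-memory source lists, the job name `run_customer_orders_job` and the XML element names are all assumptions made for illustration, and the SOAP wrapping around the call is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "sources"; a real job would read operational systems.
CUSTOMERS = [{"id": 1, "name": " acme ltd "}, {"id": 2, "name": "Bolt plc"}]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 15.5},
]


def run_customer_orders_job(customer_id):
    """Extract and transform on demand, then return XML; nothing is loaded."""
    # Extract: pull the relevant records from each source.
    customer = next(c for c in CUSTOMERS if c["id"] == customer_id)
    amounts = [o["amount"] for o in ORDERS if o["customer_id"] == customer_id]

    # Transform: a token cleansing step plus an aggregation.
    name = customer["name"].strip().title()

    # "Load" is replaced by serializing the integrated result for the requestor.
    root = ET.Element("customerOrders", {"customerId": str(customer_id)})
    ET.SubElement(root, "name").text = name
    ET.SubElement(root, "orderCount").text = str(len(amounts))
    ET.SubElement(root, "totalAmount").text = f"{sum(amounts):.2f}"
    return ET.tostring(root, encoding="unicode")


response = run_customer_orders_job(1)
parsed = ET.fromstring(response)
```

The cleansing step hints at why data quality is listed as a strength of this approach: the same transformation logic used for warehouse loads is available to the on-demand service.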

The pros and cons of this approach are as follows:

Pros

  • Companies can get more value out of their existing ETL investment
  • Data quality capability built into ETL tools  

Cons

  • Mixed workload of complex ETL for building data warehouses and “lightweight ETL” for on demand data integration may be difficult to manage
  • Update processing ???  

As the cons above suggest, you may well ask how such ETL workflow-based EII solutions support update processing. I don’t know. I think you have to assume they don’t.

Conclusions
We have covered a lot in this paper on EII. The trend with EII is already clear; it is a cog in a bigger wheel. EII is a component tool being added to enterprise data integration suites that handle any kind of data integration and metadata management. These enterprise data integration suites will be used for many applications including building data warehouses, integrating operational data on demand and master data management. These suites will include:

  • Data and metadata connectors to access disparate data and metadata sources and targets
  • Metadata and metadata relationship discovery and identification tools
  • Data modelling tools
  • Enterprise data quality tools for data profiling and rules-driven data cleansing
  • EII data integration software
  • Event-driven real time and batch ETL data integration software
  • Enterprise content and records management software (ECMS) for life cycle management of documents, Web content and digital media
  • Industry standard APIs (e.g., Web services, JSR 170 and JSR 283, JMS)
  • Data lineage for data auditability and traceability
  • Data security and administration 

The race is on to complete the suite, so you should expect to see more consolidation in this market. While this all appears quite impressive, you also have to ask yourself what it is you need.

One thing is for sure. Although a shared business vocabulary is not mandatory, it is becoming a critical success factor for a consistent understanding of data. Enterprise integration and standalone EII vendors are offering vertical industry models to speed up the creation of shared business vocabularies.

Today, EII on-demand data integration is particularly well suited for:

  • Operational and regulatory reporting
  • BI integration scorecards
  • On demand delivery of disparate content into portlets
  • And perhaps on demand integrated views of master data (global identifier support is needed) 

If you want transaction processing via EII, purchase EII tools from vendors who have a long track record in distributed transaction management. 


Mike Ferguson is Managing Director of Intelligent Business Strategies Limited, a leading information technology analyst and consulting company. As lead analyst and consultant, he specializes in enterprise business intelligence, enterprise business integration, and enterprise portals. He can be contacted at +44 1625 520700 or via e-mail at mferguson@intelligentbusiness.biz.