Enterprise Data Management, Part 1: An In-Depth Look at Enterprise Information Integration (EII)

Originally published 6 September 2006

Over the last two years, the world of enterprise data management has seen a major market shake-up that is still going on. Data integration tools, metadata management, data quality and data modelling have all been changing rapidly, even as the marketplace itself consolidates. This article focuses on one of these areas: data integration and, in particular, the use of enterprise information integration (EII) technologies.

Data integration tools first appeared in the market under the banner of extract, transform and load (ETL) tools aimed at data integration in the data warehousing market. Since then, the need for data integration has stretched way beyond data warehousing into all kinds of uses, such as data migration, data synchronisation and the creation of master data management data hubs.

What is Enterprise Information Integration (EII)?
In addition to batch-oriented ETL processing, it has become evident more recently that many different applications and portals require on-demand integration of data that may reside in disparate operational and analytical systems, as well as in content management systems and even on the public Internet. This need has been filled by the introduction of enterprise information integration (EII) technology. A key difference between EII and ETL is that with EII, data is integrated at run time and delivered to an application without the need for data staging. With ETL, the target is a database, whereas with EII the target is an application, a reporting tool or a portal.

Figure 1 shows EII as presenting a virtual view of data that may be resident in disparate underlying systems. Effectively, EII technology makes it look like the data is integrated in a single data store even though it isn’t.

EII is in demand mainly because data is not always consolidated in one place, and disparate data may be needed for:

  • Reporting
  • Integration of business performance management tools with multiple line-of-business business intelligence (BI) systems
  • Provision of integrated structured and unstructured data from disparate sources into a portal (e.g., integration of news items)
  • Heterogeneous replication
  • Federated search

Generally speaking, there are two main approaches to EII: model-driven federated query EII, and ETL tools that have been enhanced to support EII via dataflow-driven data integration services. I would like to discuss both of these.

Figure 1: Enterprise Information Integration (EII)

You could argue that a third approach is distributed databases – something that was more a topic of discussion in the late ’80s and early ’90s, when relational DBMS products were extended to support distributed query processing.

Looking at federated query EII first, the marketplace has seen a number of vendors emerge in the last two years. In alphabetical order, these include Composite Software, IBM, Ipedo, MetaMatrix and XAware.

Federated query EII products work by presenting a virtual model of integrated data. The process of creating these virtual views of underlying data involves four steps, sketched in code below:

  1. Define common integrated data model for the virtual views
  2. Define the data sources and connect to them
  3. Define mappings from source systems to common virtual view (this generates a federated query)
  4. Query the virtual view(s)

Figure 2: Federated Query EII
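
To make these four steps concrete, the following is a minimal sketch in Python of what an EII server does under the covers. The source systems, table and column names and the query_virtual_view function are all invented for illustration, and two in-memory SQLite databases stand in for the disparate sources; in a real product, the studio tool generates the equivalent mappings and federated query for you.

    import sqlite3

    # Step 2: define and connect to the (hypothetical) underlying sources.
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE cust (cust_no TEXT, cust_nm TEXT)")
    crm.execute("INSERT INTO cust VALUES ('C1', 'Acme Ltd')")

    orders = sqlite3.connect(":memory:")
    orders.execute("CREATE TABLE ord (o_id TEXT, o_cust TEXT, o_amt REAL)")
    orders.execute("INSERT INTO ord VALUES ('O9', 'C1', 250.0)")

    # Steps 1 and 3: the common virtual model (customer_id, customer_name,
    # order_amount) and the mapping of each source's physical names onto it.
    MAPPINGS = {
        "crm": (crm,
                "SELECT cust_no AS customer_id, cust_nm AS customer_name FROM cust"),
        "orders": (orders,
                   "SELECT o_cust AS customer_id, o_amt AS order_amount FROM ord"),
    }

    # Step 4: query the virtual view. The federated query fetches from each
    # source at run time and integrates the rows on the common key, with no
    # staging area in between.
    def query_virtual_view():
        crm_conn, crm_sql = MAPPINGS["crm"]
        ord_conn, ord_sql = MAPPINGS["orders"]
        names = {cust_id: name for cust_id, name in crm_conn.execute(crm_sql)}
        return [{"customer_id": cust_id,
                 "customer_name": names.get(cust_id),
                 "order_amount": amount}
                for cust_id, amount in ord_conn.execute(ord_sql)]

    print(query_virtual_view())
    # [{'customer_id': 'C1', 'customer_name': 'Acme Ltd', 'order_amount': 250.0}]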

It is also the case that multiple virtual views can be created (by creating many different federated queries) such that different users and applications can have different virtual views of the disparate data depending on their security profiles and business needs. Each federated query then dynamically creates the virtual view by integrating the needed disparate data at run time (Figure 3). 

Figure 3: Support of Multiple Virtual Views
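
As a small illustration (the profile names and column lists below are invented), each virtual view is simply a different projection and federated query over the same sources, chosen according to the requesting user’s security profile:

    # Hypothetical per-profile virtual views over the same underlying sources:
    # each profile has its own federated query and sees only the virtual
    # columns its security profile and business needs allow.
    VIEWS = {
        "sales":   ["customer_id", "customer_name", "order_amount"],
        "finance": ["customer_id", "order_amount", "credit_limit"],
    }

    def view_for(profile, integrated_row):
        """Project the dynamically integrated row onto this profile's columns."""
        return {col: integrated_row[col]
                for col in VIEWS[profile] if col in integrated_row}

    row = {"customer_id": "C1", "customer_name": "Acme Ltd", "order_amount": 250.0}
    print(view_for("finance", row))   # {'customer_id': 'C1', 'order_amount': 250.0}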

Federated EII products can access a wide range of data sources. These include:

  • Relational DBMSs (e.g., via JDBC, ODBC or XQuery)

  • Flat files

  • Non-relational DBMSs

  • XML data sources (e.g., XML files, RSS feeds, MS Office documents [Office 2007])

  • Message queues

  • Web services

  • Content management systems

  • LDAP directories

It is also possible for EII servers to access applications via a message broker to get at application data via application adapters such as JCA adapters.

Note that, from the list of data sources above, an underlying system being accessed by the EII server can be a Web service. Typically, this would be an application Web service that retrieves data from its application data store (e.g., database or file) on behalf of the EII server. In this case, the underlying system is “hiding” its own data from the EII server. Put another way, the EII server does not need to know the physical schema of the underlying system to retrieve data from that data source. Instead, it simply maps the XML schema returned by the Web service to its own virtual model and then invokes the Web service data source to get the data needed. Products such as Ipedo XIP make it easy to view XML data sources (whether those data sources are XML files or XML returned by Web services) by making the XML look like a simple table (Figure 4). Ipedo XML tables allow you to create virtual tables in a virtual model from an XML data source and treat them as tables when querying with SQL, for example.

Figure 4: Ipedo XIP Product Example
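
The general idea can be sketched as follows. This is an illustration of the concept rather than Ipedo’s actual API, and the payload, table and column names are hypothetical: an XML document, as a Web service might return it, is mapped onto a simple table so that it can be queried with SQL.

    import sqlite3
    import xml.etree.ElementTree as ET

    # Hypothetical payload, as an application Web service might return it.
    # The EII server never sees the application's physical schema, only this XML.
    PAYLOAD = """<customers>
      <customer><id>C1</id><name>Acme Ltd</name><country>UK</country></customer>
      <customer><id>C2</id><name>Globex</name><country>DE</country></customer>
    </customers>"""

    def xml_to_virtual_table(xml_text, db):
        """Map the XML schema onto a simple table so it can be queried with SQL."""
        db.execute("CREATE TABLE customer (id TEXT, name TEXT, country TEXT)")
        for c in ET.fromstring(xml_text).findall("customer"):
            db.execute("INSERT INTO customer VALUES (?, ?, ?)",
                       (c.findtext("id"), c.findtext("name"), c.findtext("country")))

    db = sqlite3.connect(":memory:")
    xml_to_virtual_table(PAYLOAD, db)
    print(db.execute("SELECT name FROM customer WHERE country = 'UK'").fetchall())
    # [('Acme Ltd',)]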

Accessing Web service interfaces as a data source is a way to get data out of packaged applications when perhaps the underlying data model is unfamiliar and/or subject to change between releases. Several vendors including Composite Software and Ipedo are now offering pre-built services or connectors to popular packaged applications to get at their data. This includes outsourced applications such as Salesforce.com.

Federated query EII setup and the design of virtual data models are typically done in a studio tool that allows a designer to define data sources, design a virtual model, visually map underlying data sources to the virtual model(s) and test federated queries to confirm results. An example of such a tool is Composite Software’s Composite Studio, which provides this kind of capability (Figure 5). Most products provide a similar facility (e.g., MetaMatrix Enterprise Designer, XAware XA-Designer).

Figure 5: Composite Software's Composite Studio

Virtual views can themselves be queried via SQL and/or XQuery, depending on the interfaces supported by the EII server product. When this happens, the federated query is invoked to dynamically integrate data at run time. If only a subset of that integrated data is needed, the EII server will filter out unnecessary data as part of its optimisation process. Federated query EII servers are becoming more efficient at doing this by exploiting techniques such as “push-down optimisation,” whereby filters are “pushed down” to the underlying system to get it to filter out non-qualifying data before the data is returned to the EII server for integration with other data subsets gathered from other systems. This kind of support helps performance and is typically implemented on underlying systems that store their data in relational DBMSs.
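
A minimal sketch of push-down optimisation, using invented virtual and physical column names (a real optimiser also handles joins, expressions and bind variables): the predicate supplied against the virtual view is rewritten in the source system’s own terms and appended to the SQL sent to that source, so non-qualifying rows never leave it.

    # Hypothetical mapping from a virtual-view column to the physical column
    # it comes from in one relational source system.
    COLUMN_MAP = {"customer_country": "cust_ctry"}

    def build_source_query(base_sql, predicates):
        """Push filters down: turn virtual-view predicates into a WHERE clause
        the source DBMS evaluates itself, instead of filtering in the EII server."""
        pushed = ["{} = '{}'".format(COLUMN_MAP[col], val)
                  for col, val in predicates.items() if col in COLUMN_MAP]
        return base_sql + (" WHERE " + " AND ".join(pushed) if pushed else "")

    print(build_source_query("SELECT cust_no, cust_ctry FROM cust",
                             {"customer_country": "UK"}))
    # SELECT cust_no, cust_ctry FROM cust WHERE cust_ctry = 'UK'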

In addition, in several EII products, federated queries can themselves be published as Web services and invoked via the industry-standard Simple Object Access Protocol (SOAP). In this case, the EII server returns a result with an XML schema and XML tags that match the virtual view. The result of a federated query could come back as an SQL result set, as XML, or in other formats such as HTML or CSV, depending on the capabilities of the EII server. This result set is rendered using the data names, definitions and schema defined in the virtual view.
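
As a rough illustration (the function and tag names are invented, and a real product would also wrap the result in a SOAP envelope), rendering an integrated result set with the virtual view’s own data names might look like this:

    import xml.etree.ElementTree as ET

    def render_as_xml(rows, row_tag="Customer"):
        """Render an integrated result set using the data names defined in the
        virtual view, so the XML returned to a SOAP caller matches that view."""
        root = ET.Element(row_tag + "s")
        for row in rows:
            elem = ET.SubElement(root, row_tag)
            for name, value in row.items():
                ET.SubElement(elem, name).text = str(value)
        return ET.tostring(root, encoding="unicode")

    print(render_as_xml([{"customer_id": "C1", "customer_name": "Acme Ltd"}]))
    # <Customers><Customer><customer_id>C1</customer_id><customer_name>Acme Ltd</customer_name></Customer></Customers>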

Shared Business Vocabularies
One of the strengths of model-driven federated query EII is the ability to introduce common data names and data definitions into a virtual model sitting above disparate systems – all with many different data names and data definitions for the same data (Figure 6). I use the term shared business vocabulary (SBV) for this set of common data names and definitions. This is common metadata.  

Figure 6: Shared Business Vocabulary (SBV)

The demand for commonly understood data is rising rapidly and is becoming strategically important for many companies. An SBV establishes common data definitions and data names that can be used in data models (logical, physical and virtual) to achieve consistency across multiple models. In the case of EII, it can also be used when rendering data, so that data is presented in a form that is commonly understood by users and that can be consumed by composite applications. It is like levelling the playing field across multiple systems. Using an SBV with EII tools is not mandatory, but it is highly recommended as a critical success factor. There are several approaches to creating common vocabularies. These options include:

  1. Using pre-built vocabularies

    • Standard vocabularies are often shipped with data integration and process/application integration products

    • You can also buy and download pre-built vocabularies from vertical industry standards bodies and import them into your integration products

  2. Buying pre-built enterprise data models from vendors

  3. Building your own common shared business vocabulary

Option 1 is most likely when integrating data for use by other businesses (B2B), while option 3 is more likely when integrating data inside the enterprise.
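
Whichever option is chosen, applying the vocabulary at run time is conceptually simple. The sketch below assumes invented system and field names: one common term per data item, together with the local name each source system uses for that item, so that every system’s data can be presented in the same vocabulary.

    # A hypothetical shared business vocabulary: one common name per data item
    # and the local name each source system uses for the same item.
    SBV = {
        "customer_id":   {"crm": "cust_no", "billing": "acct_ref", "web": "userId"},
        "customer_name": {"crm": "cust_nm", "billing": "acct_name", "web": "displayName"},
    }

    def to_common_names(source, record):
        """Rename a source record's fields to the SBV terms before rendering."""
        return {term: record[local[source]]
                for term, local in SBV.items() if local.get(source) in record}

    print(to_common_names("billing", {"acct_ref": "C1", "acct_name": "Acme Ltd"}))
    # {'customer_id': 'C1', 'customer_name': 'Acme Ltd'}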

New tools such as IBM’s Rational Data Architect can now speed the building of an SBV by automatically discovering data vocabularies, data meaning and data relationships in existing systems and then mapping them to common vocabularies. Once this is done, the mappings and virtual model can be automatically generated for use in an EII tool. Rational Data Architect can do this today for IBM’s WebSphere Information Integrator EII product.

In Part 2 of this article, we will look at the uses of federated query EII and at ETL tools that have been extended to support this capability.


Comments


Posted 4 April 2009 by paul.vanderlinden@hotmail.com

Hi Mike, excellent article! Just some questions: what about data manipulation? Suppose that before you can pull data together, you have to compare two sources with almost the same data (customer data, for instance) to determine, at record level, which has the latest information - where is this done? Or if the data type or length has to be changed before it can be used? Another question I have concerns historical data. I'm guessing the assumption is that this is still dealt with in a data warehouse? Lastly, you address the subject of uniform metadata, but what about uniform master data? If the same data is structured differently in the underlying source systems, this would still pose a problem, despite having uniform metadata?

Thank you for your kind reaction.
