Information Integration and Master Data Management

by Mike Ferguson
Published: 22 August 2006
Moving from where we are now to a simpler way of operating is a major undertaking for many enterprises.

Editor's note: An abbreviated version of this article is published at: http://www.intelligententerprise.com/showArticle.jhtml?articleID=179101909.

The business pressure on IT departments to simplify has never been greater. Probably the word I have heard most often from business executives in the last year in this context is “common.” Business wants common user interfaces, common business processes, common application functionality, common tools and services, common data and common data definitions (metadata). Moving from where we are now toward this much simpler way of operating is a major undertaking for many enterprises. It is a business integration exercise equivalent to a corporate Atkins Diet. However, the call for business integration is no surprise. The business benefits that integration brings include potentially significant reductions in costs, separation of processes from applications and major improvements in business performance. In looking at the business integration problem, there are five main levels of integration that most businesses need to tackle. These are shown in Figure 1 and include:

  • User interface integration
  • People integration
  • Business process integration
  • Application integration
  • Data and Metadata integration

Figure 1

Note that different technologies are needed for each of these levels of integration, but that ultimately all of these layers need to work together to solve the business integration problem. The purpose of this article is to address the bottom layer – data and metadata integration and, in particular, the area of master data management.

The use of data integration technologies started in the world of data warehousing, when we needed to integrate data from multiple operational systems to create historical single views of customer data, product data and asset data, as well as keep a history of transactions so that we could analyse business transactional activity by multiple dimensions over time. Many companies have historically tracked changes to this dimensional data by implementing support for slowly changing dimensions. The dimensional data (e.g., customer data, product data, etc.) housed in BI systems is in fact a copy of master data. The use of integrated master data (referred to as dimensional data in BI systems) allows us to measure, report on and analyse business performance across the business. Data warehousing remains the dominant area for data integration today.
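As a rough illustration of slowly changing dimensions, the sketch below keeps every version of a customer record and closes the previous version when an attribute changes (often called a Type 2 approach). The class name, field names and sample values are assumptions made for this example, not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: str,
                 new_city: str, change_date: date) -> List[CustomerVersion]:
    """Close the current version (if any) and append a new one."""
    current = next(
        (v for v in history
         if v.customer_id == customer_id and v.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == new_city:
            return history  # nothing changed, keep history as it is
        current.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_city, change_date))
    return history


# Hypothetical sample data: one customer who moves city once.
history: List[CustomerVersion] = []
apply_change(history, "C001", "Manchester", date(2005, 1, 1))
apply_change(history, "C001", "London", date(2006, 3, 15))
```

After the two calls, the history holds a closed Manchester version and an open London version, so transactions can still be analysed against the city that applied at the time.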

However, over the last few years, the demand for data integration has steadily grown in other areas. In operational systems, the client/server and open systems era resulted in operational systems springing up on all kinds of platforms and gave rise to duplication of functionality and fractured subsets of operational data across multiple operational systems and data stores. This has caused inefficiencies and overspend in many operational processes. While we have known about this problem for years, we have not been very successful in dealing with it. Some companies have purchased enterprise application integration (EAI) products and asynchronous message queuing products such as IBM WebSphere MQ, Tibco, Sonic, etc., to try to keep copies of operational data synchronised (see Figure 2).

Figure 2

While this has had some success, applications with no application programming interfaces (APIs) could not be included; thus, batch file update processing has continued to be needed.

It has been the arrival of the Web that has made business executives realise that there is potentially a way back to simplicity. If the Web can allow everyone, irrespective of their location, to gain access to common processes, common application services and common information via a Web browser, then why do companies need multiple application user interfaces, multiple versions of the same process, multiple applications with duplicate application functionality and multiple versions of fractured operational data? Would it not be possible to share common operational data across applications and access common services via the Web? This realisation that there is a possible way out of the fractured complexity of operational systems has sparked a massive demand for operational data consolidation and integration, as well as master data management.

In addition, the emergence of enterprise content management systems (ECMSs) has sparked the need to integrate unstructured documents, Web content, collaborations and digital media and place them where they can be managed by an ECMS. Vendors such as Vamosa, Kapow, IBM and Microsoft are all starting to address this need.

Therefore, today we have a need for enterprise data and metadata integration that spans all kinds of systems. This is shown on the enterprise architecture diagram in Figure 3 as the bottom horizontal layer.

Figure 3

The difference here is that enterprise data integration has now grown to include a suite of technologies that can manage data and metadata integration for any kind of system. This suite includes:

  • A metadata repository to hold a definition of a shared business vocabulary (common metadata – data names, data definitions and data integrity rules)
  • Data modelling services
  • Metadata discovery services to identify disparate definitions in disparate systems
  • Metadata mapping services – to map disparate definitions to the shared business vocabulary
  • Data quality profiling services
  • Data quality cleansing services
  • Batch and event-driven ETL for data consolidation
  • EII for data federation and heterogeneous replication
  • Integration with message brokers for data synchronisation

Approaches to Data Integration
Figure 4 shows three main approaches to data integration.

Figure 4

Besides the data synchronisation approach shown in Figure 2, Figure 4 also shows the extract, transform and load (ETL) option (data consolidation) and the enterprise information integration (EII) option (data federation). Enterprise information integration involves creating a virtual view over multiple underlying systems and uses federated queries or data integration services to integrate the needed data on demand. Many companies are looking at enterprise information integration as a solution for regulatory reporting when the data needed for reports is in multiple underlying systems (see Figure 5). EII is also being used to integrate performance management software (sometimes called corporate performance management, or CPM) with multiple underlying BI systems.

Figure 5
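To make the federation idea concrete, here is a minimal sketch that uses two separate in-memory SQLite databases to stand in for two underlying systems and answers a report with a single query across both. The table names, columns and sample rows are assumptions for illustration; a real EII product would federate heterogeneous systems rather than two SQLite databases.

```python
import sqlite3

# The main database stands in for a CRM system; the attached one
# stands in for a separate billing system (both hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute(
    "CREATE TABLE main.customer (customer_id TEXT PRIMARY KEY, name TEXT)"
)
conn.execute("CREATE TABLE billing.invoice (customer_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO main.customer VALUES (?, ?)",
    [("C001", "Acme Ltd"), ("C002", "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoice VALUES (?, ?)",
    [("C001", 100.0), ("C001", 250.0), ("C002", 75.0)],
)


def revenue_by_customer(connection):
    """Integrate data on demand; nothing is copied into a new store."""
    sql = """
        SELECT c.customer_id, c.name, SUM(i.amount)
        FROM main.customer AS c
        JOIN billing.invoice AS i ON i.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.customer_id
    """
    return connection.execute(sql).fetchall()


report = revenue_by_customer(conn)
```

The point of the design is that the report is computed at query time from the source systems, in contrast to ETL, where the data would first be consolidated into a separate store.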

Consolidating Operational Data
In addition to regulatory reporting, many companies are trying to simplify data by moving toward fewer copies of operational data as well as more holistic views of core data such as customer, product, asset, etc. This consolidation of operational data is part of a larger business integration initiative. Consolidating operational data reduces the overall complexity of maintaining consistency across operational systems by reducing the amount of data synchronisation needed. Some companies have opted to consolidate operational data by replacing many legacy operational applications with a common packaged application suite (e.g., Oracle or SAP) in which multiple application modules share data. However, a complete “rip and replace” strategy is not affordable to all and may not be a good solution for many companies. Therefore, others are trying to consolidate operational data by systematically re-engineering operational applications so that they share common master data and common services. In this case, the common services create and maintain the shared data (see Figure 6), and they are called by lighter-weight re-engineered application services from which redundant logic has been removed.

Figure 6

The main barrier to achieving shared data and shared data services across all core operational applications is the fact that most companies have a mix of packaged and custom-built systems. Possible solutions to this might be:

  1. Use EII to create a virtual shared operational database with different mappings for different applications.
  2. Make one packaged application data store the master and employ packaged master data management (MDM) solutions to manage synchronisation of data with other packaged and custom built application data stores as required.
  3. Consolidate operational master data (e.g., product, customer, asset, employee etc.) and create common services on top of these master data sets that all systems must invoke to maintain that data and then:
    • Re-engineer custom-built operational applications to remove redundant data and to access and maintain shared master data only via shared master data services.
    • Synchronise changes to master data with packaged applications via a message broker.

At this stage, it is clear that many do not trust an EII approach to handling this because many EII products have no support for distributed transaction processing. Option 2 or 3 will more likely be adopted depending on the dominance of particular packaged applications in the enterprise.

Master Data Management
It is the demand to simplify and consolidate operational data in particular that has raised interest in and demand for master data management (MDM). Master data is the set of core data entities of a business. Examples are data about customers, employees, products and assets that are used by operational and BI systems alike. Some would argue that master data includes all operational data, business intelligence and unstructured information associated with a subject (e.g., customer, product). However, in my client base, operational data is where the heart of the master data management discussion is right now.

There is no question that master data management is about consistency and synchronisation of core data; however, equally, I do not have a single customer that wants to leave their operational data scattered to the wind the way it is. Therefore, I believe that for most companies, an MDM strategy (which includes customer data integration as an example) is one of consolidation and synchronisation so that master data is created and maintained centrally via common services with synchronisation to packaged operational systems to update them with changes.

Tackling this problem does, however, require analysis to be done to find out answers to the following questions:

  • In what processes is master data accessed and/or changed?
  • Who changes it via what applications?
  • What systems hold versions of master data?
  • How are other system versions synchronised when a version of the master data changes?
  • What are all the different metadata definitions for master data?
  • Is there a company standard for master data definitions and a master data vocabulary?
  • What is the impact of inconsistency on operational performance and strategic performance targets?
  • What systems are affected when master data changes?
  • Can any application change their version of master data via their own business logic and if so, how are other systems notified of changes to synchronise them and how are conflicts resolved?
  • How many copies of master data are there?
  • What data needs to be integrated to create a consolidated view of master data?
  • Can EII be used to create a virtual view of master data to support reporting for example?

Once answers to these questions are obtained, then master data can be created either by purchasing a product to manage it (from vendors such as Hyperion, IBM, Oracle or SAP) or alternatively by doing it yourself. If the latter option is chosen then master data management needs to exploit the following components of the data integration suite discussed earlier:

  • Shared business vocabulary and the metadata repository
  • Data integration to:
    • Integrate and migrate disparate operational data from custom built systems to create master data sets, and
    • Acquire data from the shared master data store(s) to populate dimensional data in the data warehouse and data marts of the BI system.
  • Data quality software to maintain high-quality master data
  • Master data Web services
  • Integration of data integration software with a message broker to synchronise packaged operational systems that make use of master data.

The steps to building your own MDM solution include the following:

  1. Naming, definitions and structuring of master data.
  2. Defining data integrity rules of master data.
  3. Identifying (discovering) disparate versions of master data.
  4. Mapping disparate versions of master data to its shared business vocabulary definition (cross-referencing data).
  5. Assessing data quality in disparate systems containing versions of master data.
  6. Defining cleanup and translation rules for disparate master data versions.
  7. Data integration to create common master data.
  8. Data sharing of common master data across applications.

Looking at this in more detail, the first thing that needs to happen is that all data entities describing master data need to be defined using common definitions, common data names and common data integrity rules. I prefer to use the term shared business vocabulary (SBV) for this.

The SBV should include:

  • Common data names
  • Common data definitions
  • Common data integrity rules

Note that, if necessary, different versions of master data will be maintained, which means that historical tracking of changes to master data definitions is also needed. The SBV should be held in XML form in a metadata repository (Figure 7) that integrates with data modelling tools, data integration tools, message brokers, data quality software, BI tools and other products, preferably using industry standards. This is necessary because the definition of master data is needed by so many infrastructure technologies.

Figure 7
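As a sketch of what holding the SBV in XML form might look like, the example below defines a small vocabulary entry with names, definitions and simple integrity rules, then loads it into a lookup structure that other tools could consult. The element and attribute names are invented for this illustration and do not follow any specific industry standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical SBV fragment: names, definitions and integrity rules.
SBV_XML = """
<sharedBusinessVocabulary version="1">
  <entity name="Customer">
    <item name="CustomerName" type="string" required="true">
      <definition>Legal name of the customer.</definition>
    </item>
    <item name="CountryCode" type="string" required="true" length="2">
      <definition>Two-letter country code of the customer.</definition>
    </item>
  </entity>
</sharedBusinessVocabulary>
"""


def load_vocabulary(xml_text):
    """Return {entity name: {item name: rule/definition details}}."""
    root = ET.fromstring(xml_text.strip())
    vocabulary = {}
    for entity in root.findall("entity"):
        items = {}
        for item in entity.findall("item"):
            items[item.get("name")] = {
                "type": item.get("type"),
                "required": item.get("required") == "true",
                "definition": item.findtext("definition", default="").strip(),
            }
        vocabulary[entity.get("name")] = items
    return vocabulary


vocabulary = load_vocabulary(SBV_XML)
```

The `version` attribute hints at how changes to definitions could be tracked over time, although a real repository would need a fuller versioning scheme.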

Once this has been done, we then need to discover what data definitions exist that represent versions of this master data in different systems. These data definitions need to be identified. Also, the relationships between the discovered disparate metadata definitions need to be understood to work out precisely what data items in what disparate systems refer to one and the same thing (e.g., all the different data items in different systems that describe a customer name). Technologies such as Informatica, IBM Rational Data Architect and Sypherlink are examples of tools that can help with this kind of exercise.
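A very simplified sketch of this discovery step follows. It normalises column names from different systems, expands a few abbreviations and reports which columns appear to describe the same SBV item. The abbreviation table, system names and column names are assumptions for this example; commercial discovery tools typically also inspect the data values themselves, which this sketch does not attempt.

```python
import re

# Hypothetical abbreviation and synonym table used during matching.
ABBREVIATIONS = {
    "cust": "customer",
    "client": "customer",
    "nm": "name",
    "addr": "address",
}


def normalise(column_name):
    """Reduce a column name to a set of canonical word tokens."""
    # Split camelCase boundaries, e.g. CustomerName -> Customer_Name.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", column_name)
    tokens = re.split(r"[^a-z0-9]+", spaced.lower())
    return frozenset(ABBREVIATIONS.get(token, token) for token in tokens if token)


def discover(sbv_item, catalog):
    """Return (system, column) pairs whose names match the SBV item."""
    target = normalise(sbv_item)
    return [
        (system, column)
        for system, column in catalog
        if normalise(column) == target
    ]


# Hypothetical catalogue of columns harvested from disparate systems.
catalog = [
    ("CRM", "CUST_NM"),
    ("Billing", "client_name"),
    ("Orders", "order_total"),
    ("ERP", "CustomerName"),
]
matches = discover("customer_name", catalog)
```

Name matching of this kind only proposes candidates; someone who understands the systems still needs to confirm that the matched items really mean the same thing.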

Once this has been done, we can map the disparate data to master data definitions, sample data sources to profile data quality in the disparate systems and then decide on rules to cleanse the data and transform it if necessary. At this point, master data consolidation can be implemented. Also, if disparate metadata has been identified and mappings to the shared business vocabulary definition of master data have been captured in a metadata repository, it becomes possible to generate artefacts (Figure 8) to translate application-specific versions of master data to common SBV form so that master data can be marked up in XML using shared business vocabulary definitions as it flows between systems in XML messages over an enterprise service bus. This helps keep data consistent and well understood everywhere it goes.

Figure 8
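The mapping, cleansing and XML mark-up steps described above can be sketched as follows. The mapping table, cleansing rules, system names and field names are all assumptions for illustration; in practice they would be generated from the mappings captured in the metadata repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from application-specific fields to SBV names.
MAPPINGS = {
    "CRM": {"CUST_NM": "CustomerName", "CTRY": "CountryCode"},
    "Billing": {"client_name": "CustomerName", "country": "CountryCode"},
}

# Hypothetical cleansing rules, keyed by SBV name.
CLEANSING_RULES = {
    "CustomerName": lambda value: " ".join(value.split()).title(),
    "CountryCode": lambda value: value.strip().upper(),
}


def to_sbv(system, record):
    """Translate one application record into cleansed SBV form."""
    mapping = MAPPINGS[system]
    result = {}
    for source_field, value in record.items():
        sbv_name = mapping.get(source_field)
        if sbv_name is None:
            continue  # field is not part of the master data definition
        rule = CLEANSING_RULES.get(sbv_name, lambda v: v)
        result[sbv_name] = rule(value)
    return result


def to_xml(entity, sbv_record):
    """Mark up an SBV record as an XML message body."""
    root = ET.Element(entity)
    for name in sorted(sbv_record):
        ET.SubElement(root, name).text = sbv_record[name]
    return ET.tostring(root, encoding="unicode")


crm_record = {"CUST_NM": "  acme   ltd ", "CTRY": "gb", "LAST_LOGIN": "2006-01-01"}
message = to_xml("Customer", to_sbv("CRM", crm_record))
```

Because every system's data is translated to the same SBV names before it travels, a receiving application only needs to understand one form of a customer record.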

Data integration technology and data quality software can then be used to consolidate master data using the mappings, transformation and cleanup rules to construct this data. Note that a master data management solution consists of common integrated master data plus a set of common master data services that need to be used to maintain that data.

Processes then need to be put in place to control the maintenance, auditing and synchronisation of master data. Over time, custom-built operational applications need to be gradually re-engineered to remove their redundant versions of master data and to replace any logic that updates that data with calls to common master data services whenever they need to maintain (update, insert, delete) shared integrated operational master data (Figure 9). Note that as we consolidate master data, it will then become a source used to supply dimensional data to data warehouses, and it will also be used to synchronise data in operational systems.

Figure 9
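A common master data service of the kind described here might, in its simplest form, look like the sketch below: applications no longer write to their own copies but call one shared service, which enforces an integrity rule and records an audit trail. The class, method and field names are hypothetical, and an in-memory dictionary stands in for the shared master data store.

```python
class MasterDataService:
    """Single point of maintenance for shared customer master data."""

    def __init__(self):
        self._records = {}   # stands in for the shared master data store
        self.audit_log = []  # (action, customer_id, changed_by) entries

    def upsert(self, customer_id, record, changed_by):
        # Example integrity rule: a customer must have a name.
        if not record.get("CustomerName"):
            raise ValueError("CustomerName is required")
        action = "update" if customer_id in self._records else "insert"
        self._records[customer_id] = dict(record)
        self.audit_log.append((action, customer_id, changed_by))
        return action

    def delete(self, customer_id, changed_by):
        del self._records[customer_id]
        self.audit_log.append(("delete", customer_id, changed_by))

    def get(self, customer_id):
        return dict(self._records[customer_id])


# Two hypothetical re-engineered applications sharing one service.
service = MasterDataService()
first = service.upsert("C001", {"CustomerName": "Acme Ltd"}, "order_entry_app")
second = service.upsert("C001", {"CustomerName": "Acme Limited"}, "crm_app")
```

Keeping the integrity rules and audit trail inside the service is what allows the redundant update logic to be removed from each application.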

A question often remains about how to stop applications from causing conflicting updates to master data. One possible way that this might be achieved is by linking enterprise portals and applications to the enterprise service bus. As application user interfaces are re-engineered to integrate with enterprise portals, it becomes possible to link portlets with the applications via the enterprise service bus (ESB). In addition, the enterprise service bus is also integrated with application services and process management. In this way, when data is entered into the portal user interface, it is passed into the ESB/message broker for routing to all appropriate applications. The enterprise service bus can then guarantee that master data, operational packages and data warehouses all pick up the changes to the master data as and when it happens.

Figure 10
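The routing role of the ESB or message broker can be illustrated with a small in-process publish/subscribe sketch. The topic name, handlers and message fields are assumptions for this example, and the sketch does not provide the guaranteed delivery, queuing or transformation features that a real enterprise service bus offers.

```python
from collections import defaultdict


class MessageBroker:
    """Toy broker that routes each message to every subscriber of a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers the message reached


# Hypothetical targets: a packaged application and a warehouse feed.
packaged_app = {}
warehouse_feed = []

broker = MessageBroker()
broker.subscribe(
    "customer.changed",
    lambda m: packaged_app.update({m["customer_id"]: m["CustomerName"]}),
)
broker.subscribe("customer.changed", warehouse_feed.append)

# A change entered once through the portal reaches every interested system.
delivered = broker.publish(
    "customer.changed", {"customer_id": "C001", "CustomerName": "Acme Ltd"}
)
```

Because the change enters through a single route, no application updates its own copy in isolation, which is one way of reducing the risk of conflicting updates.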

Changes to master data will trickle feed into the BI system to reflect changes over time in the analytical system. Event-driven data integration will handle the movement of changed master data into the data warehouse while data quality software acts as a “firewall” to guarantee accuracy and completeness. Data synchronisation via the ESB will keep packaged (and other) operational systems up to date.
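The data quality “firewall” idea can be sketched as a filter that only lets changed records through to the data warehouse when they pass a set of checks. The two rules and the field names below are assumptions for illustration; real data quality software applies far richer profiling and cleansing rules.

```python
def quality_issues(record):
    """Return a list of problems found in one changed master data record."""
    issues = []
    if not record.get("CustomerName", "").strip():
        issues.append("CustomerName is missing")
    country = record.get("CountryCode", "")
    if len(country) != 2 or not country.isalpha():
        issues.append("CountryCode must be two letters")
    return issues


def firewall(changes):
    """Split changes into those safe to load and those held back."""
    accepted, rejected = [], []
    for record in changes:
        issues = quality_issues(record)
        if issues:
            rejected.append((record, issues))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical trickle feed of changed master data records.
changes = [
    {"CustomerName": "Acme Ltd", "CountryCode": "GB"},
    {"CustomerName": " ", "CountryCode": "GBR"},
]
accepted, rejected = firewall(changes)
```

Rejected records are returned with their reasons rather than silently dropped, so they can be corrected at source and fed through again.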

There are two broad choices when it comes to managing master data. These are:

  • Manage master data separately from any application and then synchronise all systems that need it when it changes.
  • Evolve toward custom-built operational applications maintaining shared master data via common services over a service bus and then synchronise all systems that need it when it changes.

Companies implementing their own solution are now looking for data modelling, metadata management, data integration and data quality software to be combined into a single platform to help solve the problem of master data management.



Mike Ferguson is Managing Director of Intelligent Business Strategies Limited, a leading information technology analyst and consulting company. As lead analyst and consultant, he specializes in enterprise business intelligence, enterprise business integration, and enterprise portals. He can be contacted at +44 1625 520700 or via e-mail at mferguson@intelligentbusiness.biz.