Editor's note: An abbreviated version of this article is published at: http://www.intelligententerprise.com/showArticle.jhtml?articleID=179101909.
The business pressure on IT departments to simplify has never been greater. Probably the word I have heard most often from business executives in the last year in this respect is “common.” Business wants common user interfaces, common business processes, common application functionality, common tools and services, common data and common data definitions (metadata). Moving from where we are now toward this much simpler way of operating is a major undertaking for many enterprises. It is a business integration exercise equivalent to a corporate Atkins Diet. However, the call for business integration is no surprise. The business benefits that integration brings include potentially significant reductions in costs, separation of processes from applications and major improvements in business performance. In looking at the business integration problem, there are five main levels of integration that need to be tackled by most businesses. These are shown in Figure 1 and include:
Figure 1
Note that different technologies are needed for each of these levels of integration, but that ultimately all of these layers need to work together to solve the business integration problem. The purpose of this article is to address the bottom layer – data and metadata integration and, in particular, the area of master data management.
The use of data integration technologies started in the world of data warehousing when we needed to integrate data from multiple operational systems to create historical single views of customer data, product data and asset data, as well as keep a history of transactions so that we could analyse business transactional activity by multiple dimensions over time. Many companies historically track changes to this dimensional data by implementing support for slowly changing dimensions. The dimensional data (e.g., customer data, product data, etc.) housed in BI systems is in fact a copy of master data. The use of integrated master data (referred to as dimensional data in BI systems) allows us to measure, report on and analyse business performance across the business. Data warehousing remains the dominant area for data integration today.
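As a minimal sketch of what tracking a slowly changing dimension can involve, the Python below applies a "Type 2" style change: the current row is closed off and a new row is added, so history is preserved. The row layout, the field names (`valid_from`, `valid_to`) and the function `apply_scd2` are illustrative assumptions, not taken from any particular product.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """One version of a customer in a dimension table (hypothetical layout)."""
    customer_id: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(rows: List[CustomerRow], customer_id: str,
               new_city: str, change_date: date) -> List[CustomerRow]:
    """Return a new list in which the customer's current row is closed
    and a new current row is appended, if the city actually changed."""
    result = []
    changed = False
    for row in rows:
        is_current = row.customer_id == customer_id and row.valid_to is None
        if is_current and row.city != new_city:
            # Close the old version rather than overwriting it.
            result.append(replace(row, valid_to=change_date))
            changed = True
        else:
            result.append(row)
    if changed:
        result.append(CustomerRow(customer_id, new_city, change_date))
    return result


history = [CustomerRow("C1", "Leeds", date(2004, 1, 1))]
history = apply_scd2(history, "C1", "Manchester", date(2005, 6, 1))
```

After the call, `history` holds two rows for customer C1: the closed Leeds version and the current Manchester version. Applying the same value again leaves the list unchanged.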
However, over the last few years, the demand for data integration has steadily grown in other areas. In operational systems, the client/server and open systems era resulted in operational systems springing up on all kinds of platforms and gave rise to duplication of functionality and fractured subsets of operational data across multiple operational systems and data stores. This has caused inefficiencies and overspend in many operational processes. While we have known about this problem for years we have not been very successful in dealing with it. Some companies have purchased enterprise application integration (EAI) products and asynchronous message queuing products such as IBM WebSphere MQ, Tibco, Sonic, etc., to try to keep copies of operational data synchronised (See Figure 2).
Figure 2
While this has had some success, applications with no application programming interfaces (APIs) could not be included; thus, batch file update processing has always continued to be needed.
It has been the arrival of the Web that has made business executives realise that there is potentially a way back to simplicity. If the Web can allow everyone, irrespective of their location, to gain access to common processes, common application services and common information via a Web browser, then why do companies need multiple application user interfaces, multiple versions of the same process, multiple applications with duplicate application functionality and multiple versions of fractured operational data? Would it not be possible to share common operational data across applications and access common services via the Web? This realisation that there is a possible way out of the fractured complexity of operational systems has sparked a massive demand for operational data consolidation and integration, as well as master data management.
In addition, the emergence of enterprise content management systems (ECMSs) has sparked the need to integrate unstructured documents, Web content, collaborations and digital media and place them where they can be managed by an ECMS. Vendors such as Vamosa, Kapow, IBM and Microsoft are all starting to address this need.
Therefore, today we have a need for enterprise data and metadata integration that spans all kinds of systems. This is shown on the enterprise architecture diagram in Figure 3 as the bottom horizontal layer.
Figure 3
The difference here is that enterprise data integration has now grown to include a suite of technologies that can manage data and metadata integration for any kind of system. This suite includes:
Approaches to Data Integration
Figure 4 shows three main approaches to data integration.
Figure 4
Besides the data synchronisation approach shown in Figure 2, Figure 4 also shows the extract, transform and load (ETL) option (data consolidation) and the enterprise information integration (EII) option (data federation). Enterprise information integration involves creating a virtual view over multiple underlying systems and uses federated queries or data integration services to integrate the needed data on demand. Many companies are looking at enterprise information integration as a solution for regulatory reporting when the data needed for reports is in multiple underlying systems (see Figure 5). EII is also being used to integrate performance management software (sometimes called corporate performance management, or CPM) with multiple underlying BI systems.
Figure 5
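The federation idea described above can be sketched in a few lines of Python. Two independent in-memory SQLite databases stand in for separate source systems, and a function builds the combined view at query time instead of copying the data into a new store. The table names, column names and sample rows are all hypothetical, and a real EII product would push far more of the work down to the sources.

```python
import sqlite3

# Two independent in-memory databases stand in for separate source systems.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id TEXT, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [("C1", 120.0), ("C1", 80.0), ("C2", 50.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id TEXT, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [("C1", "Acme Ltd"), ("C2", "Bolt plc")])


def federated_customer_totals():
    """Build a combined view at query time; nothing is copied into a new store."""
    # Query each source separately, then join the results in the integration layer.
    names = dict(crm.execute("SELECT customer_id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices "
        "GROUP BY customer_id ORDER BY customer_id")
    return [(cid, names.get(cid), total) for cid, total in totals]


report = federated_customer_totals()
```

Because the join happens on demand, the report always reflects the current state of both sources, at the cost of querying them every time it is run.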
Consolidating Operational Data
In addition to regulatory reporting, many companies are trying to simplify data by moving toward fewer copies of operational data as well as more holistic views of core data such as customer, product, asset, etc. This consolidation of operational data is part of a larger business integration initiative. Consolidating operational data has advantages in that it reduces the overall complexity of maintaining consistency across operational systems by reducing the amount of data synchronisation needed. Some companies have opted to consolidate operational data by replacing many legacy operational applications with a common packaged application suite (e.g., Oracle or SAP) where multiple application modules share data. However, a complete “rip and replace” strategy is not affordable to all and may not be a good solution for many companies. Therefore, others are trying to consolidate operational data by systematically re-engineering operational applications in order to share common master data and common services. In this case, common services create and maintain the shared data (see Figure 6), and they are called by lighter-weight re-engineered application services from which redundant logic has been removed.
Figure 6
The main barrier to achieving shared data and shared data services across all core operational applications is the fact that most companies have a mix of packaged and custom-built systems. Possible solutions to this might be:
At this stage, it is clear that many do not trust an EII approach to handling this because many EII products have no support for distributed transaction processing. Option 2 or 3 will more likely be adopted depending on the dominance of particular packaged applications in the enterprise.
Master Data Management
It is the demand to simplify and consolidate operational data in particular that has raised interest in and demand for master data management (MDM). Master data is the set of core data entities of a business. Examples are data about customers, employees, products and assets that are used by operational and BI systems alike. Some would argue that master data includes all operational data, business intelligence and unstructured information associated with a subject (e.g., customer, product). However, in my client base, operational data is where the heart of the master data management discussion is right now.
There is no question that master data management is about consistency and synchronisation of core data; however, equally, I do not have a single customer that wants to leave their operational data scattered to the wind the way it is. Therefore, I believe that for most companies, an MDM strategy (which includes customer data integration as an example) is one of consolidation and synchronisation so that master data is created and maintained centrally via common services with synchronisation to packaged operational systems to update them with changes.
Tackling this problem does, however, require analysis to be done to find out answers to the following questions:
Once answers to these questions are obtained, then master data can be created either by purchasing a product to manage it (from vendors such as Hyperion, IBM, Oracle or SAP) or alternatively by doing it yourself. If the latter option is chosen then master data management needs to exploit the following components of the data integration suite discussed earlier:
The steps to building your own MDM solution include the following:
Looking at this in more detail, the first thing that needs to happen is that all data entities describing master data need to be defined using common definitions, common data names and common data integrity rules. I prefer to use the term shared business vocabulary (SBV) for this.
The SBV should include:
Note that, if necessary, different versions of master data will be maintained, which means that historical tracking of changes to master data definitions is also needed. The SBV should be held in XML form in a metadata repository (Figure 7) that integrates with data modelling tools, data integration tools, message brokers, data quality software, BI tools and other products, preferably using industry standards. This is necessary because the definition of master data is needed by so many infrastructure technologies.
Figure 7
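To make the idea of a versioned vocabulary held in XML more concrete, here is a small Python sketch. The XML layout (element and attribute names such as `sharedBusinessVocabulary`, `entity` and `version`) is an assumption invented for illustration and does not follow any industry standard or product schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical SBV document holding two versions of the Customer definition.
SBV_XML = """
<sharedBusinessVocabulary>
  <entity name="Customer" version="2">
    <attribute name="customerName" type="string" required="true"/>
    <attribute name="countryCode" type="string" required="true" maxLength="2"/>
  </entity>
  <entity name="Customer" version="1">
    <attribute name="customerName" type="string" required="true"/>
  </entity>
</sharedBusinessVocabulary>
"""


def latest_definition(xml_text, entity_name):
    """Return (version, attribute names) for the newest version of an entity."""
    root = ET.fromstring(xml_text)
    versions = [e for e in root.findall("entity") if e.get("name") == entity_name]
    newest = max(versions, key=lambda e: int(e.get("version")))
    attribute_names = [a.get("name") for a in newest.findall("attribute")]
    return int(newest.get("version")), attribute_names


version, attributes = latest_definition(SBV_XML, "Customer")
```

Keeping older versions in the same document is one simple way to retain the history of definition changes. A real repository would also record who changed what and when.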
Once this has been done, we then need to discover what data definitions exist that represent versions of this master data in different systems. These data definitions need to be identified. Also, the relationships between the discovered disparate metadata definitions need to be understood to work out precisely what data items in what disparate systems refer to one and the same thing (e.g., all the different data items in different systems that describe a customer name). Technologies such as Informatica, IBM Rational Data Architect and Sypherlink are examples of tools that can help with this kind of exercise.
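As a rough illustration of this discovery step, the sketch below groups column names from different systems that normalise to the same words. It looks only at names and relies on a tiny, hypothetical abbreviation table. Commercial discovery tools go much further, for example by examining the data values themselves, so treat this as a toy version of the idea.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a real exercise would need a far larger one.
ABBREVIATIONS = {"cust": "customer", "cst": "customer", "nm": "name"}


def normalise(column_name):
    """Split a column name into lowercase tokens and expand abbreviations."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", column_name)  # split camelCase
    tokens = re.split(r"[^A-Za-z0-9]+", spaced.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


def group_candidates(catalogue):
    """Group (system, column) pairs whose normalised names are identical."""
    groups = defaultdict(list)
    for system, column in catalogue:
        groups[normalise(column)].append((system, column))
    return dict(groups)


catalogue = [("CRM", "CUST_NM"), ("Billing", "customerName"),
             ("Orders", "cust-name"), ("Orders", "order_date")]
candidates = group_candidates(catalogue)
```

Here the three differently spelled customer name columns end up in one candidate group, which an analyst would then confirm or reject.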
Once this has been done, we can map the disparate data to master data definitions, sample data sources to profile data quality in the disparate systems and then decide on rules to cleanse the data and transform it if necessary. At this point, master data consolidation can be implemented. Also, if disparate metadata has been identified and mappings to the shared business vocabulary definition of master data have been captured in a metadata repository, it becomes possible to generate artefacts (Figure 8) to translate application-specific versions of master data to common SBV form so that master data can be marked up in XML using shared business vocabulary definitions as it flows between systems in XML messages over an enterprise service bus. This helps keep data consistent and well understood everywhere it goes.
Figure 8
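The translation artefacts described above might, in a very reduced form, look like the following Python sketch. It maps a source record to SBV names, applies simple cleansing rules and marks the result up as XML. The mapping table, the cleansing rules and the SBV names (`customerName`, `countryCode`) are all hypothetical; in practice they would be generated from the metadata repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from each system's field names to SBV names.
MAPPINGS = {
    "CRM": {"CUST_NM": "customerName", "CTRY": "countryCode"},
    "Billing": {"name": "customerName", "country": "countryCode"},
}

# Hypothetical cleansing rules keyed by SBV attribute name.
CLEANSING = {
    "customerName": lambda v: " ".join(v.split()).title(),  # tidy spaces and case
    "countryCode": lambda v: v.strip().upper(),
}


def to_sbv_message(system, record):
    """Translate a source record into an XML message using SBV names."""
    customer = ET.Element("Customer")
    for source_field, value in record.items():
        sbv_name = MAPPINGS[system].get(source_field)
        if sbv_name is None:
            continue  # field has no master data mapping, so it is left out
        ET.SubElement(customer, sbv_name).text = CLEANSING[sbv_name](value)
    return ET.tostring(customer, encoding="unicode")


message = to_sbv_message("CRM", {"CUST_NM": "  acme   ltd ", "CTRY": "gb", "X": "1"})
```

Records from the CRM and Billing systems come out in the same shape, which is the point of translating to a common form before data flows between systems.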
Data integration technology and data quality software can then be used to consolidate master data using the mappings, transformation and cleanup rules to construct this data. Note that a master data management solution consists of common integrated master data plus a set of common master data services that need to be used to maintain that data.
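One small piece of this consolidation step is deciding which value "wins" when several systems hold the same attribute. The sketch below assumes a deliberately simple survivorship rule: for each attribute, keep the most recently updated non-empty value. The rule, the record layout and the sample values are assumptions made for illustration. Real data quality software offers much richer matching and merging.

```python
def consolidate(records):
    """Merge records for one customer: for each attribute keep the most
    recently updated non-empty value (a simple, assumed survivorship rule)."""
    golden = {}
    # Process oldest first so that later records overwrite earlier values.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field != "updated" and value not in (None, ""):
                golden[field] = value
    return golden


# Two hypothetical source records, already mapped to common names.
sources = [
    {"customerName": "Acme Ltd", "countryCode": "GB", "phone": "",
     "updated": "2005-03-01"},
    {"customerName": "Acme Limited", "countryCode": "", "phone": "0000 000000",
     "updated": "2005-09-15"},
]
golden_record = consolidate(sources)
```

The newer name and phone number survive, while the country code is kept from the older record because the newer one left it empty.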
Processes then need to be put in place to control the maintenance, auditing and synchronisation of master data. Over time, custom-built operational applications need to be gradually re-engineered to remove their redundant versions of master data and to replace any logic that updates that data with calls to common master data services whenever they need to maintain (update, insert, delete) shared integrated operational master data (Figure 9). Note that as we consolidate master data, it becomes a source used to supply dimensional data to data warehouses, and it is also used to synchronise data in operational systems.
Figure 9
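A re-engineered application of the kind described above could be sketched as follows. The class names, method names and in-memory storage are hypothetical stand-ins. The point is only that the application holds no customer data of its own and that every change goes through the common service, which also records an audit trail.

```python
from datetime import datetime, timezone


class CustomerMasterService:
    """Hypothetical common service that owns the shared customer master data."""

    def __init__(self):
        self._customers = {}
        self.audit_log = []  # (timestamp, action, customer_id) entries

    def _audit(self, action, customer_id):
        self.audit_log.append((datetime.now(timezone.utc), action, customer_id))

    def upsert(self, customer_id, **attributes):
        """Insert a new customer or update an existing one."""
        action = "update" if customer_id in self._customers else "insert"
        self._customers.setdefault(customer_id, {}).update(attributes)
        self._audit(action, customer_id)

    def delete(self, customer_id):
        self._customers.pop(customer_id)
        self._audit("delete", customer_id)

    def get(self, customer_id):
        # Return a copy so callers cannot change master data in place.
        return dict(self._customers[customer_id])


class OrderApplication:
    """A re-engineered application: it keeps no customer copy of its own."""

    def __init__(self, master):
        self.master = master

    def change_delivery_city(self, customer_id, city):
        # The local update logic is replaced by a call to the common service.
        self.master.upsert(customer_id, city=city)


master = CustomerMasterService()
master.upsert("C1", name="Acme Ltd", city="Leeds")
OrderApplication(master).change_delivery_city("C1", "Manchester")
```

Any other application reading customer C1 through `master.get` now sees the new city, and the audit log shows one insert followed by one update.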
A question often remains about how to stop applications from causing conflicting updates to master data. One possible way that this might be achieved is by linking enterprise portals and applications to the enterprise service bus (ESB). As application user interfaces are re-engineered to integrate with enterprise portals, it becomes possible to link portlets with the applications via the ESB. The ESB is also integrated with application services and process management. In this way, when data is entered into the portal user interface, it is passed into the ESB/message broker for routing to all appropriate applications. The ESB can then guarantee that master data, operational packages and data warehouses all pick up the changes to the master data as and when they happen.
Figure 10
Changes to master data will trickle feed into the BI system to reflect changes over time in the analytical system. Event-driven data integration will handle the movement of changed master data into the data warehouse while data quality software acts as a “firewall” to guarantee accuracy and completeness. Data synchronisation via the ESB will keep packaged (and other) operational systems up to date.
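The routing and "firewall" behaviour described above can be illustrated with a toy publish/subscribe class. It is not modelled on any real ESB or message broker API. The class name, the validation rules and the message layout are assumptions chosen to keep the example short.

```python
class MasterDataBus:
    """Toy stand-in for an ESB/message broker; not a real product API."""

    def __init__(self, validators):
        self.validators = validators  # list of (check function, error message)
        self.subscribers = []
        self.rejected = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, change):
        """Route a change to every subscriber, unless a quality check fails."""
        errors = [msg for check, msg in self.validators if not check(change)]
        if errors:
            self.rejected.append((change, errors))  # stopped at the "firewall"
            return False
        for handler in self.subscribers:
            handler(change)
        return True


# Hypothetical data quality rules.
validators = [
    (lambda c: bool(c.get("customer_id")), "missing customer_id"),
    (lambda c: len(c.get("countryCode", "")) == 2, "countryCode must be 2 letters"),
]

# Plain lists stand in for the data warehouse and a packaged application.
warehouse_feed, package_feed = [], []
bus = MasterDataBus(validators)
bus.subscribe(warehouse_feed.append)
bus.subscribe(package_feed.append)

ok = bus.publish({"customer_id": "C1", "countryCode": "GB"})
bad = bus.publish({"customer_id": "C2", "countryCode": "United Kingdom"})
```

The valid change reaches both subscribers, while the invalid one is held back with its error messages so it can be corrected before it spreads to other systems.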
There are a few choices when it comes to master data. These are:
Companies implementing their own solution are now looking for data modelling, metadata management, data integration and data quality software to be combined into a single platform to help solve the problem of master data management.
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited, a leading information technology analyst and consulting company. As lead analyst and consultant, he specializes in enterprise business intelligence, enterprise business integration, and enterprise portals. He can be contacted at +44 1625 520700 or via e-mail at mferguson@intelligentbusiness.biz.