Oops! The input is malformed!
Originally published 20 December 2006
Over the last six months, the hype surrounding master data management (MDM) has been overwhelming at times. Yet it seems that there is still a lot of confusion around as to what MDM is and how it integrates with various core operational and analytical systems across the enterprise. This two-part article will focus primarily on the integration of MDM with business intelligence (BI). In the first part, I will look at what MDM is about and the types of MDM systems. The second part of the article will discuss how master data management can integrate with BI systems. I will also look at what kind of impact the introduction of an MDM system will have on existing data warehousing and business intelligence practices.
What is MDM?
Without wanting to labour the point about what MDM is, I nevertheless feel compelled to state my understanding of this relatively new technology area before I get into discussing integration of it with business intelligence. This is done mainly for the sake of clarity.
Master data management is a set of processes, policies, services and technologies used to create, maintain and manage data associated with a company’s core business entities as a system of record (SOR) for the enterprise. Core entities include customer, supplier, employee, asset, etc. Note that master data management is not associated with transaction data such as orders, for example.
Whether you build your own MDM system or buy a MDM solution from a MDM vendor in the marketplace, it should meet a number of key requirements. These include the ability to:
By defining and maintaining metadata, I mean that an enterprise-wide standard set of data names and data definitions should exist to describe each master data business entity. In addition, mappings from disparate systems holding “fractured” master data to standard master data definitions should also be defined. This last point, in some cases, can be a daunting task and cries out for tooling to help us automate the discovery of subsets of master data that are “out there” and to discover the relationships between these data so as to determine all the disparate definitions for the same thing. The good news here is that we are now seeing the emergence of such tooling from vendors like DataFlux, IBM, Informatica and Sypherlink.
Why Do We Need MDM Systems?
One of the most common questions I run into from my clients is why is it that we need yet another type of system? Why is master data management so important? One of the key reasons for getting control of master data is to improve business performance. Many companies today now realize that their master data is heavily duplicated and fractured across multiple line of business operational systems. If you simply take order entry and billing as two basic systems of any company, you realize right away that customer data, for example, is highly likely to be stored in both systems. But which one is the master? Is it possible, for example, to create new customers and maintain the same customer data in both of these systems? Also, how does data associated with core business entities in each of these systems remain synchronized as changes occur?
If you have subsets (some attributes) of a core business entity master data resident in multiple systems you are probably well aware of the problems not least of which is the issue of identity mania. Every system has a different identifier for the same master data. Take retail banking as an example. Figure 1 shows a picture of yours truly as an example of a customer that holds a VISA credit card, a VISA debit card, a Master Card credit card, a mortgage and a loan. To top all that, assume that I am also an employee at the Bank. Looking at this, you don’t have to be a rocket scientist to realize immediately that a customer is likely to have a different identifier in each of the systems holding data about them including, in this case, all the product-oriented line of business systems (where a customer holds that product) and in the HR system. Therefore, one thing that is clear is that an MDM system needs to support a global identifier (sometimes referred to as a surrogate key). Sound familiar? I can already hear you saying, “Yeah, yeah….been there, done that in my data warehouse years ago……” Hold that thought for a while, and I’ll come back to it.
Continuing on, it is also clear that all the disparate identifiers need to be mapped to the global identifier to allow a single version of a customer to be built. Of course, you quickly realise that this is not enough to solve the problem. This is because it is not just the identifiers that differ across systems housing subsets of master data. The problem is that all of the attribute data definitions and code sets may be different (see Figure 2).
What you actually need is a common set of data names and data definitions for all attributes describing each core business master data entity. In addition, to get a single view of a master data entity, you need to have global identifiers, a master shared business vocabulary (SBV), and know all the data mappings from disparate systems to common master data definitions (see Figure 3).
For those of you well practiced at data warehousing, I would assume that you still recognize this as something very familiar and something that you most probably already do in the course of building dimension data in a data warehousing system. However, even if you have integrated customer data and product data in your data warehouse, MDM goes much wider that this – it is an enterprise-wide issue. The real problem is still back in operational systems. Given that master data is often heavily fractured in operational systems, and also that it is typically maintained in more than one line of business operational system of entry (SOE), the problem of master data synchronization often resembles the proverbial “spaghetti ball.” We’ve all seen a spaghetti ball slide that took forever to figure out and draw, and that makes you realize what a mess you are in! In this maze of interfaces, you realize that data is flowing in all kinds of directions between systems, primarily just to keep it synchronized. It happens in batch, via application messaging and, of course, via the main culprit – Excel spreadsheets attached to e-mails! I have a feeling this latter point may have struck a chord with a few of you. Yes folks, Excel mania is still out there alive and well.
Even if we replaced all the batch interfaces with electronic messaging interfaces using popular messaging products (as many companies have tried over the years), it still does not fix the problem which is, of course, that there is no master data hub, no common master data store where a complete system of record (SOR) exists for each core business entity.
Therefore, many companies are now (if they haven’t done so already) looking to create master data stores by capturing, cleaning and integrating subsets of master data from disparate line of business operational systems (see Figure 4) using tools from an enterprise data management suite. (Note: An enterprise data management suite of tools includes familiar products such as data modelling tools, data quality assessment and cleansing tools, data integration tools including EII and ETL, metadata management, etc.)
Again MDM data hubs can be built or bought. Either way, companies still struggle with the problem of multiple systems of entry (SOEs). It would be a lot easier if multiple SOEs could share common services to access and maintain common master data (see Figure 5).
Even though Figure 5 seems an obvious thing to do, the problem is that this may require applications to be changed to invoke master data services if master data is consolidated into a master data store. While this could be done for custom-built applications, it is more difficult to achieve for packaged applications. This problem can be solved if integrated master data is maintained via a single portal-based system of entry and master data changes are propagated to other systems that need it (see Figure 6).
This works very well in a service-oriented architecture (SOA), and I have already discussed this in Figure 9 in my BeyeNETWORK article entitled Information Integration and Master Data Management.So assuming we understand all this and can see the reasons for needing an MDM solution, the question is: How can this be created? There are, of course, two main approaches – build or buy. Companies are likely to build when they already have a data store and/or dominant system that already holds a significant set of integrated master data that is shared by at least some of a company’s main operational systems. This may avoid the creation of yet another data store which is often an issue that several of my clients have raised. Alternatively, you could buy MDM technology in the marketplace to more rapidly deploy an MDM solution. There are, of course, different types of MDM systems that you can buy. These include:
• Registry-based MDM systems
• Hub-based MDM systems
• Enterprise MDM systems
The differences between these types of system are briefly outlined below.
Registry-based MDM systems use global identifiers and data federation to create a virtual master data system of record (SOR). Therefore, master data elements in disparate underlying systems are assembled on-demand in real-time to dynamically produce integrated master data records. These systems are remarkably similar to EII tools in many respects and, in some cases, partnerships with EII vendors exist to enable this virtual view of master data. They are also similar to data quality products that can provide on-demand matching services to match data to create single integrated versions of master data records. It is no surprise, therefore, to see data quality vendors such as DataFlux moving in on the MDM market to compete with registry-based MDM vendors. Registry-based MDM systems support several data sources and also often have a Web services interface and, therefore, offer on-demand master data integration services. On-demand master data can be requested by:
Registry-based MDM systems also use data quality services to clean and validate master data. Finally, operational systems still remain the systems of entry (SOEs) for master data in the registry approach. Examples of registry-based vendors include Initiate and Purisma.
Hub-based MDM systems (see Figure 7) propagate master data changes between disparate operational systems and can persist a complete integrated set of master data in a master data store as a system of record (SOR). This means that master data and master data changes can be integrated into a MDM hub. Each hub contains master data for a single master data entity. Some hub-based MDM systems also provide federated master data integration across the MDM hub and other systems to create a virtual SOR as per the registry approach. This is valuable if the hub is not a complete SOR. Once again, with a hub-based MDM system, the line of business operational systems still remain the systems of entry (SOEs).
Enterprise Master Data Management solutions are more comprehensive than MDM data hubs. These systems are top of the range in that they are both the system of entry (SOE) and the system of record (SOR) for master data. They offer master data integration and master data synchronization as well as a fully integrated complete set of master data in a master data store. The master data store supports multiple master data entities, and changes are logged and propagated to other operational applications and data warehouses. In addition enterprise MDM systems go beyond operational master data by integrating operational master data, related master BI and related unstructured content associated with an instance of a core entity (e.g., a customer, a product, etc.). All entities are defined in a shared business vocabulary (SBV) of common enterprise-wide data names and data definitions. In addition, data quality services are offered to validate and clean master data. Common Web services exist for access and maintenance of master data, and changes are tracked. Finally, a history of metadata changes is also maintained.
Vendors in the MDM hub and enterprise MDM markets include:
• FullTilt Perfect Product Suite
• Hyperion MDM
• I2 PIM
• IBM WebSphere Product Center and Customer Center
• Kalido 8M
• ObjectRiver MDM
• Oracle Customer and PIM data hubs and Sunopsis AIP
• SAP NetWeaver MDM
• Siperian Hub, Master Reference Manager, Hierarchy Manager and Activity Manager
• Stratature Enterprise Dimension Manager
• Teradata MDM
In the second part of this article I will discuss how MDM and BI systems can fit together and how the use of an MDM system will speed up business intelligence development.
Recent articles by Mike Ferguson