Originally published 5 August 2009
In Part 1 of this article, we looked into the background of data modelling and some of the history associated with it. We also touched upon the criticism modelling receives and the baggage it unfortunately attracts in far too many organisations.
We also examined the typical benefits that are expected when modelling is associated with DBMS development, and took a deep-dive into the role of modelling for DBMS development.
Finally, we posed the question:
“What needs to change to make modelling more relevant to today’s environments?”
In Part 2, we’ll be discussing:
I feel I must make a confession here. The technologies are not really all that new! It’s just that “traditionally” data modelling has not been seen as being relevant to these areas. To break out of this “modelling is a one trick pony” view, we need to show how and why data modelling IS relevant for today’s varied IT landscape. Thus we must show that it’s relevant for the “new” technologies such as:
ERP Packages
As data architects, when faced with projects that are embarking upon the introduction of a major ERP package, have you ever heard the cry:
"We don’t need a data model – the package has it all."
But, does it?
Is data part of your business requirement? Of course it is. So just how do you know whether the package meets your overall business data requirements? You did assess the data component when doing your fitness for purposes evaluation, didn’t you?
A data model will assist in both package configuration and fitness for purpose evaluation.
How can you assess that the ERP package has data structures, definitions and meanings that are compatible with your legacy systems?
Again a good data model will assist this.
What about data integration, legacy data take on and master data integration – how can these readily be accomplished?
You guessed it – a data model can help here too.
The critics say that modelling isn’t needed for ERP packages. But that’s because they are wedded to the old-world view that modelling is only used for DBMS development. It’s not. In this case, when implementing ERP systems, the model will NOT be required to generate a DBMS from; however, for all of the other aspects described, it IS invaluable.
So what’s’ the problem? Why can’t we just point our favourite data modelling tool at the underlying DBMS of the package? Simply put, for the most part the problem is that the database system catalog does not hold useful metadata. Several well known ERP systems do not hold any primary key (PK) or foreign key (FK) constraints in the database itself. It’s their application layer that holds all of this, and it’s their proprietary ERP DD which holds the “logical view” of the data.
Figure 1: Part of an ERP reverse-engineered directly from the DBMS
What we really need is to be able to get the ERP metadata into a useful format such as that shown in Figure 2.

Figure 2: Useful Model from an ERP
How can we do that? Well there isn’t space in this article to go into the detail, and much of it varies from ERP to ERP. However with, for example, SAP, there is a metadata extraction
facility independently available called SAPHIR. Additionally, you can also validate a model created from SAPHIR by examining key
screen items such as those shown in Figure 2.
Figure 3: Validating an ERP Model from Transaction Screens
So in summary, why do we need to bother undertaking data modelling when implementing an ERP system?
SOA and XML
I don’t intend to give a detailed exposition on the subject of SOA; however, it’s worth reminding ourselves of the fundamental components in the architecture.
The Bus in SOA is a “conceptual” construct, which helps to get us away from point to point thinking. An approach for integrating applications via a bus is using message-oriented middleware (MOM).
The Message Broker is a dispatcher of messages and comes in many varieties. The broker operates upon a queue of messages within the routing table.
Adapters are where the different technology worlds are translated, e.g., UNIX, Windows, OS/390 and so on.
Fundamentally, SOA is built upon a message based set of interactions, i.e., all interaction between components is through messages. These are generally XML messages, so it is true to say that XML is at the core of SOA.
But there is a potential problem.
XML is a hierarchical structure, but the real world of data is not.
Figure 4: Book Example
Let’s illustrate this with a real world example – a book. Looking at Figure 4, we see that this book is entitled Data Modeling for the Business. When we look at this real
example, we see data such as: Title, Author(s), ISBN, Price, Publisher, Amazon URL and so on.
Looking at the authors (myself, Steve and Donna), there is also some information (on the back cover) relating to each of us.
We can develop a model to represent this “real world” data and show it in an entity-relational format. Typically, these ER models can represent real world data pretty accurately.
Figure 5 shows an example ER model for the book authoring data subject area. A book can be authored by many writers (in this case, me, Steve and Donna). However, a writer can author many books (e.g., Steve has also written Data Modeling Made Simple). So, as we see in Figure 5 we’ve added an intersection entity (Book Authorship) to resolve the many-to-many relationship.
Figure 5: Book Example ER Model
Now, when we want to use data in this model within an XML message, we have to turn the model into a hierarchic XML representation. Thus, we need to decide whether to make Book the parent of Book Authorship or to choose Writer.
In Figure 6, the resultant XML model has been created after choosing Book as the parent.
Figure 6: Book XML Model
Whilst simplistic (for the sake of the example), the XML model in figure 6 now represents the XML schema we’re going to use. Within our SOA-based system, we may have a transaction which utilises an XML message called “Book Details.” Figure 7 shows how the XML message has been created from the XML schema and is utilised (in the message queue) in our SOA solution.
Figure 7: Book Details XML Message
So clearly, data modelling IS a key component required in an SOA implementation.
Business Intelligence
When looking at business intelligence and data warehouses, we are trying to ensure that the data utilised by the business for their queries and reports is reliable.
In order to accomplish this, not only do we need to manage the data that the business utilises, but also the metadata. We all know by now that much of this metadata is contained within the data models.
So, what are the main reasons for managing this model metadata?
The top 5 benefits from managing this model metadata for reporting are:
#5 Data Structure Quality. Models ensure that the business design of a data architecture is appropriately mapped to the logical design, providing comprehensive documentation on both sides.
#4 Data Consistency. By having standardized nomenclature for all data – including domains, sizing, and documentation formats – the risk of data redundancy or misalignment is greatly reduced.
#3 Data Advocacy. Models help to emphasize the critical nature of data within the organization, indicating direction of data strategy and tying data architecture to overall enterprise architecture plans, and ultimately to the business’s objectives.
#2 Data Reuse. Models, and encapsulation of the metadata underpinning data structures, ensure that data is easily identified and is leveraged correctly in the first place, speeding incremental tasks through reuse and minimizing the accidental building of redundant structures to manage the same content.
#1 Data Knowledge. Models, combined with an efficient modelling practice, enable the effective communication of metadata throughout an organization, and ensure all stakeholders are in agreement on the most fundamental requirement: the data.
ER Models vs. Dimensional Models for Reporting
Much has been written previously about the appropriateness of ER versus dimensional models for business intelligence and data warehousing.
To dispel any myths, it’s worth looking at the key features of each type of model:
Features of an ER model
Features of a Dimensional Model
So, where should I start if I want to develop a model that is suitable for reporting?
Don’t forget data lineage – it’s applicable to many aspects, and now with regulatory compliance requirements in many sectors, this is a statutory need.
In BI and DW applications, mappings and transformations determine how each field in the dimensional model is derived. The derivations could actually drive the ETL process. In lineage, like business intelligence, the metadata is vital!
What is the problem?
Fundamentally, we need to be able to help business users to answer questions or concerns raised such as:
That figure doesn’t look right! Where does it come from?
How can we prove to the auditor that financial data has been handled correctly?
Not only do we need to help our primary customers (the business folks), but also we need to be able to help IT staff to answer questions such as:
I need to integrate the data supplied from your system with the data in my system. How can I understand where your data has come from and what it means?
And finally, we need to be able to help systems to answer questions such as:
When a piece of source data is updated, which items in the data warehouse will need to be recalculated?
So why does data lineage matter?
We aim to have an increased understanding of where data comes from and how it is used, which will lead to increased confidence in the accuracy of data.
The knowledge of how data is transformed is itself valuable intellectual property that should be retained within a business, and, very importantly, it is absolutely necessary for compliance with the Basel II Accord and Sarbanes-Oxley Act (SOX): SOX requires that lineage and transformation of financial data is recorded as it flows through business systems.
Two Key Aspects of Data Lineage
Transformations:
What has been done to the data?
Business Processes:
Which business processes can be applied to the data?
What type of actions do those processes perform (Create, Read, Update, Delete)?
Audit Trail – who has supplied, accessed, updated, approved and deleted the data and when?
Which processes have acted on the data?
So where do I need data lineage?
You need data lineage for the design of ETL processes, the creation of dimensional models, the transforming of data to XML (typically from ER) and for workflow design.
As I mentioned in part 1 of this article, we constantly need to demonstrate the benefits accruing from data modelling. Nobody owes us a living, and no matter how important WE believe the place of modelling to be, it is our responsibility to demonstrate (and sell) the benefits within our organisations.
So just how can you gain traction, budget and executive buy-in? Here are a few tips:
Remember, soft skills are becoming critically important for information professionals, and whilst you might not like it, the hard facts are that part of YOUR job nowadays IS marketing.
As information professionals, we need to break away from the “you must read my detailed data model” mentality and make the appropriate model information available in a format users can readily understand. This, for example, means that data architects need to recognize the different motivations of their users and re-purpose the information they present to be suitable for the audience: Don’t show a business user a data model!
Information should be updated instantaneously, and we must make it easy for users to give feedback; after all, you’ll achieve common definitions quicker that way.
We need to recognize the real world commercial climate that we’re working in and break away from arcane academic arguments about notations methodologies and the like. If we want to have data modelling play a real part in our business, then it’s up to us to demonstrate and communicate the benefits that are realized. Remember, data modelling isn’t a belief system, just because you “get it,” don’t assume that the next person does.
So what can we do?
1. Provide information to users in their “Language”
2. Document Metadata
3. Provide the Right Amount of Information
4. Market, Market, Market!
5. Be aware of the differences in behavior and motivations of different types of users.
For example, a DBA is typically:
- Cautious
- Analytical
- Structured
- Not a talker
- “Just let me code!”
However, a Data Architect is:
- Analytical
- Structured
- Passionate
- “Big Picture” focused
- Likes to Talk
- “Let me tell you about my data model!”
And a Business Executive is:
- Results-oriented
- “Big Picture” focused
- Has little time
- “How is this going to help me?”
- “I don’t care about your data model.”
As information professionals, we’ve got to get these softer skills baked into ourselves and our colleagues. Some of the key things as a profession we can do are to:
So having highlighted the areas that need to change in order to make modelling more relevant to our business colleagues, and the information environments of today, are there any things that should stay the same?
Yes indeed.
We must keep the disciplines and best practices that have existed in the modelling community for many years. These can be categorised into 3 major areas as follows:
So, in the course of these two articles, we have examined many aspects of data modelling, starting with its history, it use in DBMS development and firmly refuting the criticism that it’s only appropriate for DBMS development.
But as data professionals, it’s up to us to make the biggest change necessary to make it appropriate to the new technologies and business environments of today. We need to grasp the nettle and engage effectively within our businesses.
Go to it ………
Recent articles by Chris Bradley
Comments
Want to post a comment? Login or become a member today!
Be the first to comment!