Rethinking Definition

Originally published 6 May 2009

Everyone subscribes to the idea that we should produce adequate definitions for entities, attributes, tables, and columns (and relationships too). They make our data models and physical databases understandable. However, the data management profession seems to treat definitions rather reflexively as something out there simply waiting to be gathered. But is this really the case?

Easy Definitions, Difficult Definitions

Some definitions are easy. A dodecahedron can be defined as a 12-sided regular solid. This definition captures the essence of a dodecahedron so well that it is possible to use it as an input to geometrical theorems from which additional properties of the dodecahedron can be deduced. On the other hand, trying to figure out a definition of justice is something that has occupied the greatest intellects for centuries, and we do not have one yet. You can try to get around this problem by, for instance, defining justice as that which is produced in the law courts, but everyone would feel that such a definition does not even begin to capture the essence of justice. There are many debates on what justice is, and many books have been written on the subject (and surely many more will be).

Now, I have never had to put dodecahedrons or justice into any data models I have ever worked on, but these examples clearly show that some definitions are easier to "gather" than others. The implication is that the common types of data we come across in data models are likely to show a similar variation in terms of degree of difficulty of definition. Yet, in data modeling there is no recognition of this. There is simply an expectation that we go out and find the definitions of our data, write them down, and move on.

What is Definition?

The word "definition" originally referred to fixing the limits of a plot of land, so that people could understand where the plot began and where it ended. The word has since been extended to cover concepts and arguments, and as data modelers it is concepts that really concern us. What do we really expect from a definition? It seems that there is an unspoken expectation that we can define things absolutely.

The only place in human experience where we can get concepts that can be defined absolutely is in what are called the exact sciences – mathematics and geometry (if you even distinguish geometry from math). Here, definitions, like the definition of a dodecahedron, capture the essence of the concept perfectly. Such definitions are complete, and they fully distinguish the concept from all other concepts. In fact, we do not even have to list out the other properties of a dodecahedron because we can actually derive them starting from the definition.
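To make the point about derivation concrete, here is a minimal sketch in Python. It reads "regular solid" as a convex regular polyhedron and relies on Euler's formula for such solids, which the discussion above does not mention, so treat both as assumptions. The function name and the dictionary keys are hypothetical. Given nothing but the number of faces, the sketch recovers the shape of each face and the counts of edges and vertices.

```python
from fractions import Fraction


def derive_regular_solid(face_count):
    """Derive the properties of a convex regular solid from its face count.

    Assumption: every face has p sides and q faces meet at every vertex.
    Euler's formula (V - E + F = 2) then gives 1/p + 1/q - 1/2 = 1/E.
    """
    matches = []
    for p in range(3, 7):      # sides per face
        for q in range(3, 7):  # faces meeting at each vertex
            excess = Fraction(1, p) + Fraction(1, q) - Fraction(1, 2)
            if excess <= 0:
                continue       # no convex solid exists for this pair
            edges = 1 / excess
            faces = 2 * edges / p
            vertices = 2 * edges / q
            if faces == face_count:
                matches.append({
                    "sides_per_face": p,
                    "faces_per_vertex": q,
                    "edges": int(edges),
                    "vertices": int(vertices),
                })
    return matches


# The definition alone yields pentagonal faces, 30 edges, and 20 vertices.
print(derive_regular_solid(12))
```

Nothing comparable can be done with a definition of customer or gross revenues, which is the contrast being drawn here.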

It seems that data modeling assumes that this perfect kind of definition applies to every other kind of concept that we have to deal with. We are not alone in this. Socrates got hold of this idea too and thought he could get precise definitions out of his Athenian neighbors. He would stop them in the street and demand that they give him definitions of things like truth and beauty as if they were geometrical figures. In the end, he annoyed the Athenians so much that they forced him to commit suicide.

There has to be a little bit of Socrates in all data modelers, since we go to users asking them for definitions of things like customer, gross revenues, customer type, and account number. And if we do not get precise answers, it is never the fault of the data modeler, who is merely there to gather the definitions.

Difficult Definitions

So if we accept that some concepts are more difficult to define than others, where does that leave us? I think there is a spectrum of difficulty in definition. In my experience, attributes that are calculated or derived can be defined by their computations or derivations. These are not too difficult to arrive at, and they are very precise definitions in their way, so they belong on the "easy" end of the spectrum. For instance, the state sales tax on an order can be defined in terms of a calculation, that is, a business rule. The same is true of credit card finance charges and gross revenues. However, there is still a subtle difference between explaining how something is calculated and describing exactly what it means. Stating a computation does not tell me what a credit card finance charge means; it only tells me how to calculate the thing. A user who sees credit card finance charges on a report may need a definition of the attribute in order to use the report. So even calculations and derivations are not that easy to define.
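The difference between a rule and a meaning can be sketched in Python. Everything named here is hypothetical: the tax rate, the rounding convention, the class, and the wording of the meaning are illustrative assumptions, not taken from any real system. The sketch stores the business rule and the meaning side by side, because the first cannot stand in for the second.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Callable


def state_sales_tax(taxable_amount: Decimal, rate: Decimal) -> Decimal:
    """Business rule: how the attribute is calculated (rounding is assumed)."""
    return (taxable_amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


@dataclass(frozen=True)
class DerivedAttribute:
    name: str
    rule: Callable[..., Decimal]  # how to calculate the attribute
    meaning: str                  # what the attribute means to a report reader


# Hypothetical attribute: the rule is precise, the meaning still has to be written.
ORDER_STATE_SALES_TAX = DerivedAttribute(
    name="Order State Sales Tax",
    rule=state_sales_tax,
    meaning=(
        "The amount the seller collects from the buyer on behalf of the state "
        "for the taxable items on an order."
    ),
)

# Illustrative rate of 6.25 percent on a 200.00 order.
print(ORDER_STATE_SALES_TAX.rule(Decimal("200.00"), Decimal("0.0625")))
```

The function could be swapped for a different formula without the meaning changing at all, which suggests the two are separate pieces of metadata.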

Further along the spectrum of difficulty are pure concepts that categorize things, such as financial instrument type. This is usually an entity that is implemented as a code table with records for things such as equity or bond. It is possible to find good definitions for equity and bond, but it is almost impossible to find a good definition for the entity Financial Instrument Type. It usually ends up as something like "type of financial instrument," or as a list: "equity, bond, and so on." This sort of thing is maddening to data modelers who crave a really good definition. Unfortunately, it is possible that Financial Instrument Type is a collection of things rather than an entity. If this is the case, then a good definition may be impossible. It would be like trying to define the ten things I would take out of my house if it caught fire: the only thing they have in common is that I would take them out of my house if it were burning.

Improving Definitions

Given all of this, it is reasonable to accept that good definitions are not simply out there waiting to be harvested by data modelers. The kinds of things we put into data models are usually rather difficult to define, and we may reasonably suspect that some of them will be nearly impossible to define. What can we do? I would suggest that one thing we can do is to treat definitions as things that can always be improved. The idea that we have a perfect definition at the point in time when we put an entity or an attribute into a data model is not practical. As time goes on, we learn more about the entities and attributes. Why not update the definitions as this happens? A definition can also be sharpened as we learn what something is not. Admittedly, the entire definition cannot consist of things the entity or attribute is not, but there is no reason to exclude these propositions from a definition either. Making definitions living things in this way is probably going to add a lot more value to them. Permitting any individual in the enterprise to add to a definition is also a good idea. Such an approach requires strong governance, but it is more open than only allowing data modelers to enter definitions into data modeling tools. This, of course, introduces the need to have an appropriate infrastructure for definitions, which is a different topic.
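As a rough illustration of a definition treated as a living thing, here is a minimal Python sketch. The class names, fields, and the simple approval step are all hypothetical, and a real infrastructure for definitions would need far more than this. It shows three of the ideas above: anyone can propose a revision, governance decides which revision is current, and statements about what the term is not are kept alongside the definition.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Revision:
    text: str
    contributor: str        # any individual in the enterprise
    revised_on: date
    approved: bool = False  # set by whoever governs definitions


@dataclass
class LivingDefinition:
    term: str
    revisions: List[Revision] = field(default_factory=list)
    exclusions: List[str] = field(default_factory=list)  # what the term is not

    def propose(self, text: str, contributor: str, revised_on: date) -> Revision:
        """Record a proposed revision; it is not current until approved."""
        revision = Revision(text, contributor, revised_on)
        self.revisions.append(revision)
        return revision

    def approve(self, revision: Revision) -> None:
        """Governance step: mark a proposed revision as approved."""
        revision.approved = True

    def add_exclusion(self, statement: str) -> None:
        """Sharpen the definition by recording what the term is not."""
        self.exclusions.append(statement)

    def current(self) -> Optional[str]:
        """Return the most recently proposed revision that has been approved."""
        approved = [r for r in self.revisions if r.approved]
        return approved[-1].text if approved else None


customer = LivingDefinition("Customer")
first = customer.propose(
    "A party that has placed at least one order.", "data modeler", date(2009, 5, 6)
)
customer.approve(first)
customer.propose(
    "A party that has placed an order or holds an open account.",
    "sales analyst",
    date(2009, 6, 1),
)
customer.add_exclusion("A prospect who has only requested a quote is not a customer.")
print(customer.current())
```

Keeping every revision rather than overwriting the text is a deliberate choice in this sketch, since the history of how a definition changed is itself useful to the enterprise.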

Acknowledging that definitions have to be captured rather than gathered, and that they can be continuously improved, will make them more valuable to the enterprise over time. 

  • Malcolm Chisholm

    Malcolm Chisholm, Ph.D., has more than 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules. His experience includes the financial, manufacturing, government, and pharmaceutical industries. He is the author of the books: How to Build a Business Rules Engine; Managing Reference Data in Enterprise Databases; and Definition in Information Management. Malcolm writes numerous articles and is a frequent presenter at industry events. He runs the websites http://www.refdataportal.com; http://www.bizrulesengine.com; and
    http://www.data-definition.com. Malcolm is the winner of the 2011 DAMA International Professional Achievement Award.

    He can be contacted at mchisholm@refdataportal.com.
    Twitter: MDChisholm
    LinkedIn: Malcolm Chisholm

