Blog: Barry Devlin

Hello and welcome to my blog!

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book, Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with an holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

I was intrigued by a webinar entitled "Extending Data Warehouse Architecture: Deploying a Data Warehouse without Databases" by Bill Inmon and Compact Solutions, so gave up an hour of my evening the other night to listen in.  I have to say that I didn't learn much about extending the Data Warehouse architecture.  Nor was I even mildly convinced that the solution offered was either a Data Warehouse or lacking a database.

The solution was described as being based on Unix Compressed Files that are partitioned and indexed to support querying along commonly used dimensions.  Now, what is not "database" about that?  From what I could gather, the data can only be queried from Compact's own proprietary user interface, so appears not to support SQL.  Updating seems to take place only through ETL tools such as Ab Initio, so I guess it's not ACID (Atomicity, Consistency, Isolation, Durability) compliant.  So certainly, it's not a full-function database and thus cheaper to implement and maybe faster running, but claiming it's not a database at all seems like marketing-speak.
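To make the point concrete, here is a minimal sketch of what "compressed files that are partitioned and indexed" along one dimension could look like. It is my own illustration in Python, not Compact's implementation: the function names, the file naming scheme and the sample sales records are all assumptions invented for the example.

```python
import gzip
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitions(records, dimension, directory):
    """Write one gzip-compressed file per value of `dimension`.

    Returns an index mapping each dimension value to its file path.
    (Hypothetical layout, chosen only for illustration.)
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record)

    index = {}
    for value, rows in groups.items():
        path = Path(directory) / f"{dimension}={value}.jsonl.gz"
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
        index[value] = path
    return index


def query(index, value):
    """Read only the partition for `value`; other files are never opened."""
    path = index.get(value)
    if path is None:
        return []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]


# Made-up sample data.
sales = [
    {"region": "EMEA", "product": "A", "amount": 120},
    {"region": "APAC", "product": "B", "amount": 75},
    {"region": "EMEA", "product": "B", "amount": 40},
]

with tempfile.TemporaryDirectory() as workdir:
    index = write_partitions(sales, "region", workdir)
    emea_total = sum(row["amount"] for row in query(index, "EMEA"))
    missing = query(index, "LATAM")
```

Even this toy has storage, an index and a query path, which is most of what people mean by a database. What it lacks is SQL and transactional updates, which is exactly the distinction drawn above.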

More important - is the resulting solution a data warehouse?  Well, it was claimed that no modeling was needed to set it up (another low cost implementation selling point).  So, how do data integration and cleansing get defined?  It sounded like the partitioning and indexing were done with some specific types of access in mind.  So, maybe a cheap and large data mart at best, but not a data warehouse.  And if you want to use any of your standard BI tools, you have to export the data into a (real) relational database or cube!

And finally, in response to a question on where to position this in his own Data Warehouse 2.0 architecture, Bill replied ... ummm, it doesn't really fit anywhere ... it's a special category on its own.

Personally, I don't think I buy it as a Data Warehouse or a non-database...

Posted April 21, 2009 3:27 AM
I was a speaker at the TDWI World Conference in Las Vegas last week, and if there was one phrase that kept occurring to me, especially in the vendor exhibition, it was disruptive technological change.  Those of you who've read "The Innovator's Dilemma" by Clayton Christensen will know what I mean by that phrase. For those who haven't, I'd say it's a must-read for anyone involved in a technology-driven industry.

By Christensen's definition, disruptive change occurs when a new technology offers some feature that is not valued in an existing market and performs worse than the existing technology in that market, but is capable of improving to meet that market's needs in time.  What happens is that the new technology debuts in another, often related, market and then moves back into the original market, often displacing the existing suppliers there.  Christensen's key example relates to the development of the disk drive market from the '70s to the '90s and the failure of many of the incumbent 14-, 8- and 5.25-inch drive manufacturers over that period.

What struck me at TDWI was the explosion in novel and even radical approaches to the database and storage side of data warehousing that were on view.  While most of the technologies are not new, the combinations and price-points are certainly innovative and maybe disruptive.  For many years, the DW database market has been very quiet, but the last couple of years have seen a surge of new entrants.  What the newcomers have in common, from the more established ones like Netezza to the more recent entrants like ParAccel, is a focus on query performance and large data volumes in specific analytical applications that might traditionally be called data marts.

As these vendors' technologies and techniques are proven in largely stand-alone environments, they are beginning to raise questions in the traditional enterprise data warehouse arena.  We've already seen the incumbents (Teradata, IBM, Oracle and Microsoft) introduce appliance-like solutions.  But the real question I see relates to the underlying architecture of the data warehouse itself.  After more than 20 years, are we about to see a fundamental change in the way we design business intelligence environments?

I'll be exploring this question over the coming months, but I'd love to hear your views at this stage!  

Posted March 3, 2009 7:10 AM

Back in July, I resolved to blog regularly, and I did manage to do so for more than two months, but then I got distracted, lazy or just busy (take your choice!), so this is my first blog since the end of September. It's a bit too early for new year resolutions, so I'm just going to take it one blog at a time.

One of the things I got busy on was writing a white paper sponsored by Lyzasoft which they published recently. Speaking to the folks at Lyzasoft, and also to participants in conferences at which I was presenting over the same time period, I found myself looking again at the role and positioning of Business Intelligence tools vs. the people who use them. And I've deliberately phrased that as "versus" - because it's well accepted that many "analysts" who should use BI tools as part of their toolkit end up either fighting the prescribed tool or abandoning it altogether.

Business Intelligence these days is a term that covers a multitude of sins, from executives defining and executing business strategies to automated processes identifying potential fraudulent transactions without any human intervention and everything in between. Depending on the particular business need, software vendor, consultant or industry analyst involved, the focus in different implementations can vary dramatically. All remain BI, but each requires very different thinking and architectural approaches.

One set of users who sit uncomfortably on the boundary of BI are often called business analysts within their organizations. They tend to use and combine data in new or unusual ways in order to gain new insights into what is going on in the business. They often obtain the data they need from beyond the data warehouse, because the warehouse doesn't hold the data they need, or has cleansed it in a way they don't agree with, or they simply don't know it's there. They often manipulate and combine data in different ways as an iterative part of their analyses. And many times their tool of choice is the spreadsheet.

It's pretty clear that these business analysts should be supported by the BI community. Within their own organizations, the data they require should come increasingly through the data warehouse in order to ensure data integrity and consistency. The analyses they perform and the outputs of their work also need to be maintained and tracked for compliance and regulatory reasons. Overall, setting these people off to find, manipulate and analyze data in a haphazard way simply doesn't make sense.

Fitting the needs of business analysts into the data warehouse architecture is, however, possible. I've coined the term "playmart" to represent the type of environment these users need. The aim is to balance agility (playing) with control (in a data mart). I've defined eight key characteristics of a playmart in the white paper (see also below) and would be very interested to hear your views on them.


Posted December 19, 2008 4:04 PM

As the worldwide banking crisis continues to escalate, one has to wonder: where was the Business Intelligence in all of this? What happened to Data Quality and Data Management?

First, we had the interesting revelation that the individual banks and lending institutions all seemed to be blissfully unaware of the extent to which they were exposed by lending in the sub-prime mortgage market. It's difficult to imagine how the information available to decision makers in these companies could have been so scarce or so uninformative. Most, if not all, financial institutions have had extensive and expensive data warehouses in place for many years now. Business Intelligence should easily have warned of the dangers. Was the increasing level of risk unmeasured, overlooked or simply ignored?

More recently, we've had the spectacle of banks being unwilling to extend short-term lending facilities to one another for fear that the borrowing institutions could go belly-up in the next few days! Could the lenders not know? Unfortunately, in this case, the answer is probably that they couldn't. Despite the fact that the worldwide financial market is tightly and instantly interconnected at a transaction level, the truth is that the underlying data remains disconnected and dispersed. Data Management and Data Quality have simply not been considered. Proper business governance in the financial markets as a whole is impossible without a well-defined and credible data foundation.

So, assuming that we can survive the crisis without a meltdown, what has been happening should be a clarion call to Data Management professionals in the financial industry particularly but also beyond. We need to recognize the interconnected and increasingly fragile web of data dependencies that hold the business world together. It's time to get out there and apply the principles we know and preach already. And we had better get moving quickly.


Posted September 28, 2008 7:52 PM

Enterprise BI shops and data quality departments regard spreadsheets largely as the work of the devil. Against all the rules of information quality, data in spreadsheets is manipulated by users at will and in private. Then the resulting data and function is distributed, shared and further played around with, until it's anybody's guess whether the results presented at the end bear any relationship to the truth. Data that was pure and clean as it came out of the data warehouse, data mart or approved BI report is now potentially as contaminated as nuclear waste.

And yet, check in with the users. Indeed, check in with yourself. Why is Excel so popular? Because it makes it easy to play with the data, check out hypotheses, get answers otherwise unavailable, and so on. And once you've gotten the answer through the spreadsheet, chances are you won't get the time or the resources to recreate the process in a more auditable, quality-conscious way. It's a real and spreading problem. But, what to do?

This week I had the opportunity to preview a new product called Lyza that's due to launch on Sept. 22. In fact, you can download it and play with it already. Scott Davis, the CEO of Lyzasoft Inc. explained that they had spent a lot of time investigating how business analysts, the power users of spreadsheets, actually work. This is usually a good idea, because you find out what the users really need, and which of your assumptions are right or wrong. It will probably come as no surprise that most analysts approach their work in a highly unstructured and iterative way, pulling bits of relevant data into Excel from a variety of known sources - both official data marts and reports as well as unofficial files, spreadsheets, etc. they happen to have created before or borrowed from trusted colleagues. And they do it in Excel, because that's the only way they can.

What Lyza does is to provide an easier, more intuitive way of pulling data together from diverse sources, combining and manipulating it and creating results and reports for distribution to the business. Well, that's all fine and dandy for the business analysts you may say, but how does it help the BI and data quality departments address the data contagion? The answer is that Lyza tracks and saves an audit trail of every action and every step of the analysis process that the user is building as well as enabling snapshots of the results to be cached and preserved for posterity. Now the data quality folks are beginning to smile. And the BI department? Well, they're less sure: they like the added traceability, but this is still outside their comfortable data mart zone.

However, we could look at it in a different way. We could imagine that Lyza provides a new type of data mart - a "playmart" - a sand box where power analysts can experiment with data and perform all sorts of analyses in a safe, well-managed environment. Now, if only we could evaluate the analysts' logic and productionalize those analyses and reports that are going to be reused and built upon in the future.

Scott's initial answer was that you can certainly do all this within Lyza itself. But a bit of further probing convinced me that the metadata that Lyza stores to describe the analysis processes is probably sufficient to enable the creation of ETL scripts for your ETL engine of choice. This would certainly require further investigation and automation, but it seems like the bones of the idea are there. In this case, the playmart could address a set of business analysts' needs that have been long ignored by the BI departments and by BI vendors as well.

The only real fly in the ointment is whether Lyza will be able to convince the spreadsheet jockeys to get off their current Excel rocking horse and jump on the bright new Lyza pony in the playmart! (And that sentence would work so much better if only Lyza had chosen a mustang for their logo rather than a gecko.)


Posted September 19, 2008 4:54 PM

In my last post, I shared some thoughts inspired by the Decision Intelligence article written by Claudia Imhoff and Colin White. There, I suggested we need to really begin to consider all information as a single resource for the whole business. This entails stepping beyond our traditional IT-bounded view of our systems and looking at them with a renewed business vision. If we do this, it will also quickly become clear that our view of process needs reworking too.

Claudia and Colin have drawn a box on the left of their architecture picture that arises directly from the insight that operational BI really is a different beast from the traditional BI we've all known and loved over the past 20 years or so. When you deeply consider the implications of building an operational BI system, as Claudia and Colin clearly have, it becomes obvious that operational BI has many of the characteristics of traditional operational or transaction-processing systems. Therefore, from a systems architecture point of view, you put them in the same box, in this case called "Business process intelligence".

There are also some differences, of course. The most important is how the business users interact with these two related types of system. The value proposition of operational BI is that human decision-making skills can improve operational processes. How? Well, there are two very distinct threads here.

One is the proposition that we can apply advanced analytics technology automatically to parts of the operational process. Fraud detection is a good example. Applying advanced analytics on the fly to credit card transactions gives better detection of fraudulent transactions. Note that this type of operational BI is almost completely invisible to the business users: they see the results of more fraud detected or fewer false positives, but how that happened is both unknown and uninteresting.

The second thread brings users very directly into the loop. Here, the operational BI technology is made part of the users' visible process. Business users are presented with decision support technology that displays trends or exceptions in near real-time data, so that they can potentially choose a different course of action to that embedded in the normal flow. In effect, business users get to change the business process on the fly, rather than doing little more than data input as was previously the case.

Now, keeping this in mind, here's the million dollar question. What's the difference between an operational system and an informational system; how do you distinguish between an operational process and an informational process? In the good old days, it was easy! The operational side was nearly or actually real-time, dealt with individual transactions or data elements according to a predefined process where the users had minimal freedom to act intelligently. Informational systems, in contrast, were centered around users who were expected to make intelligent decisions based on historical data without any clear process to turn those decisions into action.

So, what is the answer today? When we in BI start building operational BI and the operational world starts implementing adaptive SOA-based systems, the distinction between operational and informational more-or-less disappears. This puts operational BI and operational systems together in one box of the architecture. But the deeper and probably longer-term implications of this bold step have not been explicitly called out. In fact, these implications are obscured by the naming of the new architecture as "Decision intelligence", because the top level of this architecture is no longer confined to the world that was formerly BI; it actually becomes the single, common process or interface through which all business users will interact with the underlying IT systems.

Is that scary? Absolutely! But it is a clear and logical consequence of the paths that BI and operational systems are currently on. It means that we in BI are no longer in total control of our destiny. But the same is true of the operational systems. And, although I've not covered it here, collaborative systems (e-mail, office support, etc) are also being drawn inexorably into the same converged path.

It's time we all started to talk to one another! And that does imply that decision intelligence may be too narrow a term for us all to agree on. May I propose again the "Highly Evolved Business"?


Posted September 14, 2008 12:26 PM

Claudia Imhoff and Colin White have a lengthy history of insightful and provocative contributions to the development of Business Intelligence. Their recent article, Decision Intelligence, is no exception. Their thesis is that the IT support needed for decision-making, now known as "Business Intelligence", today extends far beyond the traditional domain of data warehousing and is in need of a new architecture and a new name - Decision Intelligence.

I fully agree. I've been using the terms "Highly Evolved Business" and "Business Insight" over the past year or so to express exactly the same thought. Indeed, Claudia, Colin and I have discussed this whole idea already at length and are very much on the same page. But I hadn't seen their architecture picture before, and it gives me the opportunity to discuss the whole topic from a higher perspective in this and the next post.

Under Decision intelligence, the architecture shows three vertical blocks called "Business process intelligence", "Business data intelligence" and "Business content intelligence". The meanings of these blocks are fairly obvious, but take a look at the linked article for a full explanation. My thought is that they are almost too obvious: they closely reflect our current arrangement of systems building blocks in the IT world.

Let's first examine the data and content blocks. Today, if you look at typical enterprise implementations, you will certainly see databases and separate content stores. You'll also notice independent systems built upon these separate stores. But, if you step back from the storage and processing issues, it's pretty difficult to distinguish between the two categories. Try explaining the difference to a business user!

Take the example of a clinician who's trying to make a treatment decision. She's looking at a chest x-ray - content in our terms. And she's also looking at the "structured data" that goes with it: this x-ray is of a 45-year-old male, smoker of 20 cigarettes a day for the past 30 years, who has been admitted with shortness of breath. Does she see unstructured content and structured data that must somehow be combined in her decision making? I'd argue not. She simply sees a set of information she's using.

And some of the old barriers between the storage of structured data and unstructured content are breaking down. Where is the EXIF data (structured metadata) of a photo stored? Yes, in the JPG file along with the unstructured content. Where do e-mail systems store the structured metadata about sender, subject, date sent, etc? Sure, in the database with the unstructured e-mail body content.

I could make a similar argument about the lack of distinction between real-time (or operational) data and historical (data warehouse) data.

My point is that if we want to create a new vision for the future, we need to start seeing the world through non-IT eyes. It's all information. It's a single concept; a single category of "stuff". And we in IT need to start creating the tools and methods that allow us to create, manage and make available all information in a coherent and consistent way. At a conceptual level, that has to be the goal and that should be our first pictorial representation.

Keep that thought in mind. I'll come back to it next time when I look at the process side of the picture.


Posted September 1, 2008 10:51 AM

I was browsing through the blogs on B-eye-network.com this morning (Sunday - yeah, sad, I know) and came across two recent entries that spoiled my coffee. Given that I'm no fan of instant gratification (in IT anyway), I'm not going to give you links, so you have to work at finding them yourself. But the phrases that caught my eye were "Instant SOA", "Data marts in about an Hour" and "full EDW's with AS-IS star schemas in 2 weeks".

Now I'm as fond of a shortcut as the next guy, but I've learned that the word "Instant" is not all goodness. When I've bought some instant Spaghetti Bolognese in the local supermarket I've found that the cost is a lot higher than the individual ingredients and the taste, well, leaves a lot to be desired. Sure, I saved some time when I got home, but did I get value for money? And did I end up with what I really wanted? So, why should I expect more from an Instant DW?

"Caveat emptor" as the Romans used to say. Here are a few contra-indications for when instant gratification should not be expected in your next BI (or SOA) project:

  1. The business users are not quite sure what they want.
    Most BI projects start with a vague set of requirements from the potential users. It's going to take some time to hone these down to a usable definition of data and query needs. In the meantime, maybe it's best to let the users continue to play with their instant Excel spreadsheets and look over their shoulders to see what they're doing.
  2. Somebody forgot to document the meanings of the data in the source applications.
    This is the oldest metadata problem. If your data sources have not been properly described, an Instant DW is likely to be instantly dismissed as misleading and inaccurate. Do you want to go there?
  3. Garbage in, garbage out. Or worse...
    If your ingredients (data sources) are contaminated with erroneous data, you're going to end up with a very sick business on your hands if you just take the Instant DW approach. Understanding and fixing dirty data is time-consuming, but mandatory.

It's all about quality time... or quality vs. time. If I bring home my instant Spaghetti Bolognese, I may get it on the table within a few minutes. But, if the kids won't eat it or, worse, throw up that night, I'd argue I've made the wrong trade-off between time and quality. You need to consider the same balance in a BI or SOA project.

Now, I'm off to spend some quality time with my kids :-)


Posted August 24, 2008 12:20 PM

I was at the Business Objects Summit this week in Boston, where the main emphasis was on linking strategy to execution, with a seeming focus on the larger enterprises. All very SAP-inspired, I thought. And very insightful, especially if you're a large enterprise. There have been some comments in the blogs already on these topics. But it was a small conversation over lunch that caught my interest...

Information OnDemand. No, not the annual IBM Conference in Las Vegas, in October. But a rather low key effort from Business Objects with a website to allow companies to access market data and incorporate it into their BI efforts.

There's a definite and growing interest these days in combining external data with the contents of the warehouse. But it does raise some concerns, not least about the reliability of the external data and how to create a valid semantic relationship between the two data sets. In the past, companies have addressed these concerns by obtaining key market and other external data from trusted sources like Dun & Bradstreet, Reuters and others and then ensuring that such data entered the warehouse via a controlled feed designed by Information Architects who could match the two data sets correctly. After all, such external data is another information source for the warehouse and should be managed like any other.

This method works well for large enterprises with a centrally-controlled approach to the warehouse. And where the value-add derived from or risks incurred by using this data are significant, this method is probably still required. But what if you are a small or medium enterprise? Or what if you really only want to do a couple of once off analyses?

Shopping at the Information OnDemand website appears to be the answer! Here you can buy prebuilt, but customizable, reports combining your data with external market and financial data. You can buy one-time snapshots or subscribe for regular updates.

For larger companies, this could provide a safe and cost-effective way of dipping their toe in the big ocean of external data. For smaller companies, it could be all they need. Sounds like a useful idea to me!

The service has been available since September 2007, but I hadn't come across it before. Maybe there are some similar services I should know about, so please feel free to comment.


Posted August 14, 2008 5:14 PM

I came across an ad today for a Google Webcast on Universal Search for Business. It contained the phrase "As the volume of information inside enterprises explodes, most executives recognize the importance of a Google-like search solution for business content.", which set me wondering...

A Google-like search solution for business content? What exactly does that mean?

The phrase "Google-like search", of course, covers a multitude of marketing-speak, but let's assume that it includes the patented PageRank technology behind Google's Internet search success. Google itself describes PageRank as follows: "PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value." (http://en.wikipedia.org/w/index.php?title=PageRank&oldid=230400158 as of Aug. 7, 2008). A number of questions arise for me: Does an enterprise intranet usually have a vast link structure? Would business executives really consider the "democratic vote" of the organization as a valid indicator of a document's importance? Indeed, how democratic is the link structure in an intranet?
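For anyone who hasn't seen how link structure turns into a ranking, here is a minimal sketch of the textbook PageRank power iteration in Python. It is not Google's production algorithm, and the damping factor of 0.85, the iteration count and the tiny four-page "intranet" are assumptions I've made up purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its "vote" equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical intranet: everything links to the home page,
# and nothing links to the team wiki.
intranet = {
    "home": ["policies", "org-chart"],
    "policies": ["home"],
    "org-chart": ["home"],
    "team-wiki": ["home"],
}
ranks = pagerank(intranet)
```

In this toy intranet the home page ends up with the highest rank and the team wiki, which nobody links to, with the lowest, whatever the actual business value of its content. That is the sort of outcome the questions above are getting at.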

Google, Wikipedia and many Web 2.0 systems have an underlying belief in James Surowiecki's concept of "the wisdom of crowds". Data warehousing, Business Intelligence and, indeed, all traditional IT development tend to put more faith in experts and their accumulated knowledge. In the BI world, I'm beginning to see some level of acceptance that the so-called experts do not have a monopoly on business knowledge. We see that there is a growing need to allow and, indeed, facilitate the feedback of knowledge that emerges on the fringes of the BI community (the front-line staff and first-line managers) back into the core of the warehouse for wider promulgation and reuse.

But, to what extent do Google and the Web 2.0 community recognize that some knowledge is inherently more useful or valuable (although not necessarily "right") simply based on the authority of its source? And do they recognize that, within the tighter and more closed confines of an enterprise, not all the requirements for wise crowds are met? If not, we may see the many years of careful effort by data modelers, administrators and information stewards overturned in the rush to Web 2.0. This would not be in anybody's interest.

On the other hand, if I've made the wrong assumption about what "Google-like search" means... Anybody care to comment? Or maybe I'll find time to sign up for the webinar!


Posted August 7, 2008 8:17 PM