Blog: Barry Devlin

Barry Devlin

Hello and welcome to my blog!

About the author

Dr. Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. He was responsible for the definition of IBM's data warehouse architecture in the mid '80s and authored the first paper on the topic in the IBM Systems Journal in 1988. He is a widely respected consultant and lecturer on this and related topics, and author of the comprehensive book Data Warehouse: From Architecture to Implementation published by Addison-Wesley in 1997.

Over the past few years, Barry has extended his interest to cover the wider field of a fully integrated business, covering informational, operational and collaborative environments and, in particular, how to present the end user with a holistic experience of the business through IT.

Barry has worked in the IT industry for more than 25 years, mainly as a Distinguished Engineer for IBM in Dublin, Ireland. He is now founder and principal of 9sight Consulting, specializing in the human, organizational and IT implications and design of deep business insight solutions.

Columnar databases, especially those with an MPP approach, have been notching up impressive query performance figures, showing gains of 100X and more on the traditional players.  Such figures make great press releases, but they do place the emphasis on using these databases as data marts rather than the enterprise data warehouse (EDW) component of the data warehouse architecture.  Focusing on data marts and very specific business intelligence applications makes a lot of sense for new market entrants and smaller players in the DW space, allowing quick wins and easily understood sales messages.

But I have been convinced for some time now of the much greater potential such performance unleashes in the broader and more complex EDW environment.  And the vendors have been fairly quiet about this part of the market so far, maybe preferring to leave these technically and politically more complex projects to the big guys.  So, it was good to see Vertica's 4.0 announcement last week beginning to address the EDW market with its emphasis on "enterprise ready" and a number of interesting new features and expansions of old functions.

Robust workload and resource management for mixed workloads is a prerequisite for an EDW.  Vertica's introduction of administrator-defined resource pools with memory-usage, priority and concurrency settings and the assignment of users to these pools is a big step in this direction.  A rework of the optimizer in support of this and other features suggests that Vertica are serious about this support.
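To make the idea concrete, here is a minimal Python sketch of administrator-defined pools with memory, priority and concurrency settings, and users assigned to them.  It is purely illustrative: the class names, pool names and the simple admission rule are my own assumptions, not Vertica's implementation or syntax.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical pool: all field names are illustrative assumptions."""
    name: str
    memory_mb: int
    priority: int          # higher value = served first when queries queue
    max_concurrency: int
    running: int = 0

    def memory_per_query(self) -> int:
        # Naive assumption: pool memory is split evenly across query slots.
        return self.memory_mb // self.max_concurrency


class WorkloadManager:
    def __init__(self, pools, user_to_pool):
        self.pools = {pool.name: pool for pool in pools}
        self.user_to_pool = dict(user_to_pool)

    def _pool_for(self, user):
        return self.pools[self.user_to_pool[user]]

    def admit(self, user):
        """Return the pool if a slot is free, or None if the query must wait."""
        pool = self._pool_for(user)
        if pool.running >= pool.max_concurrency:
            return None
        pool.running += 1
        return pool

    def finish(self, user):
        """Release the slot held by one of this user's queries."""
        pool = self._pool_for(user)
        pool.running = max(0, pool.running - 1)

    def order_waiting(self, users):
        """Order waiting users so that higher-priority pools go first."""
        return sorted(users, key=lambda user: -self._pool_for(user).priority)


def demo():
    manager = WorkloadManager(
        pools=[
            ResourcePool("reports", memory_mb=8192, priority=1, max_concurrency=4),
            ResourcePool("lookups", memory_mb=1024, priority=10, max_concurrency=1),
        ],
        user_to_pool={"analyst": "reports", "teller": "lookups"},
    )
    first = manager.admit("teller") is not None    # slot free
    second = manager.admit("teller") is not None   # pool full, must wait
    manager.finish("teller")
    third = manager.admit("teller") is not None    # slot free again
    return first, second, third, manager.pools["reports"].memory_per_query()
```

The point of the sketch is only that a long-running report and a short lookup no longer compete for the same slots; a real workload manager would also have to deal with queuing, timeouts and run-time memory accounting.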

Also introduced in V4.0 is a newly optimized single record lookup on primary keys.  While aimed at a particular financial analysis use case, this function shows that the database can do more than just crunch columns.  Added to the FlexStore feature introduced in V3.5, where newly loaded data is kept in row format in memory for some period of time, I believe we're seeing the database's growing ability to handle the sort of record-level processing often needed in EDWs.  The new time-series support in V4.0 also plays directly to EDW needs.

Time and customer experience will, of course, prove if I'm correct, but it seems to me that Vertica is beginning to test my assertion that columnar, MPP databases can be applied to EDWs.  And further that their performance characteristics offer the possibility of re-architecting the EDW / data mart divide.

Posted March 3, 2010 11:17 AM
Having worked with Scott Davis, CEO of Lyzasoft, and produced a white paper on Collaborative Analytics in the first half of 2009, I was not surprised that version 2.0 of Lyza has a major emphasis in the same area.  What did surprise me, however, was how far they have advanced the concepts and implementation in such a short timeframe!

Successful collaboration between decision makers requires an environment that facilitates a free-flowing but well-managed conversation about ongoing analyses as they evolve from initial ideas to full-fledged solutions to business problems.  Consider a common scenario.  The first analyst gathers data she considers relevant and creates an initial set of assumptions, data manipulations and results.  She shares this via e-mail with her peers for confirmation, and she receives suggestions for improvement, some of which she incorporates in a new version.  Her manager reviews the work personally and makes further suggestions; a new version emerges.  She has also shared the intermediate solution with a second department, and the analyst there creates another solution based on the original.  Meanwhile, the first analyst finds an error in her logic buried deep in cell Sheet3!AB102...

We all know the problems with multiple unmanaged copies, rework, silently propagated errors and so on in the usual spreadsheet- and e-mail-based business analysis environment.  Lyza and Lyza Commons together address these issues by creating a comprehensive tracking and auditing mechanism for every step of an analysis and providing an integrated environment for sharing and discussing work among collaborators.  Integral metadata links all copies derived from an initial analysis.  Twitter-like conversations (called Blurbs) about an analysis are linked to the referenced object creating a comprehensive context for the conversation and the underlying analysis.  The folks at Lyzasoft have also come up with a security concept for sharing analyses they call Mesh Trust that should make sense in most enterprise collaboration environments.

My bottom line?  Lyza and Lyza Commons 2.0 provide a seamless blending of analytic function, managed and controlled access to information resources and enterprise-adapted social networking around analytic results and their provenance.  This is precisely the type of function needed by businesses who want to regain control of spreadmarts that have run amok.  This is the right conceptual foundation for real, meaningful business insight and innovation going forward.

Posted February 25, 2010 2:58 PM
As mentioned in my last post, ParAccel had a really interesting announcement coming out this week.  I was talking about their partnering with Fusion-io to attach SSD technology in their ParAccel Analytic Appliance for even faster query performance.  ParAccel are not alone in their use of SSD; Teradata's 4555 and Oracle's Exadata 2 also include the technology.  For me, it's not even about faster query results for users.  It's about the implications for the entire Data Warehouse architecture.

Over the past couple of years, we've seen dramatic improvements in database performance due to hardware and software advances such as in-memory databases, columnar storage, massively parallel processing, compression, and so on as described in my white paper from April 2009.  SSD, in one sense, is just another piece of accelerating technology.  However, add it to the existing list, and you begin to see the possibility of revisiting old assumptions about what is possible within a single database.  Here are a few ideas to play with:

  • Do you still need that Data Mart?  With so much faster performance, maybe the queries you now run in the Mart could run directly on the EDW.  Reducing data duplication has enormous benefits: in storage volumes, but principally in reduced maintenance of ETL to the Marts.
  • Where to do operational BI?  It was once considered necessary to install a separate ODS to support closer to real-time access to consolidated atomic data.  But with such a fast database, couldn't you just trickle feed the data and do it all in the Warehouse itself?  One less copy of data and one less set of ETL can't be all bad!
  • ETL or ELT?  Extract, transform and load has been the traditional way of loading a Warehouse, with a special engine to do the transform step.  Well, with a faster and more powerful database engine, you have the option to try extract, load and transform and let the Warehouse database do the transform work.
Although ParAccel, like all the smaller vendors, is focusing more on selling to the "bigger, faster, more complex analytics applications" market at present, I'm pretty sure that the work ParAccel is doing under the covers on query optimization, workload management, loading and updating features will pave the way for a sea change in how we do data warehousing in the next few years.
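
For anyone who wants to picture the ELT option from the list above, here is a minimal sketch in Python.  It uses the standard library's sqlite3 as a stand-in for the Warehouse database, and the table names, columns and sample rows are all hypothetical.  The raw extract is loaded untouched into a staging table, and the cleansing and aggregation are then done by the database engine itself rather than by a separate transform engine.

```python
import sqlite3

# Hypothetical raw extract: (order_id, customer, amount as text, ISO date).
RAW_ROWS = [
    (1, " alice ", "10.50", "2010-02-01"),
    (2, "BOB", "20.00", "2010-02-01"),
    (3, "Alice", "5.25", "2010-02-02"),
]


def load_raw(conn, rows):
    """Extract + Load: land the source rows untouched in a staging table."""
    conn.execute(
        "CREATE TABLE stg_orders "
        "(order_id INTEGER, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)


def transform_in_database(conn):
    """Transform: cleanse and aggregate with SQL inside the database engine."""
    conn.execute(
        """
        CREATE TABLE daily_sales AS
        SELECT LOWER(TRIM(customer)) AS customer,
               order_date,
               SUM(CAST(amount AS REAL)) AS total
        FROM stg_orders
        GROUP BY LOWER(TRIM(customer)), order_date
        """
    )


def run_elt(rows):
    """Run load then transform, and return the resulting rows in a fixed order."""
    conn = sqlite3.connect(":memory:")
    try:
        load_raw(conn, rows)
        transform_in_database(conn)
        return conn.execute(
            "SELECT customer, order_date, total FROM daily_sales "
            "ORDER BY customer, order_date"
        ).fetchall()
    finally:
        conn.close()
```

Calling run_elt(RAW_ROWS) leaves both the staging table and the derived table in the same database, which is exactly the trade-off: one less engine and one less copy to manage, paid for with Warehouse capacity.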


Posted February 17, 2010 2:34 PM
Kim Stanick and Rick Glick of ParAccel were at the Boulder BI Brain Trust (BBBT) last Friday. They have an exciting announcement coming soon, and much of what was discussed was under NDA, so I can't give details here. But about half-way through their presentation, they threw up a slide saying simply "EDW: What's not working?"

Well, that's a negative question! And, anyway, I believe most of us have some good ideas about what's not working--from project scoping and delivery issues to problems of complexity of feeds and bottlenecks in timely data availability. So, let me re-frame the question: "Where next for EDW?"

I wrote a BI Thought Leader for ParAccel last April called "Analytic Databases in the World of the Data Warehouse" that began to address that question, and as the world of BI has evolved since, I want to revisit that question briefly. Back then, I wrote:

"Specialized analytic databases using [advanced] technologies ... now offer significantly improved performance for typical BI applications, enable previously impossible analyses and often lower cost implementation. They also have the potential to challenge the current physically layered Data Warehouse architecture. This paper ... argues that analytical databases may enable a move to a simpler non-layered architecture with significant benefits in terms of lower costs of implementation, maintenance, and use."

In brief, it's our old friend, the paradigm shift, enabled by a dramatic shift in the price-performance characteristics of data warehouses driven by a new generation of technology. The possibility I saw then was a return to a physically simpler, more singular implementation of the EDW. And indeed that may still be a first step.

My thinking has evolved further since then, and I'm really beginning to envisage a much larger problem space that we need to address--how to integrate the entire enterprise information set, operational, informational and collaborative. I call that Business Integrated Insight (BI2), described in a more recent white paper. The discussion at BBBT last Friday led by a number of physical database technology experts gave rise to some new insights into how BI2 could be physically instantiated.

Virtualization at every level of the environment--servers, applications, data and particularly databases--linked closely with advances in the technology (as opposed to the hype) of cloud computing is widely discussed today as a way to reduce IT capital and operating costs, consolidate infrastructure, simplify resource management and so on. However, database virtualization offers new possibilities in the physical implementation of an enterprise data architecture that spans all data types and processing needs. Chief among these are flexibility of implementation, adaptability, mediated access to and use of data across multiple database types, significant reductions in data duplication and the gradual construction of overarching models that describe the entire business information resource. I'm sure there's much more to be said on this topic, but I'd love to hear the views of some experts in the field.


Posted February 9, 2010 6:52 AM
I've been busy on the conference circuit in Europe over the past couple of months and have spoken extensively on the growing importance of a social networking approach for decision makers, especially those at management and higher levels.  The response is increasingly positive.  Where in the past I received nods of agreement, I now hear more anecdotes about what has happened and what the outcome was.

But anecdotes are one thing; some real research is another.  So, I've been very gratified to see the results of research carried out by the Society for New Communications Research (SNCR) over the past year, which corroborated my view that social networking is going to be big for BI.  Don Bulmer's blog entry gives all the details, but there was one snippet that, in my opinion, deserves special attention.  The age profile of users of social networking tools has a double peak - one in the under-35 bracket (as would be expected) and another in the over-55s, which came as a bit of a surprise.  It seems that the older decision makers must see the benefits of social networking, not through high prior familiarity with internet tools but based on the results they achieve.  And it also poses a question - how do we get the middle of the age range (my own.... just about!) engaged?

I suspect that the answer will come down to BI vendors actively including more real social networking functionality and connectivity in their tools.  And addressing the question of how to use that function more effectively within the organization (which, I guess, is where the majority of the middle(-aged) managers focus their decisions - from both a data and people viewpoint).  For BI tool vendors, it's still all to play for.

Posted November 19, 2009 4:56 AM

I normally treat these debates on the paternity of the term "data warehouse" with a joke and a smile and let them go right by.  But Bill Inmon's latest newsletter article is just too factually incorrect to let it go without rebuttal.

In the article, Bill says:
'So let's examine the facts, something that "RAUL634" does not care to take into account.

In the mid 1980s, Barry Devlin, a research associate of IBM in Ireland, wrote an article discussing an "information warehouse." The article was written in the IBM Systems Journal. The article went on to address some vague and generally undefined concepts about the thing called an "information warehouse."'


Bill - please check YOUR facts before going into print. 

My (and Paul Murphy's) 1988 IBM Systems Journal article described an architecture, of which the key component was the "Business Data Warehouse".  It was far from vague, although it was certainly high level.  It introduced and defined many of the concepts that continue to be at the core of the data warehouse today.  Since IBM still owns copyright on the full article, I can't publish it here, but here is the key figure from it, and FACT - it does use the term "data warehouse" and define it with sufficient clarity that most people would accept it as the forerunner of the data warehouse today.

And here is the link to the full document on the IBM website, although you now have to pay to download it.

I can also state as a FACT that I and others within IBM Europe were using the term "data warehouse" internally as early as 1985-86.  However, despite widespread search, I have never found the term used in the public domain before my 1988 paper. 

Furthermore, it is a well-known and easily discoverable FACT that IBM announced the "Information Warehouse" in 1991.

And Bill - if you're really keen on facts - I suggest that you edit your own bio: "Bill is universally recognized as the father of the data warehouse."  By my dictionary, "universally recognized" means that literally everybody accepts the attribution. Clearly, some would disagree...
 


Posted August 6, 2009 8:49 AM
There's been a lot of interest in Enterprise 2.0, social networking and collaborative working over the past couple of years.  However, very little thought has been given to how such techniques can be realistically incorporated into today's Business Intelligence paradigms.

Working with Scott Davis and the folks at Lyzasoft over the past couple of months has given me pause to consider just how the rather controlling mindset of Data Warehousing will need to change to accommodate and encourage the more flexible approach to BI that Enterprise 2.0 implies.  To this end, I've come up with the "adaptive information cycle", a model that links the center-out approach of traditional data warehousing to the edge-based, emergent prototyping that characterizes today's analytic environment.

Traditionally, IT has always seen itself as the supplier of quality data to the decision makers, extracting data from the operational environment, cleansing and consolidating it in the Data Warehouse and making it available to business analysts through data marts and similar tools.  While this has undoubtedly been a good strategy, we still find numerous analysts loading up non-warehoused data and analyzing in non-standard, innovative ways.  While IT has railed at the plague of "spreadmarts" that has impacted data consistency and quality, there is no doubt that, from a business viewpoint, these independent thinkers are providing worthwhile answers and innovative ideas.  It's simply not on for IT to say "Quit doing that!"; we need a way to bring these activities into a more controlled environment and to link the emerging information needs and analyses back to the Data Warehouse.

The point about a controlled environment I dealt with earlier in a white paper on "playmarts", also originally developed in collaboration with Lyzasoft.

In a new white paper, available today on the Lyzasoft site, I deal with the absolutely essential linking of new insights developed by business analysts back to the Data Warehouse environment.  But, can we afford to link every business analyst's uncorroborated insight back to the warehouse?  Would we even want to?

Probably not, and this is where collaborative analytics comes in.  By enabling and encouraging business analysts to share and reuse their work in a managed and controlled environment, we can benefit from the "wisdom of crowds" - as analysts collaborate, best practices emerge through data and function that is invented, shared and cross-checked among one another.  And what Lyza has now provided is an initial set of function to enable business analysts to collaborate in the creation of the new data sets and function the business needs.

Of course, this is only a first step on a longer journey that will involve a reappraisal of how the ubiquitous spreadsheet can be brought under control.  And we'll need Microsoft to step up to that one.  But Lyzasoft have made a good start on the principles and techniques needed. 

Posted July 9, 2009 6:41 AM
I was intrigued by a webinar entitled "Extending Data Warehouse Architecture: Deploying a Data Warehouse without Databases" by Bill Inmon and Compact Solutions, so gave up an hour of my evening the other night to listen in.  I have to say that I didn't learn much about extending the Data Warehouse architecture.  Nor was I even mildly convinced that the solution offered was either a Data Warehouse or lacking a database.

The solution was described as being based on Unix Compressed Files that are partitioned and indexed to support querying along commonly used dimensions.  Now, what is not "database" about that?  From what I could gather, the data can only be queried from Compact's own proprietary user interface, so appears not to support SQL.  Updating seems to take place only through ETL tools such as Ab Initio, so I guess it's not ACID (Atomicity, Consistency, Isolation, Durability) compliant.  So certainly, it's not a full-function database and thus cheaper to implement and maybe faster running, but claiming it's not a database at all seems like marketing-speak.
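
To show why I say that, here is a minimal Python sketch of what "compressed files that are partitioned and indexed" amounts to.  It is my own illustration built on the standard library, not Compact's design: the file layout, function names and sample data are all assumptions.  Rows are split into one gzip-compressed file per value of a dimension, and an index maps each value to its file so that a query touches only the partition it needs.

```python
import csv
import gzip
import os
import tempfile
from collections import defaultdict


def write_partitions(rows, key, directory):
    """Write one gzip-compressed CSV per distinct value of `key`.

    Returns an index mapping each value to its file path.
    Assumes the values are safe to use in file names.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    index = {}
    for value, group in groups.items():
        path = os.path.join(directory, f"{key}={value}.csv.gz")
        with gzip.open(path, "wt", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        index[value] = path
    return index


def query(index, value):
    """Read only the partition for `value`; all fields come back as strings."""
    path = index.get(value)
    if path is None:
        return []
    with gzip.open(path, "rt", newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


def demo():
    rows = [
        {"region": "EU", "amount": "10"},
        {"region": "US", "amount": "7"},
        {"region": "EU", "amount": "3"},
    ]
    with tempfile.TemporaryDirectory() as directory:
        index = write_partitions(rows, "region", directory)
        return sorted(index), query(index, "EU"), query(index, "APAC")
```

Storage, partitioning, an index and a query path: that looks like a database to me, albeit one that is fast only along the dimension chosen up front and that has no SQL and no transactional guarantees.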

More important - is the resulting solution a data warehouse?  Well, it was claimed that no modeling was needed to set it up (another low cost implementation selling point).  So, how do data integration and cleansing get defined?  It sounded like the partitioning and indexing was done with some specific types of access in mind.  So, maybe a cheap and large data mart at best, but not a data warehouse.  And if you want to use any of your standard BI tools, you have to export the data into a (real) relational database or cube!

And finally, in response to a question on where to position this in his own Data Warehouse 2.0 architecture, Bill replied ... ummm, it doesn't really fit anywhere ... it's a special category on its own.

Personally, I don't think I buy it as a Data Warehouse or a non-database...

Posted April 21, 2009 3:27 AM
I was a speaker at the TDWI World Conference in Las Vegas last week, and if there was one phrase that kept occurring to me, especially in the vendor exhibition, it was disruptive technological change.  Those of you who've read "The Innovator's Dilemma" by Clayton Christensen will know what I mean by that phrase. For those who haven't, I'd say it's a must-read for anyone involved in a technology-driven industry.

By Christensen's definition, disruptive change occurs when a new technology has some feature that is not applicable in an existing market and performance characteristics worse than existing technology in that market, but capable of growing to meet that market's needs in time.  What happens is that the new technology debuts in another, often related, market and then moves back into the original market, often displacing the existing suppliers there.  Christensen's key example relates to the development of the disk drive market from the '70s to the '90s and the failure of many of the incumbent 14-, 8- and 5.25-inch drive manufacturers over that period.

What struck me at TDWI was the explosion in novel and even radical approaches to the database and storage side of data warehousing that were on view.  While most of the technologies are not new, the combinations and price-points are certainly innovative and maybe disruptive.  For many years, the DW database market has been very quiet, but the last couple of years have seen an explosion in new entrants.  What the newcomers have in common, from the more established ones like Netezza to the more recent entrants like ParAccel, is a focus on query performance and large data volumes in specific analytical applications that might traditionally be called data marts.

As these vendors' technologies and techniques are proven in largely stand-alone environments, they are beginning to raise questions in the traditional enterprise data warehouse arena.  We've already seen the incumbents (Teradata, IBM, Oracle and Microsoft) introduce appliance-like solutions.  But the real question I see relates to the underlying architecture of the data warehouse itself.  After more than 20 years, are we about to see a fundamental change in the way we design business intelligence environments?

I'll be exploring this question over the coming months, but I'd love to hear your views at this stage!  

Posted March 3, 2009 7:10 AM
Back in July, I resolved to blog regularly, and I did manage to do so for more than two months, but then I got distracted, lazy or just busy (take your choice!), so this is my first blog since the end of September. It's a bit too early for new year resolutions, so I'm just going to take it one blog at a time.

One of the things I got busy on was writing a white paper sponsored by Lyzasoft which they published recently. Speaking to the folks at Lyzasoft, and also to participants in conferences at which I was presenting over the same time period, I found myself looking again at the role and positioning of Business Intelligence tools vs. the people who use them. And I've deliberately phrased that as "versus" - because it's well accepted that many "analysts" who should use BI tools as part of their toolkit end up either fighting the prescribed tool or abandoning it altogether.

Business Intelligence these days is a term that covers a multitude of sins, from executives defining and executing business strategies to automated processes identifying potential fraudulent transactions without any human intervention and everything in between. Depending on the particular business need, software vendor, consultant or industry analyst involved, the focus in different implementations can vary dramatically. All remain BI, but each requires very different thinking and architectural approaches.

One set of users who often sit uncomfortably on the boundary of BI are often called business analysts within their organizations. They tend to use and combine data in new or unusual ways in order to gain new insights into what is going on in the business. They often obtain the data they need from beyond the data warehouse, because the warehouse doesn't hold the data they need or has cleansed it in a way they don't agree with or they simply don't know it's there. They often manipulate and combine data in different ways as an iterative part of their analyses. And many times their tool of choice is the spreadsheet.

It's pretty clear that these business analysts should be supported by the BI community. Within their own organizations, the data they require should come increasingly through the data warehouse in order to ensure data integrity and consistency. The analyses they perform and the outputs of their work also need to be maintained and tracked for compliance and regulatory reasons. Overall, setting these people off to find, manipulate and analyze data in a haphazard way simply doesn't make sense.

Fitting the needs of business analysts into the data warehouse architecture is, however, possible. I've coined the term "playmart" to represent the type of environment these users need. The aim is to balance agility (playing) with control (in a data mart). I've defined eight key characteristics of a playmart in the white paper (see also below) and would be very interested to hear your views on them.

Posted December 19, 2008 4:04 PM