Blog: Barry Devlin http://www.b-eye-network.co.uk/blogs/devlin/ Hello and welcome to my blog! Copyright 2010 Wed, 03 Mar 2010 11:17:21 -0700
EDW and Columnar Databases
But I have been convinced for some time now of the much greater potential such performance unleashes in the broader and more complex EDW environment.  The vendors have been fairly quiet about this part of the market so far, perhaps preferring to leave these more technically and politically complex projects to the big guys.  So it was good to see Vertica's 4.0 announcement last week begin to address the EDW market, with its emphasis on "enterprise ready" and a number of interesting new features and expansions of existing functions.

Robust workload and resource management for mixed workloads is a prerequisite for an EDW.  Vertica's introduction of administrator-defined resource pools with memory-usage, priority and concurrency settings and the assignment of users to these pools is a big step in this direction.  A rework of the optimizer in support of this and other features suggests that Vertica are serious about this support.

Also introduced in V4.0 is a newly optimized single record lookup on primary keys.  While aimed at a particular financial analysis use case, this function shows that the database can do more than just crunch columns.  Taken together with the FlexStore feature introduced in V3.5, where newly loaded data is kept in row format in memory for some period of time, it suggests to me that we're seeing the database's growing ability to handle the sort of record-level processing often needed in EDWs.  The new time-series support in V4.0 also plays directly to EDW needs.
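
The difference between record-level lookups and column crunching can be pictured with a toy example. The sketch below is not Vertica's implementation; the sample table, the `row_store` and `column_store` structures and both function names are assumptions made purely for illustration. It simply shows why a row layout suits a single-record lookup on a primary key, while a column layout suits an aggregate over one or two columns.

```python
# Toy table: each row is (trade_id, symbol, price). All names are illustrative.
rows = [
    (1, "IBM", 125.0),
    (2, "ORCL", 24.5),
    (3, "IBM", 126.5),
]

# Row layout: whole records keyed by primary key, good for point lookups.
row_store = {trade_id: (trade_id, symbol, price) for trade_id, symbol, price in rows}

# Column layout: one list per column, good for scans and aggregates.
column_store = {
    "trade_id": [r[0] for r in rows],
    "symbol": [r[1] for r in rows],
    "price": [r[2] for r in rows],
}


def lookup_row(trade_id):
    """Single-record lookup: one dictionary probe returns the whole record."""
    return row_store[trade_id]


def average_price(symbol):
    """Column scan: reads only the two columns the query actually needs."""
    prices = [
        price
        for sym, price in zip(column_store["symbol"], column_store["price"])
        if sym == symbol
    ]
    return sum(prices) / len(prices)
```

A database that keeps recent data in row form and older data in column form is, in effect, choosing between these two access paths depending on the kind of query it sees.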

Time and customer experience will, of course, prove whether I'm correct, but it seems to me that Vertica is beginning to test my assertion that columnar, MPP databases can be applied to EDWs and, further, that their performance characteristics offer the possibility of re-architecting the EDW / data mart divide.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2010/03/edw_and_columnar_databases.php Database Wed, 03 Mar 2010 11:17:21 -0700
Lyza Commons: how to integrate BI and Enterprise 2.0
white paper on Collaborative Analytics in the first half of 2009, it came as no surprise to me that version 2.0 of Lyza had a major emphasis in the same area.  What did surprise me, however, was how far they have advanced the concepts and implementation in such a short timeframe!

Successful collaboration between decision makers requires an environment that facilitates a free-flowing but well-managed conversation about ongoing analyses as they evolve from initial ideas to full-fledged solutions to business problems.  Consider a common scenario.  The first analyst gathers data she considers relevant and creates an initial set of assumptions, data manipulations and results.  She shares this via e-mail with her peers for confirmation, and she receives suggestions for improvement, some of which she incorporates in a new version.  Her manager reviews the work personally and makes further suggestions; a new version emerges.  She has also shared the intermediate solution with a second department, where another analyst has created a further solution based on the original.  Meanwhile, the first analyst finds an error in her logic buried deep in cell Sheet3!AB102...

We all know the problems with multiple unmanaged copies, rework, silently propagated errors and so on in the usual spreadsheet- and e-mail-based business analysis environment.  Lyza and Lyza Commons together address these issues by creating a comprehensive tracking and auditing mechanism for every step of an analysis and providing an integrated environment for sharing and discussing work among collaborators.  Integral metadata links all copies derived from an initial analysis.  Twitter-like conversations (called Blurbs) about an analysis are linked to the referenced object creating a comprehensive context for the conversation and the underlying analysis.  The folks at Lyzasoft have also come up with a security concept for sharing analyses they call Mesh Trust that should make sense in most enterprise collaboration environments.

My bottom line?  Lyza and Lyza Commons 2.0 provide a seamless blending of analytic function, managed and controlled access to information resources and enterprise-adapted social networking around analytic results and their provenance.  This is precisely the type of function needed by businesses that want to regain control of spreadmarts that have run amok.  This is the right conceptual foundation for real, meaningful business insight and innovation going forward.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2010/02/lyza_commons_how_to_integrate.php Social Networking Thu, 25 Feb 2010 14:58:11 -0700
Data Warehouses and Solid State Disks (SSD)
Over the past couple of years, we've seen dramatic improvements in database performance due to hardware and software advances such as in-memory databases, columnar storage, massively parallel processing, compression, and so on as described in my white paper from April 2009.  SSD, in one sense, is just another piece of accelerating technology.  However, add it to the existing list, and you begin to see the possibility of revisiting old assumptions about what is possible within a single database.  Here are a few ideas to play with:

  • Do you still need that Data Mart?  With so much faster performance, maybe the queries you now run in the Mart could run directly on the EDW.  Reducing data duplication has enormous benefits, partly in storage volumes but principally in reduced maintenance of the ETL feeding the Marts.
  • Where to do operational BI?  It was once considered necessary to install a separate ODS to support closer to real-time access to consolidated atomic data.  But with such a fast database, couldn't you just trickle feed the data and do it all in the Warehouse itself?  One less copy of data and one less set of ETL can't be all bad!
  • ETL or ELT?  Extract, transform and load has been the traditional way of loading a Warehouse, with a special engine to do the transform step.  Well, with a faster and more powerful database engine, you have the option to try extract, load and transform and let the Warehouse database do the transform work.
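
To make the ETL versus ELT distinction concrete, here is a minimal sketch of the ELT pattern using Python's built-in `sqlite3` module as a stand-in for the Warehouse database. The table names (`staging_sales`, `sales_by_region`), the sample columns and the `run_elt` function are all hypothetical; a real warehouse engine would differ in scale and syntax, but the shape is the same: load the raw data first, then let the database do the transform.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows untouched, then transform them inside the database engine."""
    conn = sqlite3.connect(":memory:")
    # Extract + Load: raw data lands in a staging table with no cleansing.
    conn.execute("CREATE TABLE staging_sales (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)
    # Transform: the database itself trims, casts and aggregates the data.
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT UPPER(TRIM(region)) AS region,
               SUM(CAST(amount AS REAL)) AS total
        FROM staging_sales
        GROUP BY UPPER(TRIM(region))
        """
    )
    result = conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
    conn.close()
    return result
```

In a classic ETL flow, the trimming, casting and summing would happen in a separate engine before anything reached the database; here no such engine exists, which is exactly the saving the ELT option promises.
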
Although ParAccel, like all the smaller vendors, is focusing more on selling to the "bigger, faster, more complex analytics applications" market at present, I'm pretty sure that the work ParAccel is doing under the covers on query optimization, workload management, loading and updating features will pave the way for a sea change in how we do data warehousing in the next few years.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2010/02/data_warehouses_and_solid_stat.php Wed, 17 Feb 2010 14:34:45 -0700
EDW: What's not working?
Well, that's a negative question! And, anyway, I believe most of us have some good ideas about what's not working--from project scoping and delivery issues to problems of complexity of feeds and bottlenecks in timely data availability. So, let me re-frame the question: "Where next for EDW?"

I wrote a BI Thought Leader for ParAccel last April called "Analytic Databases in the World of the Data Warehouse" that began to address that question, and as the world of BI has evolved since, I want to revisit that question briefly. Back then, I wrote:

"Specialized analytic databases using [advanced] technologies ... now offer significantly improved performance for typical BI applications, enable previously impossible analyses and often lower cost implementation. They also have the potential to challenge the current physically layered Data Warehouse architecture. This paper ... argues that analytical databases may enable a move to a simpler non-layered architecture with significant benefits in terms of lower costs of implementation, maintenance, and use."

In brief, it's our old friend, the paradigm shift, enabled by a dramatic shift in the price-performance characteristics of data warehouses driven by a new generation of technology. The possibility I saw then was a return to a physically simpler, more singular implementation of the EDW. And indeed that may still be a first step.

My thinking has evolved further since then, and I'm really beginning to envisage a much larger problem space that we need to address--how to integrate the entire enterprise information set, operational, informational and collaborative. I call that Business Integrated Insight (BI2), described in a more recent white paper. The discussion at BBBT last Friday led by a number of physical database technology experts gave rise to some new insights into how BI2 could be physically instantiated.

Virtualization at every level of the environment--servers, applications, data and particularly databases--linked closely with advances in the technology (as opposed to the hype) of cloud computing is widely discussed today as a way to reduce IT capital and operating costs, consolidate infrastructure, simplify resource management and so on. However, database virtualization offers new possibilities in the physical implementation of an enterprise data architecture that spans all data types and processing needs. Chief among these are flexibility of implementation, adaptability, mediated access to and use of data across multiple database types, significant reductions in data duplication and the gradual construction of overarching models that describe the entire business information resource. I'm sure there's much more to be said on this topic, but I'd love to hear the views of some experts in the field.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2010/02/edw_whats_not_working.php Tue, 09 Feb 2010 06:52:56 -0700
BI and Social Networking
But anecdotes are one thing; some real research is another.  So, I've been very gratified to see the results of research carried out by the Society for New Communications Research (SNCR) over the past year, which corroborated my view that social networking is going to be big for BI.  Don Bulmer's blog entry gives all the details, but there was one snippet that, in my opinion, deserves special attention.  The age profile of users of social networking tools has a double peak - one in the under-35 bracket (as would be expected) and another in the over-55s, which came as a bit of a surprise.  It seems that the older decision makers must see the benefits of social networking, not through high prior familiarity with internet tools but based on the results they achieve.  And it also poses a question - how do we get the middle of the age range (my own.... just about!) engaged?

I suspect that the answer will come down to BI vendors actively including more real social networking functionality and connectivity in their tools, and addressing the question of how to use that function more effectively within the organization (where I guess the majority of the middle(-aged) managers are focused in their decisions - from both a data and people viewpoint).  For BI tool vendors, it's still all to play for.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2009/11/bi_and_social_networking.php Social Networking Thu, 19 Nov 2009 04:56:18 -0700
Fathers of the Data Warehouse
I normally treat these debates on the paternity of the term "data warehouse" with a joke and a smile and let them go right by.  But Bill Inmon's latest newsletter article is just too factually incorrect to let it go without rebuttal.

In the article, Bill says:
'So let's examine the facts, something that "RAUL634" does not care to take into account.

In the mid 1980s, Barry Devlin, a research associate of IBM in Ireland, wrote an article discussing an "information warehouse." The article was written in the IBM Systems Journal. The article went on to address some vague and generally undefined concepts about the thing called an "information warehouse."'


Bill - please check YOUR facts before going into print. 

My (and Paul Murphy's) 1988 IBM Systems Journal article described an architecture, of which the key component was the "Business Data Warehouse".  It was far from vague, although it was certainly high level.  It introduced and defined many of the concepts that continue to be at the core of the data warehouse today.  Since IBM still owns copyright on the full article, I can't publish it here, but here is the key figure from it, and FACT - it does use the term "data warehouse" and define it with sufficient clarity that most people would accept it as the forerunner of the data warehouse today.

And here is the link to the full document on the IBM website, although you now have to pay to download it.

I can also state as a FACT that I and others within IBM Europe were using the term "data warehouse" internally as early as 1985-86.  However, despite widespread search, I have never found the term used in the public domain before my 1988 paper. 

Furthermore, it is a well-known and easily discoverable FACT that IBM announced the "Information Warehouse" in 1991.

And Bill - if you're really keen on facts - I suggest that you edit your own bio: "Bill is universally recognized as the father of the data warehouse."  By my dictionary, "universally recognized" means that literally everybody accepts the attribution. Clearly, some would disagree...
http://www.b-eye-network.co.uk/blogs/devlin/archives/2009/08/fathers_of_the_data_warehouse_3.php Thu, 06 Aug 2009 08:49:09 -0700
Collaborative Analytics
Working with Scott Davis and the folks at Lyzasoft over the past couple of months has given me pause to consider just how the rather controlling mindset of Data Warehousing will need to change to accommodate and encourage the more flexible approach to BI that Enterprise 2.0 implies.  To this end, I've come up with the "adaptive information cycle", a model that links the center-out approach of traditional data warehousing to the edge-based, emergent prototyping that characterizes today's analytic environment.

Traditionally, IT has always seen itself as the supplier of quality data to the decision makers, extracting data from the operational environment, cleansing and consolidating it in the Data Warehouse and making it available to business analysts through data marts and similar tools.  While this has undoubtedly been a good strategy, we still find numerous analysts loading up non-warehoused data and analyzing in non-standard, innovative ways.  While IT has railed at the plague of "spreadmarts" that has impacted data consistency and quality, there is no doubt that, from a business viewpoint, these independent thinkers are providing worthwhile answers and innovative ideas.  It's simply not on for IT to say "Quit doing that!"; we need a way to bring these activities into a more controlled environment and to link the emerging information needs and analyses back to the Data Warehouse.

The point about a controlled environment I dealt with earlier in a white paper on "playmarts", also originally developed in collaboration with Lyzasoft.

In a new white paper, available today on the Lyzasoft site, I deal with the absolutely essential linking of new insights developed by business analysts back to the Data Warehouse environment.  But, can we afford to link every business analyst's uncorroborated insight back to the warehouse?  Would we even want to?

Probably not, and this is where collaborative analytics comes in.  By enabling and encouraging business analysts to share and reuse their work in a managed and controlled environment, we can benefit from the "wisdom of crowds" - as analysts collaborate, best practices emerge through data and function that is invented, shared and cross-checked among one another.  And what Lyza has now provided is an initial set of function to enable business analysts to collaborate in the creation of the new data sets and function the business needs.

Of course, this is only a first step on a longer journey that will involve a reappraisal of how the ubiquitous spreadsheet can be brought under control.  And we'll need Microsoft to step up to that one.  But Lyzasoft have made a good start on the principles and techniques needed.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2009/07/collaborative_analytics.php Thu, 09 Jul 2009 06:41:26 -0700
Deploying a Data Warehouse without Databases - NOT!
The solution was described as being based on Unix Compressed Files that are partitioned and indexed to support querying along commonly used dimensions.  Now, what is not "database" about that?  From what I could gather, the data can only be queried from Compact's own proprietary user interface, so it appears not to support SQL.  Updating seems to take place only through ETL tools such as Ab Initio, so I guess it's not ACID (Atomicity, Consistency, Isolation, Durability) compliant.  So certainly, it's not a full-function database and thus cheaper to implement and maybe faster running, but claiming it's not a database at all seems like marketing-speak.
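
To see why compressed files that are partitioned and indexed already behave a lot like a database, consider the following sketch. It is not a description of the product discussed here, whose internals I don't know; the file naming, the `region` partitioning key and every function name are assumptions for illustration. It writes one gzip-compressed file per partition, keeps an index from partition key to file, and answers a query by opening only the relevant file.

```python
import csv
import gzip
import os
import tempfile


def write_partitions(records, directory):
    """Write one gzip-compressed CSV per region; return a region -> path index."""
    by_region = {}
    for region, amount in records:
        by_region.setdefault(region, []).append((region, amount))
    index = {}
    for region, rows in by_region.items():
        path = os.path.join(directory, f"sales_{region}.csv.gz")
        with gzip.open(path, "wt", newline="") as handle:
            csv.writer(handle).writerows(rows)
        index[region] = path
    return index


def total_for_region(index, region):
    """Answer a query by opening only the partition the index points to."""
    with gzip.open(index[region], "rt", newline="") as handle:
        return sum(float(amount) for _, amount in csv.reader(handle))


def demo():
    """Build two partitions in a temporary directory and query one of them."""
    with tempfile.TemporaryDirectory() as directory:
        index = write_partitions(
            [("east", 10.0), ("west", 5.0), ("east", 2.5)], directory
        )
        return total_for_region(index, "east")
```

Storage layout, an index and a query path are all present; what is missing is a standard query language and transactional updates, which is the gap the paragraph above points to.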

More important - is the resulting solution a data warehouse?  Well, it was claimed that no modeling was needed to set it up (another low cost implementation selling point).  So, how do data integration and cleansing get defined?  It sounded like the partitioning and indexing was done with some specific types of access in mind.  So, maybe a cheap and large data mart at best, but not a data warehouse.  And if you want to use any of your standard BI tools, you have to export the data into a (real) relational database or cube!

And finally, in response to a question on where to position this in his own Data Warehouse 2.0 architecture, Bill replied ... ummm, it doesn't really fit anywhere ... it's a special category on its own.

Personally, I don't think I buy it as a Data Warehouse or a non-database...
http://www.b-eye-network.co.uk/blogs/devlin/archives/2009/04/deploying_a_data_warehouse_wit.php Tue, 21 Apr 2009 03:27:39 -0700
Disruptive Technological Change in the DW environment?
"The Innovator's Dilemma" by Clayton Christensen will know what I mean by that phrase. For those who haven't, I'd say it's a must-read for anyone involved in a technology-driven industry.

By Christensen's definition, disruptive change occurs when a new technology has some feature that is not applicable in an existing market and performance characteristics worse than existing technology in that market, but is capable of growing to meet that market's needs in time.  What happens is that the new technology debuts in another, often related, market and then moves back into the original market, often displacing the existing suppliers there.  Christensen's key example relates to the development of the disk drive market from the '70s to the '90s and the failure of many of the incumbent 14-, 8- and 5.25-inch drive manufacturers over that period.

What struck me at TDWI was the explosion in novel and even radical approaches to the database and storage side of data warehousing that were on view.  While most of the technologies are not new, the combinations and price-points are certainly innovative and maybe disruptive.  For many years, the DW database market has been very quiet, but the last couple of years have seen a surge of new entrants.  What the newcomers have in common, from the more established ones like Netezza to the more recent entrants like ParAccel, is a focus on query performance and large data volumes in specific analytical applications that might traditionally be called data marts.

As these vendors' technologies and techniques are proven in largely stand-alone environments, they are beginning to raise questions in the traditional enterprise data warehouse arena.  We've already seen the incumbents (Teradata, IBM, Oracle and Microsoft) introduce appliance-like solutions.  But the real question I see relates to the underlying architecture of the data warehouse itself.  After more than 20 years, are we about to see a fundamental change in the way we design business intelligence environments?

I'll be exploring this question over the coming months, but I'd love to hear your views at this stage!
http://www.b-eye-network.co.uk/blogs/devlin/archives/2009/03/disruptive_technological_chang.php TDWI Tue, 03 Mar 2009 07:10:11 -0700
Playmarts - a new component in the Data Warehouse
white paper sponsored by Lyzasoft which they published recently. Speaking to the folks at Lyzasoft, and also to participants in conferences at which I was presenting over the same time period, I found myself looking again at the role and positioning of Business Intelligence tools vs. the people who use them. And I've deliberately phrased that as "versus" - because it's well accepted that many "analysts" who should use BI tools as part of their toolkit end up either fighting the prescribed tool or abandoning it altogether.

Business Intelligence these days is a term that covers a multitude of sins, from executives defining and executing business strategies to automated processes identifying potential fraudulent transactions without any human intervention and everything in between. Depending on the particular business need, software vendor, consultant or industry analyst involved, the focus in different implementations can vary dramatically. All remain BI, but each requires very different thinking and architectural approaches.

One set of users who sit uncomfortably on the boundary of BI are often called business analysts within their organizations. They tend to use and combine data in new or unusual ways in order to gain new insights into what is going on in the business. They often obtain the data they need from beyond the data warehouse, because the warehouse doesn't hold the data they need or has cleansed it in a way they don't agree with or they simply don't know it's there. They often manipulate and combine data in different ways as an iterative part of their analyses. And many times their tool of choice is the spreadsheet.

It's pretty clear that these business analysts should be supported by the BI community. Within their own organizations, the data they require should come increasingly through the data warehouse in order to ensure data integrity and consistency. The analyses they perform and the outputs of their work also need to be maintained and tracked for compliance and regulatory reasons. Overall, setting these people off to find, manipulate and analyze data in a haphazard way simply doesn't make sense.

Fitting the needs of business analysts into the data warehouse architecture is, however, possible. I've coined the term "playmart" to represent the type of environment these users need. The aim is to balance agility (playing) with control (in a data mart). I've defined eight key characteristics of a playmart in the white paper (see also below) and would be very interested to hear your views on them.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/12/playmarts_-_a_new_component_in.php Fri, 19 Dec 2008 16:04:33 -0700
BI and the Financial Crisis
As the worldwide banking crisis continues to escalate, one has to wonder: where was the Business Intelligence in all of this? What happened to Data Quality and Data Management?

First, we had the interesting revelation that the individual banks and lending institutions all seemed to be blissfully unaware of the extent to which they were exposed by lending in the sub-prime mortgage market. It's difficult to imagine how the information available to decision makers in these companies could have been so scarce or so uninformative. Most, if not all, financial institutions have had extensive and expensive data warehouses in place for many years now. Business Intelligence should easily have warned of the dangers. Was the increasing level of risk unmeasured, overlooked or simply ignored?

More recently, we've had the spectacle of banks being unwilling to extend short-term lending facilities to one another for fear that the borrowing institutions could go belly-up in the next few days! Could the lenders not know? Unfortunately, in this case, the answer is probably that they couldn't. Despite the fact that the worldwide financial market is tightly and instantly interconnected at a transaction level, the truth is that the underlying data remains disconnected and dispersed. Data Management and Data Quality have simply not been considered. Proper business governance in the financial markets as a whole is impossible without a well-defined and credible data foundation.

So, assuming that we can survive the crisis without a meltdown, what has been happening should be a clarion call to Data Management professionals in the financial industry particularly but also beyond. We need to recognize the interconnected and increasingly fragile web of data dependencies that hold the business world together. It's time to get out there and apply the principles we know and preach already. And we had better get moving quickly.
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/bi_and_the_financial_crisis.php Sun, 28 Sep 2008 19:52:30 -0700
Reining in the spreadsheets... into Playmarts
Enterprise BI shops and data quality departments regard spreadsheets largely as the work of the devil. Against all the rules of information quality, data in spreadsheets is manipulated by users at will and in private. Then the resulting data and function is distributed, shared and further played around with, until it's anybody's guess whether the results presented at the end bear any relationship to the truth. Data that was pure and clean as it came out of the data warehouse, data mart or approved BI report is now potentially as contaminated as nuclear waste.

And yet, check in with the users. Indeed, check in with yourself. Why is Excel so popular? Because it makes it easy to play with the data, check out hypotheses, get answers otherwise unavailable, and so on. And once you've gotten the answer through the spreadsheet, chances are you won't get the time or the resources to recreate the process in a more auditable, quality-conscious way. It's a real and spreading problem. But, what to do?

This week I had the opportunity to preview a new product called Lyza that's due to launch on Sept. 22. In fact, you can download it and play with it already. Scott Davis, the CEO of Lyzasoft Inc., explained that they had spent a lot of time investigating how business analysts, the power users of spreadsheets, actually work. This is usually a good idea, because you find out what the users really need, and which of your assumptions are right or wrong. It will probably come as no surprise that most analysts approach their work in a highly unstructured and iterative way, pulling bits of relevant data into Excel from a variety of known sources - both official data marts and reports and unofficial files, spreadsheets, etc. that they happen to have created before or borrowed from trusted colleagues. And they do it in Excel, because that's the only way they can.

What Lyza does is to provide an easier, more intuitive way of pulling data together from diverse sources, combining and manipulating it and creating results and reports for distribution to the business. Well, that's all fine and dandy for the business analysts you may say, but how does it help the BI and data quality departments address the data contagion? The answer is that Lyza tracks and saves an audit trail of every action and every step of the analysis process that the user is building as well as enabling snapshots of the results to be cached and preserved for posterity. Now the data quality folks are beginning to smile. And the BI department? Well, they're less sure: they like the added traceability, but this is still outside their comfortable data mart zone.
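
The idea of an audit trail over every step of an analysis, plus cached snapshots of results, can be sketched in a few lines. This is emphatically not how Lyza works internally; the `AuditedAnalysis` class, its fields and its method names are invented here to illustrate the principle that each manipulation is logged as it is applied.

```python
from dataclasses import dataclass, field


@dataclass
class AuditedAnalysis:
    """Hypothetical wrapper that records each step applied to a data set."""

    data: list
    trail: list = field(default_factory=list)
    snapshots: dict = field(default_factory=dict)

    def apply(self, description, func):
        """Run one transformation and log what was done and the row counts."""
        rows_in = len(self.data)
        self.data = func(self.data)
        self.trail.append(
            {
                "step": len(self.trail) + 1,
                "action": description,
                "rows_in": rows_in,
                "rows_out": len(self.data),
            }
        )
        return self

    def snapshot(self, name):
        """Cache a copy of the current result so it can be inspected later."""
        self.snapshots[name] = list(self.data)
        return self
```

An analyst might call `apply("drop negatives", ...)`, take a `snapshot("cleaned")` and carry on; anyone reviewing the work later can read `trail` to see what was done, in what order, and how many rows survived each step.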

However, we could look at it in a different way. We could imagine that Lyza provides a new type of data mart - a "playmart" - a sand box where power analysts can experiment with data and perform all sorts of analyses in a safe, well-managed environment. Now, if only we could evaluate the analysts' logic and productionalize those analyses and reports that are going to be reused and built upon in the future.

Scott's initial answer was that you can certainly do all this within Lyza itself. But a bit of further probing convinced me that the metadata that Lyza stores to describe the analysis processes is probably sufficient to enable the creation of ETL scripts for your ETL engine of choice. This would certainly require further investigation and automation, but it seems like the bones of the idea are there. In this case, the playmart could address a set of business analysts' needs that have been long ignored by the BI departments and by BI vendors as well.

The only real fly in the ointment is whether Lyza will be able to convince the spreadsheet jockeys to get off their current Excel rocking horse and jump on the bright new Lyza pony in the playmart! (And that sentence would work so much better if only Lyza had chosen a mustang for their logo rather than a gecko.)
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/reining_in_the_spreadsheets_in.php Fri, 19 Sep 2008 16:54:35 -0700
Decision Intelligence or Highly Evolved Business
In my last post, I shared some thoughts inspired by the Decision Intelligence article written by Claudia Imhoff and Colin White. There, I suggested we need to really begin to consider all information as a single resource for the whole business. This entails stepping beyond our traditional IT-bounded view of our systems and looking at them with a renewed business vision. If we do this, it will also quickly become clear that our view of process needs reworking too.

Claudia and Colin have drawn a box on the left of their architecture picture that arises directly from the insight that operational BI really is a different beast from the traditional BI we've all known and loved over the past 20 years or so. When you deeply consider the implications of building an operational BI system, as Claudia and Colin clearly have, it becomes obvious that operational BI has many of the characteristics of traditional operational or transaction-processing systems. Therefore, from a systems architecture point of view, you put them in the same box, in this case called "Business process intelligence".

There are also some differences, of course. The most important is how the business users interact with these two related types of system. The value proposition of operational BI is that human decision-making skills can improve operational processes. How? Well, there are two very distinct threads here.

One is the proposition that we can apply advanced analytics technology automatically to parts of the operational process. Fraud detection is a good example. Applying advanced analytics on the fly to credit card transactions gives better detection of fraudulent transactions. Note that this type of operational BI is almost completely invisible to the business users: they see the results of more fraud detected or fewer false positives, but how that happened is both unknown and uninteresting.

The second thread brings users very directly into the loop. Here, the operational BI technology is made part of the users' visible process. Business users are presented with decision support technology that displays trends or exceptions in near real-time data, so that they can potentially choose a different course of action to that embedded in the normal flow. In effect, business users get to change the business process on the fly, rather than doing little more than data input as was previously the case.

Now, keeping this in mind, here's the million dollar question. What's the difference between an operational system and an informational system; how do you distinguish between an operational process and an informational process? In the good old days, it was easy! The operational side was real-time or nearly so, and dealt with individual transactions or data elements according to a predefined process in which the users had minimal freedom to act intelligently. Informational systems, in contrast, were centered around users who were expected to make intelligent decisions based on historical data without any clear process to turn those decisions into action.

So, what is the answer today? When we in BI start building operational BI and the operational world starts implementing adaptive SOA-based systems, the distinction between operational and informational more-or-less disappears. This puts operational BI and operational systems together in one box of the architecture. But the deeper and probably longer-term implications of this bold step have not been explicitly called out. In fact, these implications are obscured by the naming of the new architecture as "Decision intelligence", because the top level of this architecture is no longer confined to the world that was formerly BI; it actually becomes the single, common process or interface through which all business users will interact with the underlying IT systems.

Is that scary? Absolutely! But it is a clear and logical consequence of the paths that BI and operational systems are currently on. It means that we in BI are no longer in total control of our destiny. But the same is true of the operational systems. And, although I've not covered it here, collaborative systems (e-mail, office support, etc) are also being drawn inexorably into the same converged path.

It's time we all started to talk to one another! And that does imply that decision intelligence may be too narrow a term for us all to agree on. May I propose again the "Highly Evolved Business"?

]]>
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/decision_intelligence_or_highl.php http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/decision_intelligence_or_highl.php Sun, 14 Sep 2008 12:26:10 -0700
Decision Intelligence Claudia Imhoff and Colin White have a lengthy history of insightful and provocative contributions to the development of Business Intelligence. Their recent article, Decision Intelligence, is no exception. Their thesis is that the IT support needed for decision-making, now known as "Business Intelligence", today extends far beyond the traditional domain of data warehousing and is in need of a new architecture and a new name - Decision Intelligence.

I fully agree. I've been using the terms "Highly Evolved Business" and "Business Insight" over the past year or so to express exactly the same thought. Indeed, Claudia, Colin and I have discussed this whole idea already at length and are very much on the same page. But I hadn't seen their architecture picture before, and it gives me the opportunity to discuss the whole topic from a higher perspective in this and the next post.

Under Decision intelligence, the architecture shows three vertical blocks called "Business process intelligence", "Business data intelligence" and "Business content intelligence". The meanings of these blocks are fairly obvious, but take a look at the linked article for a full explanation. My thought is that they are almost too obvious: they closely reflect our current arrangement of systems building blocks in the IT world.

Let's first examine the data and content blocks. Today, if you look at typical enterprise implementations, you will certainly see databases and separate content stores. You'll also notice independent systems built upon these separate stores. But, if you step back from the storage and processing issues, it's pretty difficult to distinguish between the two categories. Try explaining the difference to a business user!

Take an example of a clinician who's trying to make a treatment decision. She's looking at a chest x-ray - content in our terms. And she's also looking at the "structured data" that goes with it: this x-ray is of a 45-year-old male, smoker of 20 cigarettes a day for the past 30 years, who has been admitted with shortness of breath. Does she see unstructured content and structured data that must somehow be combined in her decision making? I'd argue not. She simply sees a set of information she's using.

And some of the old barriers between the storage of structured data and unstructured content are breaking down. Where is the EXIF data (structured metadata) of a photo stored? Yes, in the JPG file along with the unstructured content. Where do e-mail systems store the structured metadata about sender, subject, date sent, etc? Sure, in the database with the unstructured e-mail body content.
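
For the more technically minded, here is a minimal sketch of the e-mail case, using only Python's standard email module. The message text, the addresses and the split_information helper are all invented for illustration; this is not how any particular mail system stores its data. It simply shows that one and the same message object yields both the structured metadata (the headers) and the unstructured content (the body).

```python
from email import message_from_string

# Hypothetical message, invented purely for illustration.
RAW_MESSAGE = """\
From: clinician@example.org
To: radiology@example.org
Subject: Chest x-ray, 45-year-old male
Date: Mon, 01 Sep 2008 10:51:21 -0700

Patient admitted with shortness of breath. Please review the attached image.
"""


def split_information(raw):
    """Return (structured metadata, unstructured body) from one raw message.

    Hypothetical helper: both parts come out of the same parsed object,
    which is the point being illustrated.
    """
    msg = message_from_string(raw)
    # Headers are the structured part: named fields with values.
    metadata = {name: value for name, value in msg.items()}
    # For a simple, non-multipart message the payload is the free-text body.
    body = msg.get_payload()
    return metadata, body


metadata, body = split_information(RAW_MESSAGE)
print(metadata["Subject"])
print(body.strip())
```

The split into "metadata" and "body" exists only because the code asks for it; the message itself is a single piece of information.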

I could make a similar argument about the lack of distinction between real-time (or operational) data and historical (data warehouse) data.

My point is that if we want to create a new vision for the future, we need to start seeing the world through non-IT eyes. It's all information. It's a single concept; a single category of "stuff". And we in IT need to start creating the tools and methods that allow us to create, manage and make available all information in a coherent and consistent way. At a conceptual level, that has to be the goal and that should be our first pictorial representation.

Keep that thought in mind. I'll come back to it next time when I look at the process side of the picture.

]]>
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/decision_intelligence.php http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/09/decision_intelligence.php Mon, 01 Sep 2008 10:51:21 -0700
Instant Gratification vs. Quality Time I was browsing through the blogs on B-eye-network.com this morning (Sunday - yeah, sad, I know) and came across two recent entries that spoiled my coffee. Given that I'm no fan of instant gratification (in IT anyway), I'm not going to give you links, so you have to work at finding them yourself. But the phrases that caught my eye were "Instant SOA", "Data marts in about an Hour" and "full EDW's with AS-IS star schemas in 2 weeks".

Now I'm as fond of a shortcut as the next guy, but I've learned that the word "Instant" is not all goodness. When I've bought some instant Spaghetti Bolognese in the local supermarket I've found that the cost is a lot higher than the individual ingredients and the taste, well, leaves a lot to be desired. Sure, I saved some time when I got home, but did I get value for money? And did I end up with what I really wanted? So, why should I expect more from an Instant DW?

"Caveat emptor" as the Romans used to say. Here are a few contra-indications for when instant gratification should not be expected in your next BI (or SOA) project:

  1. The business users are not quite sure what they want.
    Most BI projects start with a vague set of requirements from the potential users. It's going to take some time to hone these down to a usable definition of data and query needs. In the meantime, maybe it's best to let the users continue to play with their instant Excel spreadsheets and look over their shoulders to see what they're doing.
  2. Somebody forgot to document the meanings of the data in the source applications.
    This is the oldest metadata problem. If your data sources have not been properly described, an Instant DW is likely to be instantly dismissed as misleading and inaccurate. Do you want to go there?
  3. Garbage in, garbage out. Or worse...
    If your ingredients (data sources) are contaminated with erroneous data, you're going to end up with a very sick business on your hands if you just take the Instant DW approach. Understanding and fixing dirty data is time-consuming, but mandatory.

It's all about quality time... or quality vs. time. If I bring home my instant Spaghetti Bolognese, I may get it on the table within a few minutes. But, if the kids won't eat it or, worse, throw up that night, I'd argue I've made the wrong trade-off between time and quality. You need to consider the same balance in a BI or SOA project.

Now, I'm off to spend some quality time with my kids :-)

]]>
http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/08/instant_gratification_vs_quali.php http://www.b-eye-network.co.uk/blogs/devlin/archives/2008/08/instant_gratification_vs_quali.php Sun, 24 Aug 2008 12:20:20 -0700