Originally published 6 January 2010
Most organisations today have a business intelligence (BI) and data warehouse (DW) solution of some sort. The maturity of business intelligence and data warehousing has moved on from departmental reporting, through operational BI, to what I now see in many corporations: enterprise business intelligence.
The availability of a new generation of tools and BI solutions that easily integrate with ERP systems has undoubtedly provided real benefit in reducing overall time to solution.
However, the information explosion, the plethora of tool options, and information regulation and compliance present us with further challenges, including:
Time does not allow me to cover all of these in this article, so I'm going to highlight the first two.
By now most of us are familiar with the purpose of extract, transform and load (ETL) tools. Less well known, however, are the capabilities of the data virtualisation or enterprise information integration (EII) tools such as Composite or MetaMatrix.
Broadly speaking these provide the capability to access data from a massively wide variety of sources without having to move it from the source system. They have extremely rich caching and aggregation capabilities and, in my experience, have dramatically reduced the time to provide rich access to data. I once heard them described as “views on steroids”.
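The caching behaviour described above can be sketched in miniature. The code below is an illustrative assumption about how a cache-aside virtual view might behave, not the API of Composite, MetaMatrix or any other EII product; `VirtualView`, `fetch_from_source` and the TTL policy are all hypothetical names:

```python
import time

# Hypothetical sketch: a virtual view that answers repeated queries from a
# cache instead of going back to the source system every time.
class VirtualView:
    def __init__(self, fetch_from_source, ttl_seconds=60):
        self.fetch = fetch_from_source   # callable standing in for the source query
        self.ttl = ttl_seconds
        self._cache = None
        self._cached_at = 0.0

    def query(self):
        now = time.monotonic()
        if self._cache is None or now - self._cached_at > self.ttl:
            self._cache = self.fetch()   # go back to the source system
            self._cached_at = now
        return self._cache               # otherwise served from the cache

calls = {"n": 0}
def source_query():
    calls["n"] += 1
    return [("EMEA", 100.0), ("APAC", 50.0)]

view = VirtualView(source_query, ttl_seconds=60)
view.query()
view.query()
print(calls["n"])  # 1 -- the second query was served from the cache
```

The point of the pattern is that consumers see a simple, query-like interface while the virtualisation layer decides when to touch the underlying source.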
The use of EII technology in enterprise data warehousing and for data take-on is something that demands serious consideration. There are several ways in which EII can add value to DW solutions; here are just three to consider:
Prototyping data warehouse development. During data warehouse development, the time taken for schema changes, adding new data sources and providing data federation is often considerable. Using data virtualisation to prototype a development environment means you can rapidly build a virtual data warehouse rather than a physical one. Reports, dashboards and so on can be built on the virtual data warehouse. After prototyping, the physical data warehouse can be introduced.
Enriching the ETL process. Frequently new data sources, particularly from ERPs, are required in the data warehouse. All too often the ETL tool lacks the data access capabilities needed for complex sources. Tight processing windows may also require access, aggregation and federation activities to be performed before the ETL process runs. EII's rich data access and federation capabilities can present virtual views to the ETL process, which then continues as though it were reading from a simpler data source.
Federating data warehouses. How many organisations have more than one data warehouse? Is the information in each completely discrete? I don't think so. Data virtualisation provides powerful options to federate multiple DWs by creating an integrated view across them. This has particular relevance in providing rapid cross-warehouse views following a merger or acquisition.
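As a minimal sketch of the federation idea, the example below uses SQLite attached databases to stand in for two physical warehouses and builds one integrated view across them. The table and view names are illustrative, and a real EII product would federate heterogeneous remote sources rather than two local databases:

```python
import sqlite3

# Two stand-alone "warehouses" (in-memory here purely for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EMEA", 100.0), ("APAC", 50.0)])

con.execute("ATTACH DATABASE ':memory:' AS dw2")
con.execute("CREATE TABLE dw2.sales (region TEXT, amount REAL)")
con.execute("INSERT INTO dw2.sales VALUES ('AMER', 75.0)")

# The integrated "virtual" view: no data is copied or moved; the query is
# resolved against both sources at read time. (A TEMP view is used because
# SQLite only allows temporary views to span attached databases.)
con.execute("""
    CREATE TEMP VIEW all_sales AS
    SELECT region, amount FROM main.sales
    UNION ALL
    SELECT region, amount FROM dw2.sales
""")

total = con.execute("SELECT SUM(amount) FROM all_sales").fetchone()[0]
print(total)  # 225.0
```

Consumers query `all_sales` as if it were one warehouse; where the rows physically live is hidden behind the view.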
When providing data into a data warehouse, the use of ETL or EII (or both) needs care. Some of the key considerations are shown in Figure 1:
Figure 1: ETL or EII?
As one of the most talked-about BI technology trends this year, open source is undeniably attractive in theory. In the commercial world, however, remember that "commercial" open source doesn't mean free: the business model usually relies on support, training, consulting and so on as the supplier's income streams. That said, these solutions are well worth considering.
For some organisations, management reporting and BI have been provided by spreadsheet-based reports and graphs. These have evolved from a few departmental reports into an intertwined set of Excel reports.
If this example spreadsheet life cycle sounds familiar, you are not alone. In fact, it is so familiar that many people simply try to live with it. Managing data in this way will eventually lead to poor-quality data in reports. Depending on the audience of the reports, the implications of poor data quality may include poor business decisions, loss of credibility, legal compliance issues and possible financial or legal penalties for breaching regulations. If these issues lead to an investigation of data management practices, the spreadsheet daisy chain is going to be hard to defend.
This doesn't just lead to data quality issues; it also creates a very inefficient chain of data propagation. Everybody in the chain depends on those before them, and any issues identified must be passed back along the chain.
Furthermore, by keeping all the raw data in your spreadsheet, you have far more data stored locally than you need. With the continuous stream of information security failings published in the press, can you defend why your local laptop has a spreadsheet containing all the low level data, when all you needed to publish were some high level key performance indicators (KPIs)?
Over time, departments become more and more dependent upon spreadsheets. Before long you have little departmental “cottage industries” producing spreadsheet applications often completely outside the governance of corporate application development strategies. These spreadsheet applications will inevitably need support and enhancement. You may end up with applications which themselves have a total cost of ownership that was never budgeted for.
For these scenarios, open source BI represents a quantum leap and is highly recommended.
Open source BI is also suitable for larger-scale opportunities; however, before taking the plunge, a few questions need to be considered:
The opportunity, then, is to see where open source business intelligence can be beneficial in your overall solution portfolio. It really has now come of age.
Recent articles by Chris Bradley