(Article URL: http://www.b-eye-network.co.uk/view-articles/3750)
Several emerging technologies, such as business activity monitoring, event-driven architectures, and service-oriented architectures, are attracting attention based on their potential to revolutionise the way business intelligence solutions are developed and delivered.
In this article, let’s explore business activity monitoring (BAM). We’ll review the benefits of integrating BAM and business intelligence architectures, and consider if – in an environment of ever closer integration – extract, transform and load (ETL) technology remains a requirement for success.
Product Convergence and Real Time Reporting
It is difficult to tell whether the pressure for real time reporting is a push by the vendors or a pull by the users. Certainly all the major ETL vendors have felt the need to introduce a real time element to their products, allowing them to publish messages to, and receive messages from, the leading enterprise application integration (EAI) players. Indeed, the direction of these product sets is to subsume messaging technologies, not just to integrate with existing technologies. EAI vendors have responded by adding functionality to move, transform and manage bulk data – although their strategic direction remains to provide superior messaging and Web services capabilities.
The result of these product introductions is that both ETL and EAI vendors position themselves as providing enterprise data integration, rather than ETL or EAI, and both can deliver “real time” business intelligence (BI) capability. But still the question remains: is there a requirement for real time reporting, and a business justification for it?
There does not seem to be a clear definition of what constitutes real time BI reporting. Depending on your viewpoint, it can mean a latency of anything from less than one minute up to 15 minutes. Our long-held view is that “real time” business intelligence is a relative and not an absolute metric. Most business users seek latencies one order of magnitude lower than the status quo when asking for “real time” information delivery.
As the notion of real time speeds up, implementation and delivery costs increase exponentially. But in reality, examples of real-time BI are still difficult to find and even more difficult to justify. Obvious examples in the energy trading business are credit and market risk systems. These constantly measure exposure based on new and existing deals, and “mark-to-market” values. But it could be argued that many examples of real time business intelligence are applications of business activity monitoring rather than business intelligence, as we’ll discuss later.
It is also important to distinguish between real-time reporting, however defined, and the delivery of a message-based BI architecture. Traditionally, a BI architecture has included an ETL tool to extract data from heterogeneous systems, translate the data into some form of canonical structure, and load it into a cumulative data store. The frequency of extracting data tended to be driven by the dynamics of the reporting cycle, or by the availability of the source systems – daily loading and reporting being among the most common.
Is an ETL Tool Always Required?
In this traditional architecture, the ETL elements of the solution tend to be the most demanding in terms of implementation effort and technical support, often consuming more than 50% of the overall budget. This is related to both the initial cost of the tools, often greater than £525k ($1m), and the effort required to extract data from the multiple sources and convert it into the target data structure.
But with the growth of EAI, should we question the need for ETL technology? Even if the business does not require real time BI reporting, would it not be possible simply to load the warehouse from the integration bus, thus bypassing the problems of connecting to disparate systems and converting the data into a common format? Depending on how the messaging infrastructure has been designed, the required events should be flowing along the wire, and in a format that is already conformed to a canonical model. Therefore, loading the warehouse should be a simple matter of subscribing to selected messages. The expense and effort of implementing an ETL tool is eliminated.
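The idea of loading the warehouse by subscribing to selected messages can be sketched in a few lines. This is a minimal illustration only: the `MessageBus` class is a toy in-process stand-in for a real EAI bus, and the topic names, message fields and `sales_fact` table are all assumptions made for the example rather than features of any particular product.

```python
import sqlite3
from collections import defaultdict


class MessageBus:
    """Toy in-process stand-in for an EAI integration bus (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(message)


def make_warehouse():
    # An in-memory SQLite database plays the role of the warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact ("
        "invoice_id TEXT PRIMARY KEY, product TEXT, amount REAL, invoiced_on TEXT)"
    )
    return conn


def attach_loader(bus, conn):
    def load_invoice(msg):
        # Assumes the message already conforms to the canonical model,
        # so no transformation step is needed before loading.
        conn.execute(
            "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
            (msg["invoice_id"], msg["product"], msg["amount"], msg["invoiced_on"]),
        )
        conn.commit()

    # The warehouse subscribes only to the messages it needs.
    bus.subscribe("sales.invoice.raised", load_invoice)


bus = MessageBus()
warehouse = make_warehouse()
attach_loader(bus, warehouse)

bus.publish(
    "sales.invoice.raised",
    {"invoice_id": "INV-1", "product": "gas", "amount": 120.0, "invoiced_on": "2006-01-15"},
)
bus.publish("hr.employee.hired", {"employee_id": "E7"})  # not subscribed, so ignored

row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

The sketch only works as simply as this because the message is assumed to arrive in canonical form; where that assumption fails, some of the transformation effort attributed to ETL returns.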
There are additional benefits to be gained from tapping into the integration network. BAM provides a means of managing business operations down to any desired level of granularity. It can quickly highlight a sudden increase in the production and shipment of incorrect goods, or delays caused by faulty supplies to the business. BAM differs from business intelligence by focusing on the effective execution of business processes. It reports on the delivery of business events making up a process. In contrast, business intelligence is used to measure and help define business tactics and strategy. Both BAM and business intelligence have associated key performance indicators, and both may offer support for bridging the gap between actual values and target key performance indicators, or business performance management.
Even though BAM focuses on improving performance of low level processes, and business intelligence on improving year-on-year business performance, it is still the case that one is a cumulative view of the other. Consolidating all sales invoice events across all products over a specific period delivers annual revenue. So given that we can load messages into a data store of some type, it should be possible to deliver both BAM and business intelligence from the same solution. The architecture appears obvious and appealing: a short-term cache or operational data store would be used to provide the real time business activity monitoring functionality, and a cumulative store or warehouse would deliver the full BI capability. The operational data store would be flushed periodically, perhaps every 15 minutes, into the data warehouse.
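A minimal sketch of this two-tier arrangement follows, with an operational data store that is flushed into a cumulative store once the flush interval has elapsed. The class and field names are hypothetical, the stores are plain in-memory lists, and timestamps are simple seconds counted from an arbitrary start; a real implementation would use persistent stores and a scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class InvoiceEvent:
    invoice_id: str
    product: str
    amount: float
    received_at: float  # seconds since an arbitrary start (assumption)


@dataclass
class IntegratedStore:
    flush_interval: float = 15 * 60  # 15 minutes, as suggested in the text
    ods: list = field(default_factory=list)        # short-term store for BAM
    warehouse: list = field(default_factory=list)  # cumulative store for BI
    last_flush: float = 0.0

    def on_message(self, event):
        # Flush first if the interval has elapsed, then hold the new event.
        if event.received_at - self.last_flush >= self.flush_interval:
            self.flush(event.received_at)
        self.ods.append(event)

    def flush(self, now):
        # Move everything held in the ODS into the warehouse.
        self.warehouse.extend(self.ods)
        self.ods.clear()
        self.last_flush = now

    def bam_recent_count(self):
        # BAM view: events received since the last flush.
        return len(self.ods)

    def bi_revenue(self):
        # BI view: cumulative revenue from consolidated invoice events.
        return sum(e.amount for e in self.warehouse)


store = IntegratedStore()
store.on_message(InvoiceEvent("INV-1", "gas", 100.0, received_at=60))
store.on_message(InvoiceEvent("INV-2", "gas", 250.0, received_at=600))
store.on_message(InvoiceEvent("INV-3", "power", 75.0, received_at=960))  # triggers a flush
```

After the third message the first two invoices have moved to the warehouse and the third is still in the operational data store, so the BI view trails the BAM view by up to one flush interval.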
The business benefits of integrating BAM and BI environments are significant. The cumulative data of the BI warehouse can be used to create predictive models that input to the rules engine of the BAM system. The BAM system could then, for example, provide content to a Web page suggesting products that a customer could purchase based on current activity (clickstream) and cumulative predicted behaviour. Other applications for this functionality include fraud detection in the credit card business and the detection of money laundering in banking.
In the energy industry, building on the credit example, BAM can add significant value by governing the rules and events related to credit. For example, if a pre-payment or collateral is required before releasing a shipment, BAM can provide data to the shipping department upon receipt of the credit collateral, allowing an organisation to be responsive to its customers’ needs without taking risks by shipping too soon.
Another example in the upstream space where BAM adds value is in the area of using predictive analytics to anticipate equipment failures. Using real time data, the event model would govern the rules to notify the appropriate personnel of pressure, temperature, or vibration thresholds being exceeded. Trends can also be monitored via BAM to modify and optimize maintenance schedules.
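The threshold part of this example can be sketched as a small rule table evaluated against each incoming reading. The metric names, limits and recipient groups below are invented for illustration and are not taken from any real installation; the sketch covers only simple threshold rules, not the predictive analytics or trend monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    metric: str   # name of the field in the reading
    limit: float  # alert when the reading exceeds this value
    notify: str   # group to be notified (hypothetical)


# Illustrative limits only.
RULES = [
    ThresholdRule("pressure_bar", 200.0, "operations"),
    ThresholdRule("temperature_c", 90.0, "maintenance"),
    ThresholdRule("vibration_mm_s", 7.0, "maintenance"),
]


def evaluate(reading, rules=RULES):
    """Return (recipient, text) pairs for every rule the reading breaches."""
    alerts = []
    for rule in rules:
        value = reading.get(rule.metric)
        # A metric missing from the reading is skipped rather than treated as a breach.
        if value is not None and value > rule.limit:
            text = f"{reading['equipment_id']}: {rule.metric}={value} exceeds {rule.limit}"
            alerts.append((rule.notify, text))
    return alerts


alerts = evaluate(
    {
        "equipment_id": "PUMP-7",
        "pressure_bar": 185.0,
        "temperature_c": 96.5,
        "vibration_mm_s": 7.0,  # equal to the limit, so no alert
    }
)
```

Here only the temperature rule fires, producing a single alert for the maintenance group.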
Of course it would be possible to use a traditional ETL to service the warehouse while the EAI bus services the BAM solution. Data could then be passed between the two data stores as required. But, as outlined previously, this would introduce significant costs of implementing the ETL solution, plus the additional effort of reconciling the data drawn from the two environments.
Using the EAI Network: Potential Problems
Given all the advantages of using the EAI bus to drive an integrated BAM and BI environment, what are the potential problems?
One problem with relying on an existing EAI architecture is that messages may not provide a complete or consistent view across all required business processes. The main driver for implementing EAI is to transfer data between applications in order to progress through the execution of a business process. When an application needs to hand control over to another application, it will publish a message. The receiving application will then parse the message and continue executing the process. The publishing of the message is based on the functionality of the application. If a business process involves six significant events and the application executes four of these, it will only publish a message on executing the fourth event; there will be no requirement to publish messages related to the internal events. The result is that the granularity of the messages on the EAI bus may not match the requirements of either the BAM or BI architectures.
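The granularity gap can be made concrete by comparing the events a process requires with the events that actually appear on the bus. The event and application names here are hypothetical; the sketch assumes one application executes the first four events internally and publishes only on the fourth, as in the example above.

```python
# The six significant events of a hypothetical order-to-cash process.
REQUIRED_EVENTS = [
    "order_received",
    "credit_checked",
    "stock_allocated",
    "goods_picked",
    "goods_shipped",
    "invoice_raised",
]

# What each application actually publishes on the bus (assumed).
PUBLISHED_BY_APP = {
    "order_management": ["goods_picked"],  # executes events 1-4, publishes only the 4th
    "logistics": ["goods_shipped"],
    "billing": ["invoice_raised"],
}


def missing_events(required, published_by_app):
    """List required events that no application publishes, in process order."""
    published = {event for events in published_by_app.values() for event in events}
    return [event for event in required if event not in published]


gaps = missing_events(REQUIRED_EVENTS, PUBLISHED_BY_APP)
# gaps == ["order_received", "credit_checked", "stock_allocated"]
```

A BAM or BI solution that needs the first three events would have to obtain them some other way, for example by having the application publish additional messages.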
A second problem with relying on the EAI environment to drive the BI solution is that although the operational systems are integrated at the process level, they may not be integrated across the business. This can result from, for example, the systems of merged businesses not being fully integrated, the evolution of regionally based independent networks, or purchasing systems not being linked to sales processing systems. The outcome is that several EAI networks may need to be accessed to obtain a view across the business. Each of these environments is likely to use different metadata models and different (non-unique) attribute identifiers.
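One way to picture the reconciliation work is a per-network mapping from local attribute names to a canonical model. The network names, field names and the choice to qualify customer identifiers with their network of origin are all assumptions made for this sketch; real metadata models would be far larger and the matching rules far less mechanical.

```python
# Hypothetical attribute names used on two independent EAI networks.
FIELD_MAPS = {
    "uk_bus": {"cust_no": "customer_id", "amt": "amount", "ccy": "currency"},
    "us_bus": {"CustomerNumber": "customer_id", "Total": "amount", "Currency": "currency"},
}


def to_canonical(network, message):
    """Rename a message's fields to the canonical model for its network."""
    mapping = FIELD_MAPS[network]
    unmapped = sorted(set(message) - set(mapping))
    if unmapped:
        # Fail loudly rather than silently dropping unknown attributes.
        raise KeyError(f"{network}: no canonical mapping for {unmapped}")
    canonical = {mapping[name]: value for name, value in message.items()}
    if "customer_id" in canonical:
        # Assumption: identifiers are unique only within a network, so they
        # are qualified with the network name to avoid collisions.
        canonical["customer_id"] = f"{network}:{canonical['customer_id']}"
    return canonical


uk = to_canonical("uk_bus", {"cust_no": "1001", "amt": 50.0, "ccy": "GBP"})
us = to_canonical("us_bus", {"CustomerNumber": "1001", "Total": 80.0, "Currency": "USD"})
```

Both messages carry the identifier 1001, but after qualification they remain distinct until someone establishes whether they refer to the same customer.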
It is also possible that not all systems are communicating via the EAI network or using a common data structure. In any environment, legacy applications exist that are likely to still be using native APIs to transfer data, or relying on point-to-point messaging, or simply operating standalone. Identifying where these gaps in coverage exist, and identifying some form of solution, will be a major undertaking.
Data quality and completeness must also be considered. An invoicing application only requires the name and address of the billed account; it does not require information about the account legal entity, or the hierarchy of businesses making up the customer’s organisation, or the relevant sales region and sales hierarchy. This information may be available within the organisation but may not be published on the network or even included in the integration data model (assuming there is one). And if this data is available, then what is its source, and is there a guarantee, or SLA (service level agreement), of its quality?
There may also be an issue of scope. An enterprise BI solution is likely to include elements which are beyond the boundaries of a business’s systems and processes. This could include reporting on competitor performance or showing how macroeconomic conditions affect sales. This type of information would normally be purchased from external sources and integrated into the data warehouse using a batch process or GUI.
And finally, there may be a question of performance. Unlike an operational system, which generally executes a relatively simple algorithm on relatively small amounts of data, a data warehouse may perform very complex processing on very large amounts of data. It could, for example, be allocating multiple foreign keys to a sales transaction to populate a fact table, performing several lookups in the process. It may also be constructing a number of aggregate tables. This activity would not be a problem if carried out overnight in a batch process; but in a messaging environment, where sales messages are arriving in rapid succession, it could introduce a processing bottleneck.
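The per-message work described here can be illustrated with a handler that resolves several foreign keys and maintains an aggregate for each arriving sales message. The dimension tables, key values and message fields are invented, and the lookups are plain dictionary reads; in a real warehouse each one might be a database query, which is where the cost accumulates.

```python
from collections import defaultdict

# Hypothetical dimension tables mapping natural keys to surrogate keys.
DIMENSIONS = {
    "customer": {"ACME": 1, "GLOBEX": 2},
    "product": {"gas": 10, "power": 11},
    "region": {"UK": 100, "DE": 101},
}

lookup_count = 0
fact_rows = []
revenue_by_region = defaultdict(float)  # a simple aggregate table


def lookup(dimension, natural_key):
    """Resolve a surrogate key, counting how many lookups are performed."""
    global lookup_count
    lookup_count += 1
    return DIMENSIONS[dimension][natural_key]


def on_sales_message(msg):
    # Every message needs one lookup per dimension before it can be stored.
    row = {
        "customer_key": lookup("customer", msg["customer"]),
        "product_key": lookup("product", msg["product"]),
        "region_key": lookup("region", msg["region"]),
        "amount": msg["amount"],
    }
    fact_rows.append(row)
    # The aggregate is also updated per message rather than once per batch.
    revenue_by_region[row["region_key"]] += msg["amount"]


messages = [
    {"customer": "ACME", "product": "gas", "region": "UK", "amount": 100.0},
    {"customer": "GLOBEX", "product": "power", "region": "DE", "amount": 50.0},
    {"customer": "ACME", "product": "power", "region": "UK", "amount": 200.0},
]
for message in messages:
    on_sales_message(message)
```

Three messages already require nine lookups and three aggregate updates; the work grows linearly with message volume and cannot be deferred to a quiet overnight window.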
Clearly, there are potential cost savings and business benefits to be gained from implementing a BI solution around an EAI architecture. But before embarking on such a course, the environment needs to be assessed for potential issues and pitfalls. Whether a single approach is considered, be it ETL or EAI, or a combination of the two, the price of mistakes can be very high.
This article has introduced the concepts of BAM and the use of EAI to drive an integrated BAM and BI solution. It has also highlighted some of the potential problems of relying on an existing EAI environment to populate such a solution. In a future article, I’ll revisit these problems and explore the relevance of event-driven and service-oriented architectures, reference data management, data stewardship, standards and governance.
John is a principal with Knightsbridge Solutions. He has more than 20 years of experience in building strategic solutions in the area of business intelligence, data mining and customer retention for the energy, telecommunications and finance industries. He holds a bachelor's degree in engineering and a masters in business administration from