Emerging Technologies in Operational Business Intelligence

Originally published 3 December 2008

Following on from my article, “Getting Started with Operational BI,” I would like to continue the discussion on this topic by delving into new emerging technologies in this field. In my last article, I defined two broad categories of operational business intelligence (BI). These are operational use of on-demand BI (typically BI services integrated into operational applications and processes) and event-driven operational use of BI. The focus of this article is event-driven operational BI.

In my research into operational BI, I have developed an Operational BI Adoption Survey, which I would encourage you to complete on the Intelligent Business Strategies website. In that survey, there is a question about what it is you think operational BI includes, and you are asked to select from a list of options. Two of these options, business activity monitoring (BAM) and complex event processing (CEP), are about event processing. Having written about BAM on a number of occasions, my focus this time is on CEP, which is a newly emerging technology.

Why Invest In Event Processing?

There is no doubt that the speed at which processes execute is on the increase. Take ordering a “build-to-order” PC, for example. In the past, it would take about a month to get your build-to-order PC; now it takes a few days. Checking in at an airport is now a simple bag drop, or you head straight to the security line if you have already checked in at home. If these things are happening, then clearly the speed at which events happen in business is increasing. In addition, there are increasing amounts of information available to consume. Companies are investing in instrumentation to capture more and more data to make sure they can track business operations. Look at RFID tags in retail supply chains and sensor networks in manufacturing and in oil and gas. These technologies are resulting in an explosion of event data that businesses can now capture to measure and analyse business performance. However, it is not just about being able to analyse this information; it is about being able to interpret it and act on it more and more rapidly.

One thing is very clear. If you have a classic BI system whereby operational data is captured and integrated overnight, then the earliest any business analyst can respond to an issue is 24 hours later. That is not good enough when the business generates tens of thousands, hundreds of thousands or even millions of events a day. With classic BI systems, businesses are unable to see problems, assess the impact of events (e.g., a changed order) on their businesses and respond in time. In short, if your business only takes data from operational systems into a BI system for analysis once a night, then for much of the day you are flying blind.

Fraud is a clear example. Fraud must be detected as quickly as possible so that action can be taken to stop it; taking action the following day is too late. In fact, it would be preferable to detect a pattern in the data that predicts fraudulent activity before it happens so that the business impact is minimised and the business continues to run smoothly. Risk management is another example. If a banking customer defaults on a loan payment, how long does it take before the bank closes off the credit limits on that customer's credit card(s)? If a truck breaks down while carrying a valuable customer delivery, how long does it take to respond and get the order to the customer on time? There are clearly all sorts of good reasons why businesses want to monitor events, including remaining in compliance, managing and minimising risk, preventing business disruption, reducing operational cost, etc.

But it is more than just simple events that need to be monitored. The correlation between many events also needs to be understood. For example, a password change on an account, a large withdrawal and a change of mailing address might together indicate fraudulent activity. This correlated event pattern is known as a complex event, and it may be an event that the business has to respond to rapidly.
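To make the idea of a complex event concrete, here is a minimal sketch (written in Python purely for illustration; the event names, fields and 24-hour window are my own assumptions, not taken from any product) of a rule that flags an account when all three of these simple events occur within a short period:

from collections import defaultdict
from datetime import datetime, timedelta

# The three simple events that together make up the suspected-fraud pattern (assumed names).
PATTERN = {"password_change", "large_withdrawal", "address_change"}
WINDOW = timedelta(hours=24)   # correlation window - a business decision, not a product default

recent = defaultdict(dict)     # most recent timestamp of each event type, per account

def on_event(account, event_type, timestamp):
    """Record a simple event and report whether the full pattern has now
    occurred on this account within the correlation window."""
    recent[account][event_type] = timestamp
    seen = recent[account]
    if PATTERN.issubset(seen):
        times = [seen[t] for t in PATTERN]
        if max(times) - min(times) <= WINDOW:
            return f"ALERT: possible fraud on account {account}"
    return None

# Three simple events arriving over a few hours form one complex event.
start = datetime(2008, 12, 3, 9, 0)
on_event("A-1001", "password_change", start)
on_event("A-1001", "large_withdrawal", start + timedelta(hours=2))
print(on_event("A-1001", "address_change", start + timedelta(hours=3)))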

Now take this to another level. Consider millions of products with RFID tags passing under RFID scanners at different points in a supply chain, or a billion mobile phones on the Internet and the huge volume of events they may generate, or the financial markets and the millions of events happening there every minute or even every second. Identifying event correlations in massive “event clouds” (potentially millions of events) is non-trivial. It is complicated by the fact that the events that form a correlated pattern can arrive out of sequence and come from different sources (e.g., a mix of external web feeds and internal OLTP systems). To further complicate things, there could be several instances of the same correlated pattern in an event cloud, with the events belonging to one instance of the pattern interleaved with the events of another instance of that same pattern. Figure 1 shows the concept of CEP. It is derived from a similar diagram I saw recently at an event processing analyst briefing.


Figure 1: Complex Event Processing
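To illustrate the out-of-order arrival problem described above in general-purpose code, the sketch below buffers events briefly and releases them in timestamp order, keyed so that interleaved instances of the same pattern (here, different accounts) stay separable. This is a simplified, assumed illustration written for this article, not how any particular CEP engine is implemented:

import heapq

def reorder(arrivals, max_delay):
    """Yield (timestamp, key, event) tuples in timestamp order, tolerating events
    that arrive up to max_delay time units later than the latest event seen so far."""
    buffer = []          # min-heap ordered by timestamp
    latest = None
    for ts, key, event in arrivals:
        heapq.heappush(buffer, (ts, key, event))
        latest = ts if latest is None else max(latest, ts)
        # Anything older than (latest arrival - max_delay) can safely be released.
        while buffer and buffer[0][0] <= latest - max_delay:
            yield heapq.heappop(buffer)
    while buffer:        # flush whatever remains at the end of the stream
        yield heapq.heappop(buffer)

# Two interleaved instances of the same pattern (accounts A and B), arriving out of sequence.
arrivals = [(3, "A", "large_withdrawal"), (1, "B", "password_change"),
            (2, "A", "password_change"), (4, "B", "large_withdrawal")]
for item in reorder(arrivals, max_delay=1):
    print(item)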

CEP Requirements

These complexities give rise to a number of requirements for complex event processing (a short code sketch following the list illustrates how several of them fit together):

  • Scaling to deal with the sheer number of events

  • Multiple event sources that need to be monitored and queried over specific time windows (e.g., the last 5 minutes, the last 20 seconds, etc.)

  • The need to process event data in memory as it “flies by”

  • The need to detect events by querying event data from different data sources well before that data ever reaches a data warehouse or data marts

  • The need to filter out only the events that matter and correlate events from multiple data sources in real time

  • The need to integrate event data with other data from different sources well before it ever reaches a data warehouse or data mart

  • The ability to include a data warehouse or data mart as a data source

  • The ability to define rules to govern actions

  • The need to automatically analyze event data well before it ever reaches a data warehouse or data mart using predictive and scoring models as well as statistical analysis

  • The need to automatically act to respond in a timely manner well before data reaches a data warehouse or data mart

  • The need to create events as an action message

  • The need to store events in an event store for subsequent analysis and reporting using traditional BI tools
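The sketch below (a hypothetical example written for this article; the event fields, thresholds and rule are assumptions) shows how several of these requirements fit together in miniature: events are filtered and evaluated in memory as they arrive, a rule governs the action, the action is emitted as a new event message, and every event is retained in an event store for later reporting:

import json

event_store = []    # stand-in for the event store used for subsequent BI reporting

def matters(event):
    """Filter: keep only the events of interest (assumed rule: large orders)."""
    return event["type"] == "order" and event["value"] > 10_000

def action_for(event):
    """Rule governing the action: flag very large orders for manual review."""
    if event["value"] > 50_000:
        return {"type": "action", "action": "manual_review", "order_id": event["id"]}
    return None

def process(stream):
    for event in stream:
        event_store.append(event)        # retain for later analysis with traditional BI tools
        if not matters(event):           # in-memory filtering as the event "flies by"
            continue
        action = action_for(event)
        if action is not None:
            # In a real deployment this would be published to a message bus or alerting tool.
            print("action message:", json.dumps(action))

process([{"type": "order", "id": 1, "value": 70_000},
         {"type": "order", "id": 2, "value": 5_000}])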

Event data moving over an internal web, an internal enterprise service bus or the Internet is also known as a data stream. What we have seen from the previous discussion is the desire to query and analyse multiple data streams to identify correlated event patterns. So the question is how to do that. The answer lies in complex event processing technology. There are now several products on the market in this space. Some examples include:

  • Aleri

  • Coral8

  • IBM (3 event processing products)

    • Cognos Now!
    • WebSphere Business Events
    • InfoSphere Streams (for very high volumes of events)

  • Progress Apama

  • SeeWhy

  • StreamBase

  • Tibco BusinessEvents

  • ThinkAnalytics IES

  • Truviso

Given the potential for huge volumes of events, CEP requires data to be processed in memory before it ever reaches disk. In fact, the requirement is to be able to query continuous streams of event data while the events “fly by,” so to speak. This means that time-series processing is important. For example, you may need to calculate a moving average over the last 5 minutes. That 5-minute period is known as a time window. Streaming query language operators can query and manipulate continuous streams of event data within these windows. Today, CEP vendors often use their own languages to read streaming data. For example, IBM InfoSphere Streams uses a language called SPADE. Programs, therefore, need to be developed in this language and deployed to execute over multiple blade servers. Standards are emerging, however, to query time-series data as it moves in real time. The possible emerging standard is called StreamSQL, but it is not yet widely adopted or ratified. Nevertheless, vendors like Coral8, IBM, Oracle, StreamBase and Truviso are all working on the StreamSQL standard.
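As a rough, general-purpose illustration of the time-window idea (not StreamSQL or any vendor language; the event format and window length are my own assumptions), the following sketch maintains a moving average over a sliding 5-minute window as each event arrives:

from collections import deque

class SlidingAverage:
    """Moving average over a fixed time window, updated as each event arrives."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, value) pairs currently inside the window
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] < timestamp - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)

avg = SlidingAverage(window_seconds=300)          # a 5-minute window
for ts, price in [(0, 101.0), (120, 102.0), (400, 99.0)]:
    print(ts, round(avg.add(ts, price), 2))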

StreamSQL is a variant of standard SQL, specifically designed to express processing over continuous streams of time-series data. StreamSQL can be used to perform SQL-style processing on incoming messages as they fly by, without necessarily storing them as SQL does. It extends SQL with rich windowing constructs and stream-oriented operations. To accommodate this new language, new tools are emerging for developing applications with StreamSQL.
Following is an example, which maintains the average HSBC tick price over a sliding 20-second window:

SELECT T.symbol, avg(T.price)
FROM StockTick (policy: maintain last 20 seconds where symbol = "HSBC") T
GROUP BY T.symbol

With this kind of technology, it potentially becomes possible to use CEP to filter high-volume events so that only events of interest are passed on to simpler event processing technologies for analysis, alerting and/or action taking. Again, as an example, IBM InfoSphere Streams can filter events and feed them into IBM Cognos Now! or WebSphere Business Events for subsequent analysis and action taking (e.g., alerts). Equally, Tibco BusinessEvents could feed Tibco BusinessFactor in a similar fashion.

CEP holds a lot of promise. The challenge for most of us is to figure out how to leverage this technology to optimise business operations.


Comments


Posted 16 September 2009 by Gaurav Agarwal

I saw your article. It talks about Cognos Now! and WebSphere Business Monitoring as BAM solutions provided by IBM. But could you please tell me why there are two BAM solutions from IBM, or is Cognos Now! still an operational BI solution? Could you please tell me the difference and similarity between the two?
