Decisions at Your Service: DBMSs Under Pressure

Originally published 7 March 2007

As the adoption of service-oriented architecture (SOA) infrastructure takes hold in many organisations, the one thing we can be sure of is that we will get thrown into the deep end of real-time event-driven and on-demand business. In the world of business intelligence (BI), that means event-driven BI and on-demand BI.

The good news is that the latest releases of most of the major BI platforms in the market are already in good shape to play in a service-oriented world, with support for Web services and event-driven data integration either already shipping or about to ship. The other thing that you can be sure of is that everything including the kitchen sink is going to end up on an enterprise service bus (ESB). We are set for the onslaught of an event storm that I recently described in one of my blogs (see: The Perfect Event Storm is Coming), where an ESB is not a “drip, drip” messaging backbone but something more akin to a full-on fire hose. As more and more software becomes service-enabled, the traffic on the service bus will increase significantly, and data volumes are bound to increase yet again. When you look down on this transition from 1,000 feet, you could say that we are basically evolving toward continuous operation, where a river of XML messages carrying data payloads constantly flows through the enterprise. This means that, at least to some extent, “start/stop” batch processing in our everyday operations is going to be reduced and superseded by continuous messaging. No doubt sooner or later the ESB river of XML will find its level and the transition will be complete. For now, however, there is a lot to do.

With respect to business intelligence in this real-time, message-oriented world, several things can be guaranteed. The first is that the number of users of business intelligence is set to increase rapidly as we offer BI queries, cubes, reports, scoring models, etc. as Web services to any and all applications, portals and processes that require business intelligence on demand. The consequence is that we have to scale. Server-based BI tools, data integration servers, the DBMS servers – all of them have to scale to manage the increase in the concurrent user base. Bear in mind that a user of business intelligence in an SOA does not have to be a human; it could equally be an application. Some of these applications requesting on-demand BI could themselves be offered to thousands of users over the Internet. When you think about it like this, you can see that the user base is potentially set to explode.

In this much larger user base, there is no guarantee that people know how to use BI tools, have time to use them or even have time to analyse the data to make a decision. Skills could be at all kinds of different levels. In fact, the requirement may go further: users may want the decisions themselves made for them. In other words, it would make sense in specific areas of a business to guide and assist people by using analysis and automated decisions and recommendations to help them keep the business running optimally. There may be several valid reasons for this. It may be that the nature of their job function (role) in the enterprise means that they do not have:

  • The time to analyse data to make certain decisions

  • Access to all the information needed to make certain decisions

  • Sufficient understanding of all the business rules and expertise needed to make the right decisions

In addition, with the arrival of radio frequency identification (RFID) and high transaction rates spawning a flurry of events, it may simply be too expensive to hire hundreds and hundreds of business analysts to constantly analyse – hoping that they will spot everything. I have a client with 65 million transactions per day happening on a 7x24 basis. If things start going wrong at a volume of this size, how can a business guarantee that its business analysts will see the problem and react in time to stop disruption or to avoid customer dissatisfaction? There has to come a point when, no matter how many people you throw at the problem, you are still not going to see everything. So there has to be a better way, whereby a business can use software to create agents that continuously and automatically analyse low-latency event data in order to detect problems and opportunities amid the blizzard of events happening throughout the enterprise. The idea, again, is to automatically make decisions to keep the business optimal. In this case, business benefit will likely be at its maximum when the latency of the data, the time taken to analyse it and the time taken to decide are as close to zero as they can get.

Looking at these requirements, it is clear that event-driven and on-demand data integration, automated analysis and automated rules-based decisioning are all going to be high on the agenda. What is even more powerful is to link data integration, automated analysis and automated rules-based decisioning together and turn the whole thing into a decision service (Figure 1). It is especially powerful if decision services can be triggered to execute when an event or set of events occurs and can also be invoked on demand. I have talked at length about this in a previous article (see: Techniques for Integrating Business Intelligence into The Enterprise – Part 4). Event-driven decision services offer rapid response and therefore rapid action to keep operations optimal. On-demand decision services offer recommendations and clear-cut decisions to guide people and dynamically guide process execution. You could also combine the two: an event-driven decision service could take action, such as alerting people, and also recommend what actions a person should take.



Figure 1: Creating Decision Services
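
To make the Figure 1 pattern concrete, here is a minimal Python sketch (all names – DecisionService, gather_data, score, apply_rules – are hypothetical, not any vendor's API) of how data integration, automated analysis and rules-based decisioning might be chained behind a single entry point that can be called on demand or wired to an event:

```python
# A minimal sketch of a decision service (all names hypothetical).
# It chains data integration, automated analysis (scoring) and
# rules-based decisioning behind one entry point that could be
# called on demand or subscribed to an event topic on the ESB.

from dataclasses import dataclass

@dataclass
class Decision:
    action: str    # what the service decided to do
    score: float   # output of the automated analysis
    reason: str    # which rule fired

class DecisionService:
    def gather_data(self, event: dict) -> dict:
        # Data integration step: enrich the event with warehouse
        # and recent transactional data (stubbed here).
        return {**event, "avg_daily_spend": 120.0}

    def score(self, data: dict) -> float:
        # Automated analysis step: a real deployment would invoke
        # a published scoring model service here.
        return data["amount"] / max(data["avg_daily_spend"], 1.0)

    def apply_rules(self, score: float) -> Decision:
        # Rules-based decisioning step.
        if score > 5.0:
            return Decision("block_and_alert", score, "spend spike")
        return Decision("approve", score, "within normal range")

    def decide(self, event: dict) -> Decision:
        # Single entry point: on demand (request/response) or event driven.
        return self.apply_rules(self.score(self.gather_data(event)))

# On-demand invocation:
service = DecisionService()
print(service.decide({"customer": "C42", "amount": 950.0}))
```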

On-demand and event-driven decision services that can be deployed in an SOA can be built in several ways, by using:

  1. A data integration tool that supports event-driven data integration workflows and that can publish these workflows as services.

  2. A predictive analytics model-building tool that supports workflow and event-driven execution of the model, and that can invoke data integration and rules engine services from within a model workflow.

  3. A process modelling tool that can invoke data integration services, scoring or predictive analytics services and rules engine services as part of a business process execution language (BPEL) process.

  4. A DBMS that can trigger scoring models to execute as soon as data is loaded into the database and that can trigger a rules engine to execute on completion of scoring.

  5. A specialist product that has built-in data integration, automated analysis and decision rules capability. 

All of these options are available in the marketplace. Examples (chosen at random) of these are shown in the table in Figure 2. Please note that this is not in any way an exhaustive list, and my apologies to any vendor whose products are not mentioned. These examples are chosen simply to illustrate the implementation options mentioned.

Figure 2: Technology Options and Examples


I have written in detail about how to build decision services using options 1, 3 and 5 in my article Building Intelligent Agents Using Business Activity Monitoring. At the time of writing that article (December 2005), I referred to what we were building as “intelligent agents.” Today, perhaps a better term is decision services. In summary, with option 1 in Figure 2, the ETL job calls a scoring model and a rules engine service from within its workflow. The whole ETL workflow is then itself published as a decision service to support on-demand decisions and recommendations (please see Figure 3 in the aforementioned article). Note that the data warehouse is a source to the ETL job in this case, along with perhaps the most recent transactional data and the targets set in the business strategy defined in a CPM tool. Taking data on metric targets into automated analysis means that actual performance can be compared against set targets to see if the company is on track to achieve its goals. This option caters for the situation where not all the data is necessarily in the data warehouse at the time the automated analysis and decisioning is triggered.
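
As a rough illustration of option 1 (all function names are stand-ins for real extract and service-call steps, not any particular ETL tool's API), the workflow might look something like this:

```python
# Hypothetical option 1 sketch: an ETL workflow that sources the data
# warehouse, recent transactions and CPM targets, calls a scoring model
# and a rules engine, and is itself published as a decision service.
# All functions are stubs standing in for real extracts and service calls.

def extract_from_warehouse(cid):       # historical data
    return {"revenue_to_date": 9_500.0}

def extract_recent_transactions(cid):  # low-latency transactional data
    return {"last_24h_revenue": 400.0}

def extract_cpm_targets(cid):          # targets from the CPM tool
    return {"revenue_target": 10_000.0}

def call_scoring_service(features):    # automated analysis (stub)
    return features["last_24h_revenue"] / 500.0

def call_rules_service(score, on_track):  # rules-based decisioning (stub)
    if not on_track and score < 1.0:
        return {"decision": "alert", "reason": "behind target, demand slowing"}
    return {"decision": "no_action"}

def etl_decision_workflow(customer_id: str) -> dict:
    # Integrate warehouse, recent and target data, then analyse and decide.
    history = extract_from_warehouse(customer_id)
    recent = extract_recent_transactions(customer_id)
    targets = extract_cpm_targets(customer_id)
    features = {**history, **recent}
    score = call_scoring_service(features)
    on_track = features["revenue_to_date"] >= targets["revenue_target"]
    return call_rules_service(score, on_track)

print(etl_decision_workflow("C42"))
```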

Option 2 in Figure 2 can be implemented from a data mining model-building tool that supports workflow. In this case, there are a lot of similarities to the first option. The main difference is that the scoring model workflow is also event driven and, therefore, executes as soon as an event of the kind it is looking for appears on the ESB. Data integration and rules engines are invoked from within the scoring model workflow itself. The entire scoring model workflow is then published as a decision service. This is shown in Figure 3.


Figure 3: Event-Driven Scoring Model Workflow
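
A rough sketch of the event-driven variant follows. The bus client here is simulated with a simple in-memory queue, since the subscribe/consume API of any real ESB would be product specific; everything else (event types, thresholds) is invented purely for illustration:

```python
# Hypothetical option 2 sketch: the scoring workflow itself is event
# driven. A subscriber watches the bus for events of interest and runs
# the workflow the moment one arrives.

import json
import queue

bus = queue.Queue()  # stand-in for an ESB topic subscription

def scoring_workflow(event: dict) -> dict:
    # Data integration and rules are invoked from inside the workflow.
    enriched = {**event, "history_score": 0.3}               # integration (stub)
    score = enriched["amount"] * enriched["history_score"]   # scoring (stub)
    decision = "investigate" if score > 100 else "ignore"    # rules (stub)
    return {"event": event, "score": score, "decision": decision}

def on_message(raw: str):
    event = json.loads(raw)
    if event.get("type") == "large_order":  # the event it is "looking for"
        print(scoring_workflow(event))

# Simulate two messages arriving on the bus:
bus.put(json.dumps({"type": "large_order", "amount": 500}))
bus.put(json.dumps({"type": "heartbeat"}))
while not bus.empty():
    on_message(bus.get())
```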

Please refer to Figures 4 and 5 in my article Building Intelligent Agents Using Business Activity Monitoring to see how a decision service can be built with a process modelling tool.

Creating event-driven decision services in a DBMS can be done by creating a model in a specific modelling tool and then using that tool to generate the model in the Predictive Model Markup Language (PMML). This is an industry standard supported by most data mining tools. DBMSs such as IBM DB2, Oracle and Teradata can import PMML models and manage their execution on data in the DBMS. The DBMS treats the model as a user-defined function and can execute it in any SQL statement, including dynamic SQL, SQL in the logic of a database trigger, SQL in a stored procedure, stored queries, SQL in a relational view or a materialised view, and even XQuery. Using the trigger mechanism, as soon as data is placed in a database, a trigger can execute a scoring model to automatically analyse that data. This is attractive in that the data is low latency (i.e., we analyse it as soon as it arrives). It does mean, however, that the data required for automatic analysis needs to be in the DBMS. In addition, the output scores and other variables from the scoring model can be placed in another database table. These output tables could themselves have triggers on them, which means that a trigger can fire as soon as a score is generated and cause the invocation of a Web service such as a rules engine (see Figure 4). The engine then looks at the scores generated from automatic analysis (and any other variables deemed necessary) and makes an automatic decision. It is this technique that forms the basis of the Teradata Active Data Warehouse, which uses Teradata Warehouse Miner (or models built in other mining tools). The only question mark against this approach is how an on-demand decision service would work as opposed to an event-driven one. An on-demand request would simply have to go to the rules engine, which in turn would invoke the scoring model via a SQL query that calls the model as a user-defined function.


Figure 4: Automated Decisioning within the DBMS
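
To make the trigger chain tangible, here is a runnable miniature of the Figure 4 pattern using SQLite from Python. In a real deployment the model would be a PMML model imported into DB2, Oracle or Teradata as a user-defined function, and the rules engine would be an external Web service; score_model() and invoke_rules_engine() here are illustrative Python stand-ins registered as SQL functions:

```python
# Miniature of the Figure 4 pattern: trigger 1 scores data on arrival;
# trigger 2 fires the rules engine as soon as a score is generated.

import sqlite3

def score_model(amount):
    # Stand-in for a PMML scoring model deployed as a UDF.
    return round(amount / 1000.0, 2)

def invoke_rules_engine(txn_id, score):
    # Stand-in for a trigger-driven call out to a rules engine service.
    action = "alert" if score > 0.5 else "ok"
    print(f"rules engine: txn {txn_id} score {score} -> {action}")
    return action

conn = sqlite3.connect(":memory:")
conn.create_function("score_model", 1, score_model)
conn.create_function("invoke_rules_engine", 2, invoke_rules_engine)
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE scores (txn_id INTEGER, score REAL);

    -- Trigger 1: score data as soon as it lands in the database.
    CREATE TRIGGER score_on_insert AFTER INSERT ON transactions
    BEGIN
        INSERT INTO scores VALUES (NEW.id, score_model(NEW.amount));
    END;

    -- Trigger 2: call out to the rules engine when a score appears.
    CREATE TRIGGER decide_on_score AFTER INSERT ON scores
    BEGIN
        SELECT invoke_rules_engine(NEW.txn_id, NEW.score);
    END;
""")
conn.execute("INSERT INTO transactions (amount) VALUES (900.0)")
conn.execute("INSERT INTO transactions (amount) VALUES (120.0)")
conn.close()
```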

Most enterprises need to support both batch and on-demand automated decisioning. If all the data is in the data warehouse, then it clearly makes sense to deploy scoring models in the DBMS and undertake batch scoring, leveraging the full power of a parallel DBMS. In the case of on-demand decision services, the data may not all be in the data warehouse. This is increasingly likely as the demand for lower-latency data pushes delivery closer and closer to real time.

What Does This Mean for a Data Warehouse DBMS Server?
Setting aside how you build decision services, let’s assume they are deployed in an SOA (see Figure 5). If we go back to what SOA is about to unleash on a BI system, as discussed at the beginning of this article, it really makes you think. Is the BI system really ready for this? Can we handle the increased user base requesting on-demand decisions, the event storms, the automated analysis, and the 7x24 operational applications requesting BI on demand and insisting that the BI system itself runs 7x24?

Figure 5: Decision Services in an SOA

Think about all those events and all the users that we just invited to “come on in” and use our BI systems by offering on-demand BI services, data integration services, scoring services and decision services. Add this to the “classic” use of BI, with business analyst reporting, OLAP analysis, and CPM users with their scorecards and dashboards accessing the data. Oh, and, of course, there is the event-driven ETL processing for low-latency data that just got added to ETL batch processing. For those of you who are running materialised views on your database servers to calculate and cache summary metrics, what happens when the underlying data is being constantly loaded by event-driven ETL processing? It means that the summary metrics in those materialised views need to be recalculated, and demand for that recalculation is going to rise sharply. Consequently, the queries behind materialised views are going to re-execute much more frequently as the underlying data changes. Then comes the big question: What technology is underpinning all of this in the BI system? The answer, of course, is the DBMS. One thing is for sure: if we think our DBMSs are under pressure today, we haven’t seen anything yet. This technology is massively important to our success in SOA and in on-demand and event-driven decision services.
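
A tiny illustration of that materialised-view pressure follows. SQLite has no materialised views, so a summary table refreshed after each load stands in for one; the point is simply that the summary query re-executes once per micro-batch rather than once per nightly load:

```python
# Every event-driven micro-batch load forces the summary to be
# recomputed; a summary table stands in for a materialised view.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    CREATE TABLE mv_sales_by_region (region TEXT, total REAL);
""")

def refresh_summary():
    # The "materialised view" query re-executes on every refresh;
    # with continuous loading this happens far more often.
    conn.execute("DELETE FROM mv_sales_by_region")
    conn.execute("""INSERT INTO mv_sales_by_region
                    SELECT region, SUM(amount) FROM sales GROUP BY region""")

for batch in ([("EU", 10.0)], [("EU", 5.0), ("US", 7.0)]):
    conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
    refresh_summary()  # once per micro-batch instead of once per night
    print(conn.execute("SELECT * FROM mv_sales_by_region").fetchall())
```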

Looking at this makes me realise that we had better start setting aside time to stress test scalability again. However, this time we need to look at DBMS scalability and BI platform scalability on a whole new level if we are going to be ready for the onslaught that’s coming. Fundamentally, DBMS workload management on our BI systems is set to become absolutely mission critical, if it isn’t already. Also, scalability is now about a lot more than data volumes (which are still going to increase, by the way). It is about what we know about our workloads and how much of the workload is predictable versus unpredictable. It is also about concurrent users, data latency, query complexity and schema complexity. Fundamentally, if you are planning to offer decision services and BI services in an SOA while also opening up event-driven data integration, event-driven automatic analysis and event-driven decisioning, I believe it is necessary to test the whole thing again rather than just assume it is all okay. We need to look at workload peaks and troughs; workload partitioning to separate and prioritise data integration loading; materialised view query execution and its frequency; scoring model execution; on-demand reporting and analysis; online administration; and more. Better to be safe than sorry, rather than heading over a cliff like a lemming.
