Blog: Richard Hackathorn

Richard Hackathorn

Welcome to my blog stream. I am focusing on the business value of low latency data, real-time business intelligence (BI), data warehouse (DW) appliances, use of virtual world technology, ethics of business intelligence and globalization of business intelligence. However, my blog entries may range widely depending on current industry events and personal life changes. So, readers beware!

Please comment on my blogs and share your opinions with the BI/DW community.

About the author >

Dr. Richard Hackathorn is founder and president of Bolder Technology, Inc. He has more than thirty years of experience in the information technology industry as a well-known industry analyst, technology innovator and international educator. He has pioneered many innovations in database management, decision support, client-server computing, database connectivity, associative link analysis, data warehousing, and web farming. Focus areas are: business value of timely data, real-time business intelligence (BI), data warehouse appliances, ethics of business intelligence and globalization of BI.

Richard has published numerous articles in trade and academic publications, presented regularly at leading industry conferences and conducted professional seminars in eighteen countries. He writes regularly for the BeyeNETWORK.com and has a channel for his blog, articles and research studies. He has been a member of the IBM Gold Consultants since its inception, and is also a member of the Boulder BI Brain Trust and the Independent Analyst Platform.

Dr. Hackathorn has written three professional texts, entitled Enterprise Database Connectivity, Using the Data Warehouse (with William H. Inmon), and Web Farming for the Data Warehouse.

Editor's note: More articles, resources, news and events are available in Richard's BeyeNETWORK Expert Channel. Be sure to visit today!

Recently in 20081012TD-Partners Category

I am back in the office, escaped Las Vegas (at least for a week), and reflecting on the conference.

My first thought is that there was so much I missed. The last session was Three Crabby Old Men Predicting the Future of BI. I missed it because I had a plane to catch! Aaaarrrrgggggg! Someone please send me an audio recording of that session!

My second thought is that there were so many colleagues with whom I missed talking. The BI network surrounding Teradata (and other key vendors) is awesome! It is an exciting time for the BI community. There is a blending of ecosystems into a global community of professionals. Oh, how I wish that TDWI were a non-profit professional association. We really need an ACM-like entity to glue us together professionally.

A third thought is for the next generation of BI professionals. There are a lot of crabby old folks (like me) who are slowing down and need to pass the baton to the next generation. I know...speak for myself, since many of my 'old' friends were at those crap tables until 3am. The 30-ish BI professionals that I know are amazing - energy, enthusiasm, caring. We (the crabby old BI folks) need to provide them a springboard so that they can stand on our shoulders.

Finally, I am concerned about BI technology. We now have more power than we know how to manage. In some ways, it was good that we were constantly fighting the technology in the old days. It kept us focused on priorities and limited our ability to really screw things up. No more do we have that luxury! The issues surrounding the proper application of BI technology are deep and largely undiscussed. Is it ethical to do deep data mining on our customers? Is that not the same as an X-ray machine that reveals our intimate body parts at the public airport?

So, those are a few reflections from yet another BI conference. Bye until the next conference (in a week).

[Blog stream from the Teradata Partners Conference is here.]


Posted October 17, 2008 7:25 AM

I spent several hours roaming the exhibit floor, mainly catching up with old friends at various companies. One new company, to my surprise, was founded by an old friend - Andrew Cardno of former Compudigm fame. Andrew is quite a visionary when it comes to data visualization. In our conversation, he said that his goal is to make "the data warehouse a piece of art." If you know Andrew, you realize that this statement was made with total passion and conviction, void of any humorous connotation.

We all know that visual perception is our strongest sensory function, accounting for 70% of all the information we perceive from the real world. So, what's new?

Andrew is blasting us with information in artfully done visual designs that highlight key dimensions, like time, spatial, hierarchies, and the like. He said, "Data should explode in our minds." In one glance, we should visualize both the overview and detail inherent in the data.

For the most part, the visualization consists of the raw data, cleansed and structured as it should be in any proper EDW. It is not images of aggregations or other analytics whose computations are hidden from the perceiver. Every bit hangs out there in the open! Maybe Andrew should rename his tool as the Data Nudist Colony. No clothing allowed on those bits! View six years of XXX revenue data in a glance.

Take a look at Andrew's new company BIS2 (Business Intelligence Systems Software). It is strictly a GP-rated site. They are still emerging, which is a nice way of saying that the product/service is not shipping. However, they have several beta sites in operation. And for a LV-size fee, they will perform a two-week magic show just for you. This is one company to watch.

Posted October 16, 2008 8:23 AM

Speakers were Stephen Brobst, CTO of Teradata, and Paul Kent, VP of Platform R&D for SAS Institute. I was especially curious about this session because of the visibility placed on the SAS partnership at this Teradata conference. This was an overview of joint development work between Teradata and SAS: where they are going, and why we should be excited about it.

There are technology barriers that limit our ability to use advanced analytics. We need to stop copying/moving the data and eliminate the need for a Ph.D. to analyze the data.

The solution is to do in-database processing. If you have lots of data, keep it there! The old world has separate data marts and MOLAP cubes. The new world is to leverage SQL and parallel database engines. Teradata is extending its features because SQL is limited for advanced analytics. SAS algorithms are packaged up into user-defined functions/methods/types along with external stored procedures (Java). In addition, extensions are being added for geospatial, encryption, XML publishing. And, all this must scale linearly and efficiently.
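As a rough illustration of "keep the data there," here is a minimal sketch of the user-defined-function idea. It uses Python's built-in sqlite3 module purely as a stand-in; it is not Teradata or SAS code. The `customers` table, the `score` function, and its formula are hypothetical, invented for this example. The point is only that the scoring logic is registered with the database engine and invoked from SQL, so the query returns scores rather than raw rows that an outside tool must then process.

```python
import sqlite3


def score(recency_days, total_spend):
    """Hypothetical scoring model: recent, high-spend customers score higher."""
    return round(total_spend / (1.0 + recency_days), 2)


def score_in_database(rows):
    """Load rows into an in-memory table and score them from inside SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER, recency_days REAL, total_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    # Register the Python function as a two-argument SQL function named "score".
    conn.create_function("score", 2, score)
    result = conn.execute(
        "SELECT id, score(recency_days, total_spend) FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(score_in_database([(1, 0, 100.0), (2, 9, 100.0)]))
```

SQLite runs in-process, so this only mirrors the shape of the approach. On a parallel engine, the payoff described in the session comes from the function running next to the data on every node.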

Version 13 will have: new table type with no primary index (distributed uniformly like a deck of cards to avoid hot AMPs), efficiency in UDF, and the like. I would have liked to hear more on these v13 features.

Stephen and Paul went through a comparison of old/new world for SAS analytics. Teradata-awareness inside a SAS Proc implied that there was additional logic to execute SQL to return the results to the SAS Proc. Another example is the SAS Scoring Accelerator, where SAS Enterprise Miner generates the C code and publishes a UDF to be executed inside Teradata.

SAS is the first strategic partner for Teradata's In-Database initiative. There is a joint product roadmap and joint R&D engagement, along with SAS/Teradata Center of Excellence and SAS/Teradata Executive Customer Advisory Board. The result of this partnership is to reduce the time to build and deploy models from months to a week. From SAS perspective, there is lots of data on a grid computing cloud! We need to focus on moving the work to the data, not the opposite.

Another approach is to put the SAS processor physically close to the Teradata database. For example, the 2550 has space in the standard rack to place an application processor inside.

Teradata 13 will support SAS with: variant data type, input data ordering context area, fast work tables, and expanded table headers. Future useful Teradata features to support SAS analytics are: intrinsic functions, pivot/unpivot SQL operators, and tens of thousands of columns.

Paul noted that an important new feature to support SAS is input data ordering. An example was given for a Proc Forecast embedded into SQL as a UDT.

Stephen wrapped up with comments that there really is a deep collaboration between the R&D teams. It has been a challenge for the Teradata folks to think differently. He remarked that, if you told him you needed 10,000 columns, he would immediately feel that the data model is broken. It is driving unique and messy stuff into the database. The goal is to do your analytics faster and easier.

My take on this... This is cool! Often these partnerships are merely a superficial marketing alliance, but this is at a deep level. This is the most significant presentation that I heard at this conference. It was a glimpse at the next generation of BI analytics that eliminates the technical and cultural barriers separating the database and analysis communities. We all must be one with BI. The smarts of SAS cannot remain hidden in the back cubicles; those smarts must be relocated to the front desk touching customers and to the loading docks touching suppliers. And, the power of Teradata must enable that relocation.

Posted October 15, 2008 10:40 PM

Speakers were Imad Birouty, Program Marketing Manager for Teradata, and Jim Blair, Sr. Manager of DW Operations for Blue Cross Blue Shield. The session explored the relationship of data integration and business value. The key questions were: What is the value of an integration environment? Then, what is the cost of the infrastructure required to realize that value?

There are three options for providing enterprise analytics: no integration with separate data marts, some integration with mixtures of data marts and EDW, and a single EDW with one common data model.

Benefits are derived from cost savings, efficiency/optimization, and business opportunity. Better decision making should be based on the right data at the right time. Of course, this raised the question of how you determine what is right.

Limited business value is illustrated by two triangles that do not overlap. If the data is integrated, the triangles will overlap, resulting in an increase in the number of questions that can be answered. You need to see the illustration to appreciate the concept.

Upon questioning, Imad noted that the numbers were derived from a detailed enterprise logical model so that there are specific questions behind each of those numbers. Imad extended his analysis with a Data Overlap Analysis, which provided insights into the effort to add new applications by leveraging existing data elements.
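For readers who did not see the slide, the overlapping-triangles idea can be sketched in a few lines. This is my own illustration, not Imad's model: the questions and subject areas below are hypothetical, and the only assumption is that a question is answerable when every subject area it needs sits in the same data store.

```python
def answerable(questions, data_stores):
    """Return the questions whose required subject areas all live in one data store."""
    return [
        name
        for name, needed in questions.items()
        if any(needed <= store for store in data_stores)
    ]


# Hypothetical business questions and the subject areas each one requires.
QUESTIONS = {
    "sales by region": {"sales"},
    "inventory turns": {"inventory"},
    "stock-outs that cost sales": {"sales", "inventory"},
}

SEPARATE_MARTS = [{"sales"}, {"inventory"}]  # no integration
INTEGRATED_EDW = [{"sales", "inventory"}]    # one common data model

if __name__ == "__main__":
    print(len(answerable(QUESTIONS, SEPARATE_MARTS)), "questions with separate marts")
    print(len(answerable(QUESTIONS, INTEGRATED_EDW)), "questions with an integrated EDW")
```

The cross-functional question only becomes answerable once the two subject areas share a store, which is the gain the overlapping region is meant to show.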

Jim continued by covering a TCO (total cost of ownership) comparison of DW platforms among Teradata, IBM, Oracle, and an unnamed DW appliance vendor (whose name starts with N). Jim gave a credible detailed explanation of the infrastructure of his company, but it felt somewhat like a pro-Teradata sales pitch. However, the cost categories would be useful to other companies struggling with a credible TCO estimate for their DW infrastructure.

My take on this. . . Too general and needs specific examples. On the other hand, time was limited and quite a few slides were omitted in the talk. I urged them to extend this work by attributing business value to each question so that the benefit of cross-function integrated data is not just the number of questions to be answered, but a subjective estimate of the importance to the company.

Posted October 15, 2008 10:38 PM

Wayne Eckerson, director of TDWI Research, talked on The Myth of Self-Service BI: Balancing Ad Hoc and Tailored Delivery to Achieve Widespread BI Adoption.

He feels that he is taking on a big sacred cow in the BI industry. Self-service BI (SS-BI) is good, promising to release users from their data prison by opening the doors on the data warehouse. DW is like a black hole that sucks up all data. Users were excited, but old habits are tough to change. Traditional DW has not been able to deliver. So, give them self-service BI tools! Right? Any problems? Yes!

Wayne defines SS-BI as empowering users to create their own reports so that they get what they want when they want it without having to ask IT.

What do users really want? Is it to form queries to retrieve data, or to analyze data to make decisions? It should be the latter. The reality is that there is lots of crud that conforms users to the way that BI tools work, rather than how users do their work. Only 20% are power users who are plugged in. The other 80% of users are out in the cold with: cannot find right report, inconsistent data, slow response time, and too complex to use.

There are two types of SS-BI: ad hoc report creation versus ad hoc report navigation. Wayne applied the systems-theory idea of shifting the burden. He suggested that we need tailored delivery for ad hoc report navigation. Characteristics are: tailored to a specific group of users, ability to personalize, and delivered as a performance dashboard. Each sandbox should contain about 20 dimensions and 12 metrics. This can replace hundreds of traditional reports.

Wayne offered the MAD Sandbox framework, which consists of:

Monitor via graphical data for managers
Analyze via summarized data for analysts
Drill-Thru via detailed data for worker bees

At Cisco Systems, they applied the MAD Sandbox. Often companies flip the pyramid upside down where everyone is drowning in detail data. To work, IT must be totally involved with MAD. As a company matures with this framework, they will embrace Double MAD, which extends MAD to:

Model
Advanced Analytics
Decide and Do

As a gut check, is it Self-Service BI or Self-Serving BI? Be honest!

My take on this... I like the simple MAD framework. However, it hides two deep problems.

First, how do users get educated on the business semantics embedded in a MAD sandbox? It is constantly changing. Users are shifting job responsibilities. New employees are hired. And, acquisitions bring in whole new crowds of users who come with totally different business semantics.

Second, what is the mechanism that enables consensus again and again on business semantics? Yes, this is a data governance issue. But, everyone must be involved far beyond the governance board.

Posted October 15, 2008 10:34 PM

The keynote started with a couple of fun videos kicking off the morning. After a fast ride through the conference hall, the chair, Tobi Zappe, delivered (out of breath) a good intro for Lance Armstrong.

Lance began by discussing his bout with cancer when he was 25 years old. He was diagnosed with advanced testicular cancer that had spread to his lungs and brain. The treatment involved brain surgery, which Lance admitted was the lowest point of his life. He concluded that his life had to get better.

Lance tried to come back to the racing circuit. However, the major racing teams considered him damaged goods. After months of no interest from the teams in sponsoring him, the US Postal Service made him an offer. It was a wild card! This team was a disaster in terms of its chance of winning, but Lance took the same approach as battling cancer. His attitude was that it is great to win. However, it is really really bad to lose. As with cancer, victory is life, and losing is death.

Nike proposed a program to sell 5 million yellow arm bands at one dollar apiece, which Lance felt was a stupid idea. Now that 70 million bands have been sold, Lance said that it was not such a bad idea. All these millions of people wanted to be connected to this symbol of this disease. The beauty (and power) of the yellow band is the huge community of people supporting this cause.

As the treatments were finishing, one of Lance's doctors challenged him to be a public cancer survivor who would share his story. At that time, he did not imagine that his commitment would involve the Tour or the foundation. He admitted that his cancer is the reason that he is a winner! Fighting cancer is the motivation, not to win but to spread the story of surviving cancer.

The U.S. is spending only $6B per year on a cancer cure. One person will die every minute from cancer. Cancer kills 8M per year around the world. Whichever presidential candidate supports the fight on cancer is the one for whom Lance will vote.

Lance is planning a comeback! He has been off the bike competitively for over three years. This message about cancer must go around the world. The best way to spread the message is to train on the bike around the world. Whether Lance wins again or not, his message will be heard.

My take on this. . . My wife has had a recent bout with breast cancer. All is turning out well in her situation. However, I now appreciate the depth and pervasiveness of cancer's impact on our society. Despite his past and regardless of his future, Lance is effectively championing a good cause.

Posted October 15, 2008 7:22 PM

This afternoon I interviewed three people as podcasts for the BeyeNETWORK.

First, Ernie Loomis, director of Logical Data Models, talked about his work in the Communications industry. I was impressed that Teradata has been refining this LDM through eight versions over eight years.

Second, Paul Longhurst, director of data warehousing with OverStock.com, described his DW architecture, along with challenges and future plans. The real time data streaming from many data sources into the EDW placed serious demands on the system.

And finally, Barb Wixom, Associate Professor for the McIntire School of Commerce at the University of Virginia, talked about educating students in business intelligence. Teradata has been supporting universities through the Teradata University Network (TUN). Barb described how two thousand colleges and universities have used this resource for access to large data sets, analytic tools, case studies, curriculum, whitepapers, and the like.

The complete set of BeyeNETWORK podcasts can be found here.

Posted October 14, 2008 5:12 PM

Claudia Imhoff, president of Intelligent Solutions, presented on “Faster, Must Go Faster: The Need for Operational BI”. She started by showing the differences among Strategic, Tactical and Operational BI. This three-way classification (especially shown as a pyramid) is a classic in organizational theory.

Claudia used a three-way division of the decision cycle as suggested by Barry Devlin of 9Sight: Measure, Evaluate, Decide/Act (MEDA). She elaborated by showing this cycle as a loop with the goal to reduce the latency in taking action.

Claudia discussed the work of Colin White and Judy Davis on a BeyeNETWORK research study on Operational BI. She then went through several examples of Operational BI and then suggested steps for getting started with Operational BI. First, pick a feasible project. Second, develop an architecture that is service-oriented. Third, present information in a proactive manner. Finally, ensure a solid infrastructure for data integration.

Posted October 14, 2008 5:06 PM

Oliver Ratzesberger, Senior Director of Architecture and Operations, described how eBay has adopted an Analytics as a Service (AaaS) approach internally to support its agile data warehousing.

eBay was founded in September of 1995 and has a global presence in 39 markets. eBay has 276 million registered users, revenues of $1.5B, trade volumes of $2,000 per second, and 637 million new listings (Q4 of 2007). It hosts 532,000 stores; the most expensive item sold was a private jet for several million dollars. A diamond ring is sold every second and an automobile every minute.

This activity generated more than 50 PB per day or about one terabyte per second. They run an active-active cluster that is always online 24x7x365 with 99.9+% availability. The analytics core runs at two data centers in Phoenix and Sacramento. There is a lot of data moving between the centers to keep them in synch. A custom data link was developed to compress their data stream into only tens of terabytes per day. They use MicroStrategy, Business Objects, and Unica on the front end, while Ab Initio and Informatica are on the back end.

eBay has become an analytics-driven company. This is far beyond the usual marketing and finance metrics. Analytics is embedded in their daily life. Oliver said, “We think and live analytics. And know how to avoid analysis paralysis.”

Key performance indicators (KPI) are all about aligning performance objectives of both individuals and departments with corporate goals. An example of a KPI is in technology operations. One metric deals with parallel efficiency (i.e., how well the workload is distributed). With their 10,000 servers (oh!), the PE averaged around 50%. Once this metric was monitored, they were able to raise it from 50% to 80%, resulting in the saving of millions in operational expense.
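Oliver did not give a formula for parallel efficiency, so the sketch below assumes one common definition: the average load across nodes divided by the load on the busiest node. Under that assumption, a perfectly even workload scores 1.0, and a skewed one scores lower because every other node waits on the hot one. The function name and the sample loads are mine, not eBay's.

```python
def parallel_efficiency(node_loads):
    """Average node load divided by the busiest node's load (assumed definition)."""
    if not node_loads:
        raise ValueError("need at least one node load")
    peak = max(node_loads)
    if peak == 0:
        return 1.0  # an idle system is trivially balanced
    return (sum(node_loads) / len(node_loads)) / peak


if __name__ == "__main__":
    skewed = [100, 50, 25, 25]    # one hot node carries most of the work
    balanced = [100, 80, 70, 70]  # same peak, work spread more evenly
    print(f"skewed:   {parallel_efficiency(skewed):.0%}")
    print(f"balanced: {parallel_efficiency(balanced):.0%}")
```

With the made-up loads above, the skewed case comes out at 50% and the balanced case at 80%, which happens to match the before and after figures quoted in the talk.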

Designing for the Unknown: 85% of eBay analytical workload is NEW and unknown. In other words, exploration is the core of an analytical company. Known metrics come cheap. However, unknown metrics are expensive but also offer high potential ROI. Therefore, design cannot be static or dependent on specific questions or dimensions.

The staffing need for the Teradata DW is only one DBA for each data center. However, that staffing must be 24x7, implying a requirement for at least seven DBAs. They decided to outsource this function.

Proliferation of Analytics: The objective of ad hoc exploratory analytics is to shorten time to value. This has led to everyone crying, “I want my own data mart”. Oliver asserted, “A data mart cannot be cheap enough to justify its existence.” There are lots and lots of hidden costs, mainly in people time. It is a data mart dilemma!

The solution: Agile analytics needs Analytics as a Service. Build a massive-scale analytical computing utility; users bring their data and perform their analytics. It is scary to allow thousands of users to do their own stuff. Example: Max is a simple portal that shows some key metrics and offers a simple web-based table upload. The other end of the service is fully private utility access. About 75 are operating at any one time. They call these projects PETs (prototyping environments), also known as sandboxes. PETs are free! A PET can be set up in a few hours, has only 5-7 users, cannot share data with other systems, and has a 90-day time boundary. Users are reminded that their PET is not production. This is a prototype.
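The PET rules are simple enough to write down as a policy check. The sketch below is only my reading of the limits mentioned in the talk (at most 7 users, a 90-day time boundary); the `Pet` class, its field names, and the violation messages are hypothetical and are not how eBay implements it.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_USERS = 7                  # "only 5-7 users"
MAX_AGE = timedelta(days=90)   # "90-day time boundary"


@dataclass
class Pet:
    """A hypothetical record for one prototyping environment (sandbox)."""
    name: str
    created: date
    users: int

    def violations(self, today):
        """List which of the assumed PET limits this sandbox currently breaks."""
        problems = []
        if self.users > MAX_USERS:
            problems.append("too many users")
        if today - self.created > MAX_AGE:
            problems.append("past 90-day limit")
        return problems


if __name__ == "__main__":
    pet = Pet("pricing-test", created=date(2008, 7, 1), users=9)
    print(pet.violations(today=date(2008, 10, 14)))
```

A nightly job could run a check like this over every active sandbox and send the "this is not production" reminder to the owners of any that come back with violations.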

The benefit of a PET is time to value, which is often in days rather than months. It enables users to easily try out new ideas and Fail-Fast. And, this eliminates stray data marts. We also have a resource allocation model. For all 70+ business units, we estimated the resource budget. All computing is attributed to those budgets. We have been able to saturate the system at high utilization throughout the day, aiding in enterprise capacity planning. We run the entire system in a well balanced way. Surprisingly, PET environments did not screw things up. Using the simple limits in Teradata TASM allows you to control and kill run-away queries.

Find out more at Oliver's technology blog.

During the Q&A, the following issues were discussed:

How does SAS play in this environment? (from Richard Winter) With the newer releases of SAS tools, we are trying to push processing back into the DW and keep the data in DW.

What has been your experience with PETs so far? We started 2.5 years ago and started small. Since it is free, everyone wants one. We do not open or close PETs. We have 75 active at any time with as many as 100. As an example, we rolled out a certain capability in 9 months that would have normally taken several years. We tried many metrics that did not work, but found several that have given eBay 30% efficiency gains. The goal is finding the few PETs that have really big impacts.

Is eBay going to offer AaaS externally? Maybe someday, but not now. This could be of value to many companies, but we cannot say anything more at this time.

How do you get a Fail-Fast culture to work? You absolutely need top executive support. Once you get a critical mass, it then starts to spread. It is not easy to initiate. You cannot walk into a random company and tell everyone to Fail-Fast. Won’t work.

Posted October 14, 2008 1:51 PM

Chris Twogood and Jim Dietz talked on the Teradata product family:

- Departmental (single node) as Data Mart Edition and Data Mart Appliance,
- Scalable (multiple nodes) as Extreme Data Appliance, DW Appliance, Active Enterprise DW

Chris was proud that all editions operate on Teradata Database 12 (going to 13). Criteria to differentiate the platforms are: active (access/loads), investment protection, mixed workload mgt, availability, cross functional complexity, query concurrency, data scalability, database functionality.

Hmmm. . . What is investment protection? How does active differ from workload management?

He mentioned the Software-Only Edition that will operate on any Intel SMP platform, up to 6TB, for $40K. Teradata has been selling this edition for years, and it is mostly for test and development.

Looking at the enterprise architecture as tiered, Teradata has been utilized as Tier 1 for the heavy duty EDW, but not for the lower tiers. Now Teradata can offer cost effective products for the other tiers.

Jim continued with a detailed description of the product family.

- Data Mart Appliance 551: 551 is replacing 550S and 550P.
- Extreme Data Appliance 1550: purpose-built for a small number of power users doing deep analytics on extremely large data sets, announced yesterday, smallest at 50TB to largest at 50PB (wow!), backup this HUGE data set in half a day, usually once a month.
- DW Appliance 2550: integrated rack containing processors, storage, etc, modular increments with less configuring options, optimized for fast file scans and heavy deep-dive analytics,
- Active DW 5550: three models 5550H (top of line), 5500C (coexistence with older systems and also as an entry into EDW), 5500E (entry-level for small DW).

All systems are using Linux. Has become rock solid. Need 64b addressing. Teradata Enterprise Storage is enterprise class because of high availability, modular, and optimized for high I/O rates.

Teradata is green! Jim argued that they are delivering more performance for the same energy consumption. They are adopting a TPERF per kW convention to measure energy efficiency, and reducing the data center footprint, as measured in number of racks per 100 TPERF. There is also a cooling improvement with a 25% saving.

In summary, Chris and Jim asserted, "We got you covered."

UPDATE Oct 23: Curt Monash published a comprehensive overview of Teradata's product line here.

Posted October 14, 2008 1:47 PM