Oliver Ratzesberger, Senior Director of Architecture and Operations, described how eBay has adopted an Analytics as a Service (AaaS) approach internally to support its agile data warehousing.
eBay was founded in September of 1995 and has a global presence in 39 markets. eBay has 276 million registered users, revenues of $1.5B, trade volumes of $2,000 per second, and 637 million new listings (Q4 of 2007). It hosts 532,000 stores; the most expensive item sold was a private jet for several million dollars. A diamond ring is sold every second and an automobile every minute.
This activity generated more than 50 PB per day or about one terabyte per second. They run an active-active cluster that is always online 24x7x365 with 99.9+% availability. The analytics core runs at two data centers in Phoenix and Sacramento. There is a lot of data moving between the centers to keep them in sync. A custom data link was developed to compress their data stream into only tens of terabytes per day. They use MicroStrategy, Business Objects, and Unica on the front end, while Ab Initio and Informatica are on the back end.
eBay has become an analytics-driven company. This goes far beyond the usual marketing and finance metrics. Analytics is embedded in our daily life. Oliver said, “We think and live analytics. And know how to avoid analysis paralysis.”
Key performance indicators (KPIs) are all about aligning the performance objectives of both individuals and departments with corporate goals. One example of a KPI comes from technology operations. The metric deals with parallel efficiency (PE), i.e., how well the workload is distributed. With their 10,000 servers (oh!), the PE averaged around 50%. Once this metric was monitored, they were able to raise it from 50% to 80%, saving millions in operational expense.
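The talk did not say how parallel efficiency was calculated. As a rough illustration only, one common way to express it is the ratio of average load to peak load across the nodes, so a perfectly balanced system scores 100%. The function name and the sample load figures below are assumptions made for this sketch, not eBay's actual formula or data.

```python
from statistics import mean


def parallel_efficiency(node_loads):
    """Return mean load divided by peak load, as a fraction between 0 and 1.

    Assumed definition: the talk only described PE as "how well the
    workload is distributed" and gave no formula.
    """
    if not node_loads:
        raise ValueError("node_loads must not be empty")
    peak = max(node_loads)
    if peak == 0:
        return 1.0  # an idle system is trivially balanced
    return mean(node_loads) / peak


# Hypothetical per-node CPU seconds for the same workload.
skewed = [100, 50, 30, 20]    # one hot node holds everyone else back
balanced = [50, 40, 40, 30]   # same kind of work, spread more evenly

print(f"skewed:   {parallel_efficiency(skewed):.0%}")
print(f"balanced: {parallel_efficiency(balanced):.0%}")
```

Under this definition, the slowest (most loaded) node determines how long a parallel job runs, so raising the ratio means the same work finishes with less idle hardware.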
Designing for the Unknown: 85% of eBay's analytical workload is NEW and unknown. In other words, exploration is the core of an analytical company. Known metrics come cheap. Unknown metrics are expensive but also have high potential ROI. Therefore, the design cannot be static or dependent on specific questions or dimensions.
The staffing need for the Teradata DW is only one DBA for each data center. However, that staffing must be 24x7, implying a requirement for at least seven DBAs. We decided to outsource this function.
Proliferation of Analytics: The objective of ad hoc exploratory analytics is to shorten time to value. This has led to everyone crying, “I want my own data mart”. Oliver asserted, “A data mart cannot be cheap enough to justify its existence.” There are lots and lots of hidden costs, mainly in people time. It is a data mart dilemma!
The solution: Agile analytics needs Analytics as a Service. Take a massive-scale analytical computing utility, bring your data, and perform your analytics. It is scary to allow thousands of users to do their own stuff. Example: Max is a simple portal that shows some key metrics and offers a simple web-based table upload. The other end of the service is fully private utility access. About 75 of these are operating at any one time. They call these projects PETs (prototyping environments); a PET is also known as a sandbox. PETs are free! A PET can be set up in a few hours, has only 5-7 users, cannot share data with other systems, and has a 90-day time boundary. Users are reminded that their PET is not production. This is a prototype.
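To make the PET rules concrete, here is a minimal sketch of how such limits could be modeled. It only encodes two numbers quoted in the talk (at most 7 users, a 90-day lifetime); the class, its fields, and the sample names are hypothetical and do not describe eBay's actual provisioning system.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

MAX_USERS = 7        # from the talk: "only 5-7 users"
LIFETIME_DAYS = 90   # from the talk: 90-day time boundary


@dataclass
class Pet:
    """Hypothetical record for one prototyping environment (sandbox)."""
    name: str
    created: date
    users: set = field(default_factory=set)

    def add_user(self, user):
        # Re-adding an existing user is harmless; a new user past the cap is not.
        if user not in self.users and len(self.users) >= MAX_USERS:
            raise ValueError(f"{self.name} already has {MAX_USERS} users")
        self.users.add(user)

    def expires_on(self):
        return self.created + timedelta(days=LIFETIME_DAYS)

    def is_expired(self, today):
        return today >= self.expires_on()


# Usage with made-up names and dates.
pet = Pet("pricing-experiment", created=date(2008, 1, 1))
for n in range(MAX_USERS):
    pet.add_user(f"analyst{n}")
print(pet.expires_on(), pet.is_expired(date(2008, 2, 1)))
```

Keeping the limits as hard constraints rather than guidelines is what keeps a sandbox from quietly turning into an unmanaged data mart.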
The benefit of a PET is time to value, which is often in days rather than months. It enables users to easily try out new ideas and Fail-Fast. And this eliminates stray data marts. We also have a resource allocation model. For all 70+ business units, we estimated the resource budget. All computing is attributed to those budgets. We have been able to saturate the system at high utilization throughout the day, aiding in enterprise capacity planning. We run the entire system in a well-balanced way. Surprisingly, the PET environments did not screw things up. Using the simple limits in Teradata TASM allows you to control and kill runaway queries.
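The idea of attributing all computing to per-unit budgets, with a simple cap that stops runaway queries, can be sketched as follows. This is plain Python for illustration, not Teradata TASM configuration or its API; the class name, the CPU-seconds unit, and every number are assumptions.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical ledger charging query CPU time to business-unit budgets."""

    def __init__(self, budgets, max_query_cpu_seconds):
        self.budgets = dict(budgets)                  # unit -> budgeted CPU seconds
        self.max_query_cpu_seconds = max_query_cpu_seconds
        self.used = defaultdict(float)                # unit -> CPU seconds consumed

    def record(self, unit, cpu_seconds):
        """Charge a query to a unit; return False if it hit the per-query cap."""
        if unit not in self.budgets:
            raise KeyError(f"no budget defined for {unit}")
        # A query that reaches the cap is killed, so it is charged only up to the cap.
        self.used[unit] += min(cpu_seconds, self.max_query_cpu_seconds)
        return cpu_seconds <= self.max_query_cpu_seconds

    def utilization(self, unit):
        return self.used[unit] / self.budgets[unit]


# Usage with made-up units and numbers.
ledger = UsageLedger({"marketing": 1000.0, "finance": 500.0},
                     max_query_cpu_seconds=300.0)
ledger.record("marketing", 200.0)             # normal query
survived = ledger.record("marketing", 900.0)  # runaway query, capped at 300
print(survived, f"{ledger.utilization('marketing'):.0%}")
```

Tracking usage against a budget per unit is what lets capacity planning work from measured demand instead of guesses.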
Find out more at Oliver's technology blog.
During the Q&A, the following issues were discussed:
How does SAS play in this environment? (from Richard Winter) With the newer releases of SAS tools, we are trying to push processing back into the DW and keep the data in the DW.
What has been your experience with PETs so far? We started 2.5 years ago and started small. Since it is free, everyone wants one. We do not open or close PETs. We have 75 active at any time, with as many as 100. As an example, we rolled out a certain capability in 9 months that would normally have taken several years. We tried many metrics that did not work, but found several that have given eBay 30% efficiency gains. The goal is finding the few PETs that have really big impacts.
Is eBay going to offer AaaS externally? Maybe someday, but not now. This could be of value to many companies, but we cannot say anything more at this time.
How do you get a Fail-Fast culture to work? You absolutely need top executive support. Once you get a critical mass, it then starts to spread. It is not easy to initiate. You cannot walk into a random company and tell everyone to Fail-Fast. Won’t work.
[Blog stream from the Teradata Partners Conference is here.]