Why Do We Need Data Warehouse Appliances? Part 2 by Krish Krishnan - BeyeNETWORK UK


 

The Challenges of High Performance Data Warehousing

Originally published 1 November 2007

The most common complaints heard from user and management teams are the following:

  • “My query that runs for 10 minutes on 100,000 records runs for 10 hours on 1,000,000 records.”

  • “My ETL processes are slow, or have failed, because of the large data volumes that must be loaded into the data warehouse.”

  • “My CFO wants to know why we need another $500,000 for infrastructure when we invested a similar amount just 6 months ago.” 

  • “My database administration team is exhausted from constantly tuning the databases just to keep them fast.”
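A quick calculation shows why the first complaint is so alarming. If runtime grew linearly with the number of records, 10 minutes on 100,000 records would become about 100 minutes on 1,000,000 records, not 10 hours. Assuming, purely for illustration, that runtime grows as a power of the record count n (a simplifying model, not something stated above), the quoted figures imply an exponent of roughly 1.8:

```latex
t(n) \propto n^{k}, \qquad
\frac{t(10^{6})}{t(10^{5})} = \frac{600\ \text{min}}{10\ \text{min}} = 60 = 10^{k}
\;\Longrightarrow\; k = \log_{10} 60 \approx 1.78
```

In other words, a tenfold increase in data produced a sixtyfold increase in runtime, which is much closer to quadratic than to linear growth.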

The key factors that drive effective data warehouse performance and challenge IT departments are data loading, availability, volume, storage and operational issues.

Loading Data

Loading data is one of the most time-consuming processes in a data warehouse. Extracting the various data feeds, running them through data quality checks and data profiling, and then loading them, with or without transformations, into their final destination all take considerable time. This is especially challenging when data arrives in small bursts yet load speed is still constrained by the volume of data already in the data warehouse.

Data Availability

Data availability service level agreements (SLAs) strongly drive the need for a high-performance environment. End-user requirements must be clearly documented so that data is pristine, integrated and seamlessly available to downstream applications such as reporting and analytics. In addition, organizations often fail to document data growth projections, data demand projections, data retention cycles and the associated SLAs.

Data Volumes

Data volumes in the average data warehouse have been growing by gigabytes every day, and the rate at which granular detail is captured and retained has been increasing over the past three years. Possible reasons for this explosion in data volume include:

  • Compliance requirements

  • Legal mandates

  • Business mergers and acquisitions

  • Analytics

Storage

Disk and storage systems have consistently improved over the years in both speed and performance, while costs have remained relatively stable and, in some cases, have fallen. In a traditional architecture, storage is shared across all areas of a data warehouse, which makes it a highly constrained resource in terms of availability and performance. ETL processes and business intelligence (BI) queries generate large amounts of traffic and consume a lot of space while computing complex result sets. Shared-disk architecture has never been the answer for the data warehouse.

Operational Issues

The cost of maintaining a data warehouse has become monumental in many organizations. As the demand for more granular data and for longer history retention both grow, this two-way explosion leaves the data warehouse processing an unmanageable amount of information. On top of this data volume, related workloads such as data mining, predictive analysis and other historical trending queries have left IT feeling numb and cold rather than energized and happy about managing this infrastructure. Another indirect effect is heavy demand on resources, both hardware and IT administration staff (DBAs, system administrators and network administrators), which, from a holistic perspective, often leaves other tasks incomplete.

So far, we have outlined the pain points, issues and challenges faced in managing and maintaining a “high-performance” data warehouse. In my next article, we will look at why the data warehouse appliance can alleviate much of this pain.


  • Krish Krishnan
    Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

    With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

    Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with BeyeNETWORK.com, where he leads the Data Warehouse Appliances and Architecture Expert Channel.

