Print this Story
ADVERTISEMENT
Business Intelligence Resources
Data Life Cycle – A First Step in Maintaining Your Data Warehouse
Listen to the audio version of this article
by Krish Krishnan
Published: 3 April 2008
Understanding the value of the data in your data warehouse and managing the data life cycle is a first step in managing the health of your data warehouse.

In the transaction processing world, data archiving is a system design feature. When you move the same data to your data warehouse, rarely do you have a compelling need to have an archival strategy. How much of the data in your data warehouse do you (or does your business) really use every day? I bet it is less than 1%. Then, why do you need to keep all this data online in a data warehouse? Single version of truth, compliance, audit, fraud detection or business needs – whatever your driver, the value of old data should be evaluated.

Old data is a boat anchor, whether you are a wholesale member’s only club with a high volume of transactions or a wireless company with call detail record (CDR) data. The quality of information stored in the data warehouse is valuable only for a period of time in its current state, and for a future period in a summarized state.

Most data warehouses that are in use today were built to satisfy certain data and reporting requirements. If those requirements are no longer valid, how do you really understand the business value of this data, and how do you manage the life cycle of this data?

Data life cycle value determination:

  1. Conduct a survey of your business users to determine the percentage of the data in the data warehouse they are using.

  2. Develop and publish a data usage report and educate the business users on the issues being caused by the extra data, including:
    • Storage costs – the extra data is using extra storage.
    • Service level agreement (SLA) issues – the extra data is affecting loading data and running reports, causing SLA issues.
    • Metadata management issues.
    • Master data maintenance issues.

  3. Ascertain organizational alignment on the business value of the legacy data.

Data retention requirements and storage strategies:

  1. Determine the data retention requirements.

  2. Determine the metadata requirements of the data in your data warehouse that will be archived.

  3. Determine the data archival process – can you store this data on/offline and purge the data warehouse?

  4. Determine the reloading strategy or integration strategy for the data to be in on/offline storage.

Implement an archival program.

Online/Offline Storage – If the legacy data needs to be accessed readily, then online storage of the data is essential. If legacy data does not need immediate access, then offline storage of the data is advised.

  1. Evaluate the options – How will the online storage be implemented?
    • Do you have a platform for implementations?
    • What is the cost of infrastructure?
    • What are alternative choices?
  2. Plan the archival strategy.
    • What is the right time for the data warehouse outage?
    • How long will it take to execute the archival process?
    • Where is the identified metadata stored?
    • Where is the identified master data stored?
    • How are the master data and metadata layers archived?
    • Is there any database space reclamation needed?
    • Are there index rebuilds required?
  3. Implement the archival process.
  4. Verify that data is accessible after archiving:
    • Master data is easily restorable.
    • Metadata is easily restorable.
    • Legacy data can be integrated when needed.
    • The reloading of the data has minimal impact on the data warehouse.

Traditional offline storage is becoming more expensive both to implement and restore the data. One of the emerging trends in the data management area is the consideration of using the data warehouse appliance as an alternative storage platform. Next month’s article will cover this topic.

In conclusion, understanding the value of the data in the data warehouse and managing the data life cycle is a first step in managing the health of your data warehouse.


Recent articles by Krish Krishnan

Krish Krishnan - Krish is an expert in the strategy, architecture and implementation of high performance data warehousing solutions. He is a recognized data warehouse thought leader, writing and speaking at industry leading conferences, user groups and trade publications. He is a certified Bill Inmon professional and holds all major DBA certifications. In his 18 years of professional experience, he has been solving complex solution architecture problems spanning all aspects of data warehousing and business intelligence for Fortune 1000 clients. With his “get it done” approach, he has implemented data warehouse solutions ranging up to hundreds of terabyte data volumes and drives performance tuning into existing BI/DW investments to realize greater than 90% performance gains. Krish leads the Data Warehouse Appliance Expert Channel at BeyeNETWORK.com and is helping drive and mature the data warehouse appliance market. Krish also serves as Associate Vice President of DAMA Chicago. He is a BI Principal at Daugherty Business Solutions in Oak Brook, Illinois.

Editor's note: More Krish Krishnan articles, resources, news and events are available in the Business Intelligence Network's Krish Krishnan Channel. Be sure to visit today!