Data Life Cycle – A First Step in Maintaining Your Data Warehouse

By Krish Krishnan, BeyeNETWORK UK

Originally published 3 April 2008

In the transaction processing world, data archiving is a system design feature. Yet when the same data moves to the data warehouse, an archival strategy is rarely seen as a compelling need. How much of the data in your data warehouse do you (or does your business) really use every day? I bet it is less than 1%. Why, then, do you need to keep all of this data online in the data warehouse? Whatever your driver (a single version of the truth, compliance, audit, fraud detection, or business needs), the value of old data should be evaluated.

Old data is a boat anchor, whether you are a members-only wholesale club with a high volume of transactions or a wireless company with call detail record (CDR) data. Information stored in the data warehouse is valuable in its current, detailed state only for a limited period, and for a further period in a summarized state.
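As a rough illustration of the detailed-versus-summarized idea, the sketch below keeps recent rows at full detail and rolls older rows up to monthly totals. The row layout (date, amount), the function name `split_by_age`, and the cutoff date are all assumptions made for this example; a real warehouse would do the same thing in SQL against its own fact tables.

```python
from collections import defaultdict
from datetime import date

def split_by_age(rows, cutoff):
    """Keep rows dated on or after cutoff as detail; roll older rows up by month."""
    detail = []
    monthly_totals = defaultdict(float)
    for txn_date, amount in rows:
        if txn_date >= cutoff:
            detail.append((txn_date, amount))
        else:
            # Older data survives only as a (year, month) total.
            monthly_totals[(txn_date.year, txn_date.month)] += amount
    return detail, dict(monthly_totals)

# Hypothetical transactions, for illustration only.
rows = [
    (date(2006, 1, 5), 100.0),
    (date(2006, 1, 20), 50.0),
    (date(2006, 2, 3), 25.0),
    (date(2008, 3, 1), 10.0),
]
detail, summary = split_by_age(rows, cutoff=date(2007, 1, 1))
print(detail)   # only the 2008 row stays at full detail
print(summary)  # 2006 rows collapse into two monthly totals
```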

Most data warehouses that are in use today were built to satisfy certain data and reporting requirements. If those requirements are no longer valid, how do you really understand the business value of this data, and how do you manage the life cycle of this data?

Data life cycle value determination:

  1. Conduct a survey of your business users to determine the percentage of the data in the data warehouse they are using.

  2. Develop and publish a data usage report and educate the business users on the issues being caused by the extra data, including:
    • Storage costs – the extra data is using extra storage.
    • Service level agreement (SLA) issues – the extra data slows data loads and report runs, putting SLAs at risk.
    • Metadata management issues.
    • Master data maintenance issues.

  3. Ascertain organizational alignment on the business value of the legacy data.
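A user survey can be complemented by numbers taken from the warehouse itself. The sketch below is one possible shape for a data usage report: it assumes you can obtain each table's size and its last query date (for example, from a query log), and reports what share of the stored volume was touched within a recent window. The table names, sizes, dates, and the 90-day window are hypothetical.

```python
from datetime import date, timedelta

def usage_report(table_sizes_gb, last_access, as_of, window_days=90):
    """Return (percent of stored GB queried within the window, sorted unused tables)."""
    window_start = as_of - timedelta(days=window_days)
    # A table counts as "used" if it was queried on or after the window start.
    used = {t for t, d in last_access.items() if d >= window_start}
    total_gb = sum(table_sizes_gb.values())
    used_gb = sum(size for t, size in table_sizes_gb.items() if t in used)
    unused = sorted(t for t in table_sizes_gb if t not in used)
    pct_used = 100.0 * used_gb / total_gb if total_gb else 0.0
    return pct_used, unused

# Hypothetical inventory: cdr_2004 has no recorded access at all.
sizes = {"sales_2008": 50, "sales_2005": 150, "cdr_2004": 300}
accessed = {"sales_2008": date(2008, 3, 30), "sales_2005": date(2007, 1, 15)}

pct_used, unused = usage_report(sizes, accessed, as_of=date(2008, 4, 3))
print(f"{pct_used:.1f}% of stored volume was queried in the last 90 days")
print("Archive candidates:", unused)
```

Measuring by volume rather than by table count keeps the report tied to the storage and load-time costs listed above.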

Data retention requirements and storage strategies:

  1. Determine the data retention requirements.

  2. Determine the metadata requirements of the data in your data warehouse that will be archived.

  3. Determine the data archival process – can you move this data to online or offline storage and purge it from the data warehouse?

  4. Determine the reloading or integration strategy for data held in online or offline storage.
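Once retention requirements are agreed, they can be written down as a small policy and applied mechanically. The following is a minimal sketch, assuming monthly partitions and a policy with two hypothetical thresholds: how long data stays in the warehouse, and how long it must be retained anywhere. The 24-month and 84-month figures are placeholders, not recommendations.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class RetentionPolicy:
    online_months: int   # how long data stays in the warehouse
    retain_months: int   # how long data must be kept at all (warehouse or archive)

def months_between(older, newer):
    """Whole calendar months from older to newer, ignoring the day of month."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def disposition(partition_month, as_of, policy):
    """Classify a monthly partition as 'warehouse', 'archive', or 'expired'."""
    age = months_between(partition_month, as_of)
    if age < policy.online_months:
        return "warehouse"
    if age < policy.retain_months:
        return "archive"
    return "expired"

# Placeholder policy: 2 years in the warehouse, 7 years total retention.
policy = RetentionPolicy(online_months=24, retain_months=84)
as_of = date(2008, 4, 1)
for month in (date(2007, 6, 1), date(2005, 1, 1), date(2000, 1, 1)):
    print(month, disposition(month, as_of, policy))
```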

Implement an archival program:

Online/Offline Storage – If the legacy data needs to be accessed readily, then online storage of the data is essential. If legacy data does not need immediate access, then offline storage of the data is advised.

  1. Evaluate the options – How will the online storage be implemented?
    • Do you have a platform for the implementation?
    • What is the cost of infrastructure?
    • What are alternative choices?
  2. Plan the archival strategy.
    • What is the right time for the data warehouse outage?
    • How long will it take to execute the archival process?
    • Where is the identified metadata stored?
    • Where is the identified master data stored?
    • How are the master data and metadata layers archived?
    • Is there any database space reclamation needed?
    • Are index rebuilds required?
  3. Implement the archival process.
  4. Verify that data is accessible after archiving:
    • Master data is easily restorable.
    • Metadata is easily restorable.
    • Legacy data can be integrated when needed.
    • The reloading of the data has minimal impact on the data warehouse.
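The archive, verify, and reload steps above can be sketched end to end. The example below uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a warehouse. The `sales` and `sales_archive` tables, their columns, and the dates are hypothetical, and a production process would also need to handle metadata, master data, space reclamation, and index rebuilds.

```python
import sqlite3

def count(conn, table, where="1=1", params=()):
    # Table names are fixed constants in this sketch, never user input.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]

def archive_and_purge(conn, cutoff):
    """Copy sales rows dated before cutoff into sales_archive, verify, then purge."""
    to_move = count(conn, "sales", "sale_date < ?", (cutoff,))
    before = count(conn, "sales_archive")
    conn.execute(
        "INSERT INTO sales_archive SELECT * FROM sales WHERE sale_date < ?",
        (cutoff,),
    )
    copied = count(conn, "sales_archive") - before
    if copied != to_move:
        # Do not purge anything unless the copy is complete.
        conn.rollback()
        raise RuntimeError(f"archive mismatch: expected {to_move}, copied {copied}")
    conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    conn.commit()
    return to_move

def reload_range(conn, start, end):
    """Copy archived rows with start <= sale_date < end back into sales."""
    where, params = "sale_date >= ? AND sale_date < ?", (start, end)
    n = count(conn, "sales_archive", where, params)
    conn.execute(f"INSERT INTO sales SELECT * FROM sales_archive WHERE {where}", params)
    conn.commit()
    return n

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_archive (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2005-06-01", 10.0), ("2006-11-15", 20.0), ("2008-03-01", 30.0)],
)
conn.commit()

moved = archive_and_purge(conn, "2007-01-01")               # archive pre-2007 rows
remaining = count(conn, "sales")                            # rows left in the warehouse
reloaded = reload_range(conn, "2006-01-01", "2007-01-01")   # bring 2006 back
print(moved, remaining, reloaded, count(conn, "sales"))
```

In this sketch the reload copies rows back without removing them from the archive, so the archive stays complete. Reloading the same range twice would duplicate rows in `sales`, so a real process would guard against that.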

Traditional offline storage is becoming more expensive, both to implement and to restore data from. One emerging trend in data management is the use of a data warehouse appliance as an alternative storage platform. Next month’s article will cover this topic.

In conclusion, understanding the value of the data in the data warehouse and managing its life cycle are a first step in managing the health of your data warehouse.


  • Krish Krishnan
    Krish Krishnan is a worldwide-recognized expert in the strategy, architecture, and implementation of high-performance data warehousing solutions and big data. He is a visionary data warehouse thought leader and is ranked as one of the top data warehouse consultants in the world. As an independent analyst, Krish regularly speaks at leading industry conferences and user groups. He has written prolifically in trade publications and eBooks, contributing over 150 articles, viewpoints, and case studies on big data, business intelligence, data warehousing, data warehouse appliances, and high-performance architectures. He co-authored Building the Unstructured Data Warehouse with Bill Inmon in 2011, and Morgan Kaufmann will publish his first independent writing project, Data Warehousing in the Age of Big Data, in August 2013.

    With over 21 years of professional experience, Krish has solved complex solution architecture problems for global Fortune 1000 clients, and has designed and tuned some of the world’s largest data warehouses and business intelligence platforms. He is currently promoting the next generation of data warehousing, focusing on big data, semantic technologies, crowdsourcing, analytics, and platform engineering.

    Krish is the president of Sixth Sense Advisors Inc., a Chicago-based company providing independent analyst, management consulting, strategy and innovation advisory and technology consulting services in big data, data warehousing, and business intelligence. He serves as a technology advisor to several companies, and is actively sought after by investors to assess startup companies in data management and associated emerging technology areas. He publishes with BeyeNETWORK.com, where he leads the Data Warehouse Appliances and Architecture Expert Channel.

