In the transaction processing world, data archiving is a system design feature. When you move the same data to your data warehouse, rarely do you have a compelling need to have an archival strategy. How much of the data in your data warehouse do you (or does your business) really use every day? I bet it is less than 1%. Then, why do you need to keep all this data online in a data warehouse? Single version of truth, compliance, audit, fraud detection or business needs – whatever your driver, the value of old data should be evaluated.
Old data is a boat anchor, whether you are a wholesale member’s only club with a high volume of transactions or a wireless company with call detail record (CDR) data. The quality of information stored in the data warehouse is valuable only for a period of time in its current state, and for a future period in a summarized state.
Most data warehouses that are in use today were built to satisfy certain data and reporting requirements. If those requirements are no longer valid, how do you really understand the business value of this data, and how do you manage the life cycle of this data?
Data life cycle value determination:
- Conduct a survey of your business users to determine the percentage of the data in the data warehouse they are using.
- Develop and publish a data usage report and educate the business users on the issues being caused by the extra data, including:
- Storage costs – the extra data is using extra storage.
-
Service level agreement (SLA) issues – the extra data is affecting loading data and running reports, causing SLA issues.
-
Metadata management issues.
-
Master data maintenance issues.
-
Ascertain organizational alignment on the business value of the legacy data.
Data retention requirements and storage strategies:
- Determine the data retention requirements.
-----
Related content from the BeyeNETWORK
Industry leaders in the insurance industry are using business intelligence to produce breakthrough business results.
View this web seminar to get the details.
-----
- Determine the metadata requirements of the data in your data warehouse that will be archived.
- Determine the data archival process – can you store this data on/offline and purge the data warehouse?
- Determine the reloading strategy or integration strategy for the data to be in on/offline storage.
Implement an archival program.
Online/Offline Storage – If the legacy data needs to be accessed readily, then online storage of the data is essential. If legacy data does not need immediate access, then offline storage of the data is advised.
- Evaluate the options – How will the online storage be implemented?
- Do you have a platform for implementations?
- What is the cost of infrastructure?
- What are alternative choices?
- Plan the archival strategy.
- What is the right time for the data warehouse outage?
- How long will it take to execute the archival process?
- Where is the identified metadata stored?
- Where is the identified master data stored?
- How are the master data and metadata layers archived?
- Is there any database space reclamation needed?
- Are there index rebuilds required?
- Implement the archival process.
- Verify that data is accessible after archiving:
- Master data is easily restorable.
- Metadata is easily restorable.
- Legacy data can be integrated when needed.
- The reloading of the data has minimal impact on the data warehouse.
Traditional offline storage is becoming more expensive both to implement and restore the data. One of the emerging trends in the data management area is the consideration of using the data warehouse appliance as an alternative storage platform. Next month’s article will cover this topic.
In conclusion, understanding the value of the data in the data warehouse and managing the data life cycle is a first step in managing the health of your data warehouse.
-
Krish Krishnan
Krish is a recognized expert worldwide in the strategy, architecture and implementation of high performance data warehousing solutions. He is a visionary data warehouse thought leader and an
independent analyst, writing and speaking at industry leading conferences, user groups and trade publications. He has authored two eBooks, more than 75 articles, viewpoints and case studies
on business intelligence, data warehousing, and data warehouse appliances and architectures. In his 19 plus years of professional experience, he has been solving complex architecture problems
spanning all aspects of data warehousing and business intelligence for Fortune 1000 clients. He has designed and tuned some of the world’s largest data warehouses.
He is the president of Sixth Sense Advisors Inc., a Chicago-based company providing management, strategy, technology and analyst consulting services in data warehousing and business intelligence.
He teaches regularly at TDWI, DAMA, IRM UK and other conferences, and is helping drive and mature the data warehouse appliance market. Krish also serves as Associate Vice President of Programs for
DAMA Chicago and is Ethics and Governance Advisor to DAMA International.
Editor's note: More articles, resources, news and events are available in Krish's BeyeNETWORK Expert Channel.
Be sure to visit today!
Recent articles by Krish Krishnan
Comments
Want to post a comment? Login or become a member today!
Be the first to comment!