Blog: Andy Hayler

Welcome to my blog!

About the author

Andy Hayler is one of the world’s foremost experts on master data management. Andy started his career with Esso as a database administrator and, among other things, invented a “decompiler” for ADF, enabling a dramatic improvement in support efforts in this area. He became the youngest ever IT manager for Esso Exploration before moving to Shell. As Technology Planning Manager of Shell UK he conducted strategy studies that resulted in significant savings for the company. Andy then became Principal Technology Consultant for Shell International, engaging in significant software evaluation and procurement projects at the enterprise level. He then set up a global information management consultancy business, which he grew from scratch to 300 staff. Andy was architect of a global master data and data warehouse project for Shell downstream which attained USD 140M of annual business benefits.

Andy founded Kalido, which under his leadership was the fastest growing business intelligence vendor in the world in 2001.  Andy was the only European named in Red Herring’s “Top 10 Innovators of 2002”.  Kalido was a pioneer in modern data warehousing and master data management.

He is now founder and CEO of The Information Difference, a boutique analyst and market research firm, advising corporations, venture capital firms and software companies.   He is a regular keynote speaker at international conferences on master data management, data governance and data quality. He is also a respected restaurant critic and author (www.andyhayler.com).  Andy has an award-winning blog www.andyonsoftware.com.  He can be contacted at Andy.hayler@informationdifference.com.

 

November 2010 Archives

I attended some interesting customer sessions at the Netezza user group in London yesterday, following some other good customer case studies at the Teradata conference in the rather sunnier climes of San Diego. One common thread that came out of several sessions was the way that the use of appliances changes how companies treat ETL processing. Traditionally a lot of work has gone into taking the data from the various source systems for the warehouse, defining rules as to how this data is to be converted into a common format, then using an ETL tool (such as Informatica or Ab Initio) to carry out this pre-processing before presenting a neatly formatted, consistent file to be loaded into the warehouse.
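
To make the traditional pattern a little more concrete, here is a rough sketch (in Python, rather than in a graphical ETL tool such as Informatica or Ab Initio) of the kind of pre-processing step involved: source extracts are converted into one common format before the warehouse load. The file layouts, field names and conversion rules below are purely illustrative.

# Sketch of ETL-style pre-processing: convert two hypothetical source
# extracts into one common format before the warehouse load.
# File names, column names and conversion rules are illustrative only.
import csv

def transform_source_a(row):
    # Source A is already close to the common format.
    return {
        "order_id": row["OrderNo"].strip(),
        "order_date": row["OrdDate"],            # already YYYY-MM-DD
        "amount_usd": float(row["Amount"]),      # source A reports in USD
        "source_system": "A",
    }

def transform_source_b(row):
    # Source B needs key padding, date reformatting and a currency conversion.
    return {
        "order_id": row["id"].zfill(10),
        "order_date": "-".join(reversed(row["date_ddmmyyyy"].split("/"))),
        "amount_usd": float(row["amount_gbp"]) * 1.6,   # illustrative fixed rate
        "source_system": "B",
    }

COMMON_FIELDS = ["order_id", "order_date", "amount_usd", "source_system"]

with open("orders_common.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=COMMON_FIELDS)
    writer.writeheader()
    for filename, transform in [("source_a.csv", transform_source_a),
                                ("source_b.csv", transform_source_b)]:
        with open(filename, newline="") as src:
            for row in csv.DictReader(src):
                writer.writerow(transform(row))

With large data volumes, every row passes through this script (or its ETL-tool equivalent) on a conventional server before the appliance ever sees it, which is where the bottleneck described below tends to appear.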

When you have many terabytes of data then this pre-processing can itself become a bottleneck. Several of the customers I listened to at these conferences had found it more efficient to move from ETL to ELT. In other words, they load essentially raw source data (possibly with some data quality checking only) into a staging area in the warehouse appliance, and then write SQL to carry out the transformations within the appliance before loading the data into production warehouse tables. This allows them to take advantage of the power of the MPP boxes they have purchased for the warehouse, which are typically more efficient and powerful than the regular servers that their ETL tools run on. This does not usually eliminate the need for the ETL tool (though one customer did explain how they had switched off some ETL licences), but it means that much more processing is carried out in the data warehouse itself.
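
As a rough illustration of the ELT pattern, the sketch below assumes the raw extract has already been bulk-loaded into a staging table on the appliance (typically via the vendor's bulk loader, which I have left out) and then pushes the transformation down as a single set-based SQL statement. The connection details, table names and SQL dialect specifics are hypothetical, and pyodbc is just one generic way of issuing the SQL; the point is simply that the heavy lifting runs on the MPP hardware rather than on the ETL server.

# Sketch of ELT: the transformation is one set-based SQL statement executed
# inside the appliance, against data already loaded into a staging table.
# DSN, schema, table and column names are hypothetical.
import pyodbc

TRANSFORM_SQL = """
INSERT INTO warehouse.fact_orders (order_id, order_date, amount_usd, source_system)
SELECT
    TRIM(s.order_no),
    CAST(s.order_date AS DATE),
    CAST(s.amount AS DECIMAL(18,2)) * c.rate_to_usd,
    s.source_system
FROM staging.orders_raw s
JOIN staging.currency_rates c
  ON c.currency_code = s.currency_code
WHERE s.load_batch_id = ?
"""

def run_elt(batch_id):
    # The heavy lifting happens on the appliance, not on the ETL server.
    conn = pyodbc.connect("DSN=warehouse_appliance")   # hypothetical ODBC DSN
    try:
        cur = conn.cursor()
        cur.execute(TRANSFORM_SQL, batch_id)
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    run_elt(20101119)

In practice the orchestration could equally well sit in whatever ETL or scheduling tool is already in place; the change is simply where the transformation work is executed.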

Back in my Kalido days we found it useful to take this ELT approach too, but for different reasons. It was cleaner to do the transformations based on business rules stored in the Kalido business model, rather than having the transformations buried away in ETL scripts, which meant more transparent rules and so lower support effort. However I had not appreciated how well the sheer horsepower available in data warehouse appliances suits ELT for pure performance reasons. Have others had the same experience on their projects? If so, then post a comment here.


Posted November 19, 2010 2:53 PM