Blog: Rick Barton

Rick Barton

Hello and welcome to my blog. I am delighted to blog for the BeyeNETWORK, and I'm really looking forward to sharing some of my thoughts with you. My main focus will be data integration, although I don't plan to restrict myself just to this topic. Data integration is a very exciting space at the moment, and there is a lot to blog about. But there is also a big, wide IT and business world out there that I'd like to delve into when it seems relevant. I hope this blog will stimulate, occasionally educate and, who knows, possibly even entertain. In turn, I wish to expand my own knowledge and hope to achieve this through feedback from you, the community, so if you can spare the time please do get involved. Rick

About the author

Rick is the director (and founder) at edexe. He has more than 20 years of IT delivery experience, ranging from complex technical development through to strategic DI consultancy for FTSE 100 companies. With more than 10 years of DI experience from hands-on development to architecture and consulting and everything in between, Rick’s particular passion is how to successfully adopt and leverage DI tools within an enterprise setting. Rick can be contacted directly at rick.barton@edexe.com.

Following on from my last blog about what's next for DI, I have been wondering what effect current technologies could have on the data model in the near future.

There are a few technology areas in particular that I see eroding the need for a data model, namely:

Massively Parallel Processing (MPP) databases: these databases have redefined the speed at which queries can be executed, and the net effect of this change is that the database is more "forgiving" of less than perfectly modelled data.

Data virtualisation: virtualisation enables queries that access many input data sources without the need for the data to be instantiated, thereby removing the need for a formal model. Instead, tailored views are created for each particular end-user requirement. In addition, there is a class of tools that, while not quite virtualisation tools, do enable rapid access to flat file data via SQL. These are more specialised, but worthy of note.
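
To make the flat-file idea a little more concrete, here is a minimal sketch, assuming two invented CSV files (the file names, columns and the use of Python with SQLite are purely my own illustration, not any particular vendor's tooling), of querying flat data with SQL without first building a modelled warehouse:

```python
import sqlite3

import pandas as pd

# Hypothetical flat files -- the names and columns are invented for illustration
customers = pd.read_csv("customers.csv")   # e.g. customer_id, name, region
orders = pd.read_csv("orders.csv")         # e.g. order_id, customer_id, amount

# Stage both files into an in-memory database so they can be queried together
conn = sqlite3.connect(":memory:")
customers.to_sql("customers", conn, index=False)
orders.to_sql("orders", conn, index=False)

# An ad hoc "view" tailored to one reporting requirement -- no formal model built
query = """
    SELECT c.region, SUM(o.amount) AS total_sales
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
"""
print(pd.read_sql_query(query, conn))
```

The point here is simply that the "model" is nothing more than an ad hoc query written for one reporting requirement.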

Dynamic warehousing: these products store the data independently of the reporting model, such that changes to the model do not require changes to the underlying tables. A good example of this is Kalido's Dynamic Information Warehouse (DIW) technology. In addition, Kalido also drives the product via a semantic business model, rather than a traditional data model.
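
Kalido's implementation is of course far richer than anything I could sketch here, but to give a flavour of what "storing the data independently of the reporting model" can mean, here is a minimal, generic attribute-value style sketch (my own illustration of the general pattern, not Kalido's design; the table and attribute names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A generic "long" layout: every fact is a (record, attribute, value) row, so
# adding a brand-new attribute to the business model needs no ALTER TABLE.
conn.execute("""
    CREATE TABLE facts (
        record_id INTEGER,
        attribute TEXT,
        value     TEXT
    )
""")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [(1, "product", "widget"), (1, "region", "EMEA"), (1, "revenue", "120")],
)

# The reporting "model" is just a query (or view) pivoted out of the generic
# store; changing the model means changing the query, not the storage.
rows = conn.execute("""
    SELECT record_id,
           MAX(CASE WHEN attribute = 'product' THEN value END) AS product,
           MAX(CASE WHEN attribute = 'revenue' THEN value END) AS revenue
    FROM facts
    GROUP BY record_id
""").fetchall()
print(rows)   # [(1, 'widget', '120')]
```

Adding a new attribute to the business model here means inserting rows, not altering tables; the reporting shape lives in the query, not in the storage.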

Profiling: many data profilers can infer relationships between fields in the same or even different files, thereby enabling keys to be identified across the data. Composite Software has an interesting product (Discovery) that not only enables the keys to be identified but also "fixes" those relationships so that they can then be used within queries.
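
As a back-of-an-envelope illustration of the kind of inference involved (my own naive sketch, not how Discovery actually works; the threshold and the example tables are invented), candidate key relationships can be flagged simply by measuring how much of one column's values also appear in another:

```python
import pandas as pd

def candidate_keys(child: pd.DataFrame, parent: pd.DataFrame, threshold: float = 0.95):
    """Naive profiling: flag column pairs where nearly every value in the child
    column also appears in the parent column -- a hint that the pair behaves
    like a foreign key / primary key relationship."""
    hits = []
    for c in child.columns:
        child_vals = set(child[c].dropna())
        if not child_vals:
            continue
        for p in parent.columns:
            parent_vals = set(parent[p].dropna())
            overlap = len(child_vals & parent_vals) / len(child_vals)
            if overlap >= threshold:
                hits.append((c, p, overlap))
    return hits

# Hypothetical tables -- purely for illustration
orders = pd.DataFrame({"order_id": [1, 2, 3], "cust": [10, 11, 10]})
customers = pd.DataFrame({"customer_id": [10, 11, 12], "name": ["A", "B", "C"]})
print(candidate_keys(orders, customers))   # suggests orders.cust -> customers.customer_id
```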

One thought I have had, and one which is very interesting, is a match-up between profiling and MPP databases; an intuitive warehouse, if you will. In this approach a new data source is simply added to the database as a new table. This table is then profiled against the existing tables and the key relationships are discovered. Once the relationships are confirmed, the table is usable for querying. The MPP database is needed because this "unstructured model" would be less efficient than a traditional "designed" model, so the MPP engine's horsepower would be required to return result sets in a timely fashion.
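
To sketch how such an intuitive warehouse might behave end to end (again purely hypothetical; the table registry, the 95% overlap threshold and the manual confirmation step are all my own assumptions), the workflow could look something like this:

```python
from typing import Dict, List, Tuple

import pandas as pd

catalog: Dict[str, pd.DataFrame] = {}                 # tables already loaded, as-is
relationships: List[Tuple[str, str, str, str]] = []   # (new_table, new_col, old_table, old_col)

def overlap(a: pd.Series, b: pd.Series) -> float:
    """Share of values in column a that also appear in column b."""
    a_vals, b_vals = set(a.dropna()), set(b.dropna())
    return len(a_vals & b_vals) / len(a_vals) if a_vals else 0.0

def add_source(name: str, table: pd.DataFrame, confirm=input) -> None:
    """Step 1: add the new source as a table, unmodelled.
    Step 2: profile it against every existing table to find candidate keys.
    Step 3: keep only the relationships a human confirms; the table is then
    queryable via those joins, relying on MPP horsepower rather than a
    hand-designed schema for performance."""
    for old_name, old in catalog.items():
        for new_col in table.columns:
            for old_col in old.columns:
                score = overlap(table[new_col], old[old_col])
                if score >= 0.95:
                    prompt = f"{name}.{new_col} -> {old_name}.{old_col} ({score:.0%} overlap)? [y/n] "
                    if confirm(prompt).lower().startswith("y"):
                        relationships.append((name, new_col, old_name, old_col))
    catalog[name] = table   # the table is now queryable via the confirmed joins
```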

So as you can see, current technologies are allowing a relaxation of the rules around data modelling and are challenging the accepted warehouse solution stack by enabling rapid query design without the use of a model.

I don't think the data model is going anywhere just yet, but I do think its place at the very heart of the warehouse will be challenged in the coming years.


Posted July 27, 2009 8:22 PM