Big Data – Same Problems?

Originally published 13 July 2011

A recent (June 2011) IDC Digital Universe study found that the world's data is doubling every two years – faster than Moore's Law. It reckoned that 1.8 zettabytes (1.8 trillion gigabytes) of data will be created and replicated in 2011, and that over the next decade enterprises will manage 50 times more data while the number of files grows 75 times. (Do you have any idea how much data 1.8 zettabytes really is? It's about the same amount of data as if every person in the world sent twenty tweets an hour for the next 1,200 years!)

The “big data” phenomenon is driving transformational technological, scientific and economic change, while “information taming” technologies drive down the cost of creating, capturing, managing and storing information.

We've all seen organisations' insatiable desire for more data, because they believe this information will radically change their businesses. They are right, but data by itself is useless. Only the effective exploitation of this vast mountain of data – using business intelligence to convert it into useful information, knowledge and applied decision making – will allow it to reach its true potential.

The problem is that big data analytics push the limits of traditional data management. The most complex big data problems start with huge volumes of data held in disparate stores. But big data problems aren't just about volume: there is also the volatility of the data sources and their rate of change, the variety of the data formats, and the complexity of the individual data types themselves. So is pulling all of this data into yet another location always the most appropriate route to analysing it?

Unfortunately, many organisations are constrained by traditional data integration approaches that can slow the adoption of big data analytics. The winners will be the approaches that provide high-performance data integration to overcome data complexity and data silos, and they need to integrate the major types of “big data” into the enterprise. Typical “big data” sources include the following (a brief sketch of reading from one such source appears after the list):

  • Key/value data stores such as Cassandra

  • Columnar/tabular NoSQL data stores such as HBase (part of the Hadoop ecosystem) and Hypertable

  • Massively parallel processing appliances such as Greenplum and Netezza  

  • Document-oriented data stores such as CouchDB (JSON) and MarkLogic (XML)
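
As a flavour of what such integration involves, here is a minimal sketch of reading from a Cassandra store using the open-source DataStax Python driver; the contact point, keyspace, table and column names are hypothetical placeholders for this example:

    # Minimal sketch: reading rows from a Cassandra key/value store with the
    # open-source DataStax Python driver (pip install cassandra-driver).
    # The contact point, keyspace, table and column names are hypothetical.
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1"])     # contact point for the cluster
    session = cluster.connect("sales")  # hypothetical keyspace

    # Push as much filtering as possible down to the store itself; here we
    # pull back only a small slice of the hypothetical orders table.
    rows = session.execute("SELECT customer_id, order_total FROM orders LIMIT 100")
    for row in rows:
        print(row.customer_id, row.order_total)

    cluster.shutdown()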

Fortunately, approaches such as data federation and data virtualisation are stepping up to meet this challenge.
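
To illustrate the federated style, the sketch below joins data from two independent sources in memory at query time, rather than first copying everything into yet another repository. Two SQLite databases stand in for disparate stores, and all table and column names are invented for the example:

    # Illustrative data federation sketch: join results from two independent
    # stores at query time instead of consolidating them into another silo.
    # Two in-memory SQLite databases stand in for disparate data sources.
    import sqlite3

    crm = sqlite3.connect(":memory:")      # stand-in for a CRM system
    billing = sqlite3.connect(":memory:")  # stand-in for a billing system

    crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)",
                    [(1, "Acme Ltd"), (2, "Globex plc")])

    billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
    billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                        [(1, 1200.0), (1, 300.0), (2, 950.0)])

    # The "virtual" layer: fetch from each source and join in memory,
    # so neither source's data has to be copied into a new repository.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for cust_id, amount in billing.execute("SELECT customer_id, amount FROM invoices"):
        totals[cust_id] = totals.get(cust_id, 0.0) + amount

    for cust_id, total in totals.items():
        print(f"{names[cust_id]}: {total:.2f}")

A real data virtualisation layer adds query planning, push-down optimisation and caching on top of this basic pattern, but the principle – analyse the data where it lives – is the same.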

Finally, and of utmost importance, is managing the quality of the data. What's the use of this vast resource if its quality and trustworthiness are questionable? Thus, driving your data quality capability up the maturity levels, as shown in Figure 1, is key.

Figure 1: Data Quality Maturity – 5 levels of maturity (© IPL)
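
Even the lowest rungs of such a maturity ladder rest on measurable checks. As a simple illustration, the sketch below computes three basic data quality metrics – completeness, validity and uniqueness – over a hypothetical record set; the fields and rules are invented for the example:

    # Illustrative data quality checks: completeness, validity and uniqueness.
    # The records, field names and rules are hypothetical examples.
    import re

    records = [
        {"id": 1, "email": "jo@example.com", "country": "GB"},
        {"id": 2, "email": "", "country": "FR"},             # missing email
        {"id": 2, "email": "sam@example", "country": "DE"},  # duplicate id, bad email
    ]

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    total = len(records)
    complete = sum(1 for r in records if r["email"])               # non-empty values
    valid = sum(1 for r in records if EMAIL_RE.match(r["email"]))  # pass format rule
    unique = len({r["id"] for r in records})                       # distinct keys

    print(f"completeness: {complete / total:.0%}")
    print(f"validity:     {valid / total:.0%}")
    print(f"uniqueness:   {unique / total:.0%}")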

Chris Bradley

Christopher Bradley has spent almost 30 years in the data management field, working for several blue-chip organisations on data management strategy, master data management, metadata management, data warehouse and business intelligence implementations. His career includes Volvo as lead database architect, Thorn EMI as Head of Data Management, Reader's Digest Inc as European CIO, and Coopers & Lybrand's Management Consultancy, where he established and ran the international data management specialist practice. During this time, he worked on and led many major international assignments, including data management strategies, data warehouse implementations, the establishment of data governance structures, and the largest data management strategy undertaken in Europe.

Currently, Chris heads the Business Consultancy practice at IPL, a UK-based consultancy, and has been working for several years with many clients, including a British-headquartered supermajor energy company. Within their Enterprise Architecture group, he has established data modelling as a service and has been developing a group-wide data management strategy to ensure that common business practices and the use of master data and models are promoted throughout the group. This has involved establishing a data management framework, evangelising the message to management worldwide, developing governance and new business processes for data management, and developing and delivering training. He is also advising other commercial and public sector clients on information asset management.

Chris is a member of the Meta Data Professionals Organisation (MPO) and DAMA, and has achieved Certified Data Management Professional Master status (CDMP Master). He recently co-authored the book Data Modelling For The Business – A Handbook for Aligning the Business with IT Using High-Level Data Models. You can reach him at Chris.Bradley@ipl.com.
