Blog: Andy Hayler

Andy Hayler

Welcome to my blog!

About the author

Andy Hayler is one of the world’s foremost experts on master data management. Andy started his career with Esso as a database administrator and, among other things, invented a “decompiler” for ADF, enabling a dramatic improvement in support efforts in this area. He became the youngest ever IT manager for Esso Exploration before moving to Shell. As Technology Planning Manager of Shell UK he conducted strategy studies that resulted in significant savings for the company. Andy then became Principal Technology Consultant for Shell International, engaging in significant software evaluation and procurement projects at the enterprise level. He then set up a global information management consultancy business which he grew from scratch to 300 staff. Andy was architect of a global master data and data warehouse project for Shell downstream which attained USD 140M of annual business benefits.

Andy founded Kalido, which under his leadership was the fastest growing business intelligence vendor in the world in 2001.  Andy was the only European named in Red Herring’s “Top 10 Innovators of 2002”.  Kalido was a pioneer in modern data warehousing and master data management.

He is now founder and CEO of The Information Difference, a boutique analyst and market research firm, advising corporations, venture capital firms and software companies.   He is a regular keynote speaker at international conferences on master data management, data governance and data quality. He is also a respected restaurant critic and author (www.andyhayler.com).  Andy has an award-winning blog www.andyonsoftware.com.  He can be contacted at Andy.hayler@informationdifference.com.

 

Founded in 1993, Trillium Software has been the largest independent data quality vendor for some years, nestling since the late 1990s as a subsidiary of US marketing services company Harte Hanks. The latter was once a newspaper company dating back to 1928, but switched to direct marketing in the late 1990s. It had overall revenues of $495 million in 2015. There was clearly a link between data quality and direct marketing, since name and address validation is an important feature of marketing campaigns. However, the business model of a software company is different from that of a marketing firm, so ultimately there was always going to be a certain awkwardness in Trillium living under the Harte Hanks umbrella.

On June 7th 2016 the parent company announced that it had hired an advisor to look at “strategic alternatives” for Trillium, including the possibility of selling the company, though the announcement made clear that a sale was not a certainty. Trillium has around 200 employees and a large existing customer base, so it will have a steady income stream from maintenance revenues. The data quality industry is not the fastest growing sector of enterprise software, but it is well established and quite fragmented. As well as offerings from Informatica, IBM, SAP and Oracle (all of which were based on acquisitions), there are dozens of smaller data quality vendors, many of which grew up around name and address matching, a problem well suited to at least partially automated solutions. While some vendors such as Experian have traditionally focused on this problem, others such as Trillium have developed much broader data quality offerings, with functions such as data profiling, cleansing, merge/matching, enrichment and even data governance.
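Name and address matching suits partial automation because much of it reduces to normalising records and scoring how similar they are. The following is a minimal Python sketch of that idea, not Trillium's or any other vendor's method: the abbreviation table, the choice of the standard library's `difflib.SequenceMatcher` as the similarity measure and the 0.85 threshold are all illustrative assumptions. Commercial tools add country-specific reference data, phonetic matching and human review of borderline scores.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; real products use far larger,
# country-specific reference data.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "ltd": "limited"}


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Score two records between 0.0 (nothing shared) and 1.0 (identical after normalising)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_probable_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two records as the same party when their similarity reaches the (assumed) threshold."""
    return similarity(a, b) >= threshold
```

For example, `is_probable_match("J. Smith, 10 High St.", "J Smith 10 High Street")` returns `True`, because both records normalise to the same string; the threshold is the knob that trades missed duplicates against false merges.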

There is a close relationship between data quality and the somewhat faster growing sector of master data management (MDM), so MDM vendors might in principle seem natural acquirers of data quality vendors. However, MDM itself has consolidated somewhat in recent years, and its big players, such as Informatica, Oracle and IBM, all market platforms that combine data integration, MDM and data quality (though in practice the degree of true integration is distinctly more variable than it appears on PowerPoint). Trillium might be too big to be swallowed up by the relatively small independents that remain in the MDM space. It will be interesting to see what emerges from this exercise. Certainly it makes sense for Trillium to stand on its own two feet rather than living within a marketing company, but on the other hand Harte Hanks may have missed the boat. A few years ago large vendors were clamouring to acquire MDM and related technologies, but now most companies that need a data quality offering have either built or bought one. The financial adviser in charge of the review may have to be somewhat creative in whom it considers as a possible acquirer.


Posted June 9, 2016 1:22 PM

I recently attended the Informatica World event in San Francisco, which drew over 3,000 customers and partners. One key announcement from an MDM perspective was the availability of Informatica MDM in the cloud, called MDM Cloud Edition. Previously Informatica's only cloud MDM offering was a Salesforce-specific application, obtained through its 2012 acquisition of a company called Data Scout. This is the first time that the main Informatica MDM offering can be deployed in the cloud, including on Amazon AWS. It is an important step: the move of MDM to the cloud is slow but inevitable, and start-ups such as Reltio, designed from scratch as cloud offerings, have recently been able to offer cloud MDM with little real competition. The Informatica data quality technology will apparently be fully cloud-ready by the end of 2016.

The company launched a product called Intelligent Streaming. This connects many data sources and distributes the data for you; for example, a demo showed data from several sources being streamed to a compute engine using Spark (or Hadoop, if you prefer) without needing to write code. This approach shields developers from some of the underlying complexity of the Big Data environment. Live Data Map is part of the Informatica infrastructure and is a way to visualise data sources, whether on premise or in the cloud. It also handles scheduling in a more sophisticated way than at present, using machine learning techniques.

There were plenty of external speakers, both customers and partners. Nick Millman from Accenture gave a talk about trends in data management, and referred back to his first assignment at a “global energy company” (actually Shell, where I first met him), in which the replication of an executive dashboard database involved him flying from London to The Hague with a physical tape to load up onto a server in Rijswijk. Unilever gave a particularly good talk about their recent global product information management project, in which the (business rather than IT) speaker described MDM as “character building” – hard to argue there.

There were new executives on display, in particular Jim Davis as head of marketing (ex SAS) and Lou Attanasio as the new head of sales (ex IBM).
With Informatica having recently gone private, it will be comforting for their customers that the company is investing as much as ever in its core technology, and certainly in MDM the company reckons it has more developers than Oracle, IBM and SAP combined, though such claims are hard to verify. However, judging by the detailed sessions, there certainly seems to be plenty of MDM-related R&D activity going on. Other new developments included accelerators and applications for pharmaceuticals, healthcare and insurance.

Informatica continues to have one of the leading MDM technologies at a time when some of its large competitors appear to be losing momentum in the marketplace for assorted reasons, so from a customer perspective the considerable ongoing R&D effort is reassuring. Its next major R&D task will be to blend successfully its two current MDM platforms (acquired from Siperian and Heiler), something that its large competitors have singularly failed to achieve thus far with their own acquired MDM technologies.


Posted June 6, 2016 1:34 PM

Informatica just announced V10 of its Big Data Management solution, the latest update to its flagship suite of technology. The key objective is to enable customers to design data architectures that can accommodate both traditional database sources and newer Big Data “lakes” without needing to swim too deeply in the world of MapReduce or Spark.

Particularly interesting is the Live Data Map offering, a tool that builds a metadata catalog as automatically as it can. Crucially, the catalog is updated continuously rather than as a one-off batch exercise; such one-off catalogs, the bane of previous metadata efforts, quickly get out of date. It analyses not just database system tables but also semantics and usage, so it promises to chart a path through the complexity of today’s data management landscape without the need for whiteboards and data model diagrams.

V10 extends the company’s already fairly comprehensive ability to plug into a wide range of data sources, with over 100 pre-built transformations and over 200 connectors. By providing a layer of interface above the systems management level, a customer can gain a level of insulation from the rapidly changing world of Big Data, with its bewildering menagerie of technologies, some of which disappear from current fashion almost as soon as you have figured out where they fit. Presenting a common interface across traditional and new data sources enables organisations to minimise wasted skills investment.

As well as quite new features such as Live Data Map, there is an array of incremental updates to the established technology elements of the Informatica suite, such as improved collaboration capability within the data quality suite, and the ability of the data integration hub to span both cloud and on-premise data flows. A major emphasis of the latest release is performance improvement, with much faster data import and data cleansing.

With Informatica having recently gone private, it will be comforting for their customers that the company is investing as much as ever in its core technology, as well as adding potentially very useful new elements. The data management landscape is increasingly fragmented and complex these days, so hard-pressed data architects need all the help that they can get.


Posted November 16, 2015 3:08 PM

The Informatica World 2015 event in Las Vegas was held as the company was in the process of being taken off the stock market and into private ownership by private equity firm Permira and a Canadian pension fund. The company was still in its quiet period so was unable to offer any real detail about this. However, my perception is that one key reason for the change may be that the company executives can see that there is a growing industry momentum towards cloud computing. This is a challenge to all major vendors with large installed bases, because the subscription pricing model associated with the cloud raises a considerable question as to how vendors will actually make money compared with their current on-premise business model. A quick look at the finances of publicly held cloud-only companies suggests that even these specialists have yet to really figure it out, with a sea of red ink in the accounts of most. If Informatica is to embrace this change then it is likely that its profitability will suffer, and private investors may offer a more patient perspective than Wall Street, which is notoriously focused on short-term earnings. It would seem to me that there is unlikely to be any real change of emphasis around MDM from Informatica, given that it seems to be their fastest growing business line.

On the specifics of the conference, there were announcements about the company’s major products, including its recent foray into data security. The most intriguing was the prospect of a yet-to-be-delivered product called “live data map”. The idea is to allow semantic discovery within corporate data, and to let end-users vote on how reliable particular corporate data elements are, rather as consumers vote for movies on IMDB or rate each other on eBay. This approach may be particularly useful as companies have to deal with “data lakes”, where data will have little or none of the validation that would (in theory) be applied in current corporate systems. The idea is tantalising, but this was a statement of direction rather than a product that was ready for market.
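One way such crowd votes might be turned into a reliability score is the damped "Bayesian average" popularised by movie-rating sites, which stops an item with a handful of perfect votes from outranking one with hundreds of good ones. The sketch below assumes that approach purely for illustration; Informatica had not described how live data map would weight votes, and the function name and its parameters (`prior_mean`, `min_votes`) are hypothetical.

```python
def weighted_reliability(votes, prior_mean, min_votes):
    """Damped (Bayesian-average) reliability score for one data element.

    votes:      user ratings for this element, e.g. on a 1-5 scale
    prior_mean: average rating across all data elements
    min_votes:  how many votes it takes before the element's own ratings dominate
    """
    v = len(votes)
    if v == 0:
        # No evidence yet: fall back to the catalog-wide average.
        return prior_mean
    own_mean = sum(votes) / v
    # Weighted blend of the element's own mean and the prior.
    return (v / (v + min_votes)) * own_mean + (min_votes / (v + min_votes)) * prior_mean


few_votes = weighted_reliability([5, 5], prior_mean=3.0, min_votes=10)
many_votes = weighted_reliability([4] * 100 + [5] * 100, prior_mean=3.0, min_votes=10)
```

Here `many_votes` (about 4.43) ranks above `few_votes` (about 3.33) even though the latter's raw average is a perfect 5, which is the behaviour one would want when users are rating the trustworthiness of data in a lake.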

The thing that I found most useful was the array of customer presentations, over a hundred in all. BP gave an interesting talk about data quality in the upstream oil industry, which has typically not been a big focus for data quality vendors (there is no name and address validation in the upstream). Data governance was a common theme in several presentations, clearly key to the success of both master data and data quality projects. There was a particularly impressive presentation by GE Aviation about their master data project, which had to deal with very complex aeroplane engine data.

Overall, Informatica’s going private should not have any negative impact on customers, at least unless its executives end up taking their eye off the ball due to the inevitable distractions associated with new ownership.


Posted May 16, 2015 11:25 AM

The Teradata Universe conference in Amsterdam in April 2015 was particularly popular, with a record 1,200 attendees this year. Teradata always scores unusually high in our customer satisfaction surveys, and a recurring theme is its ease of maintenance compared with other databases. At this conference the main announcement continued this theme with the expansion of its QueryGrid, allowing a common administrative platform across a range of technologies. QueryGrid can now manage all three major Hadoop distributions (MapR, Cloudera and Hortonworks) as well as Teradata's own Aster and Teradata platforms. In addition the company announced a new appliance, the high-end 2800, as well as a new feature it calls the software-defined warehouse. This allows multiple Teradata data warehouses to be managed as one logical warehouse, including security management across multiple instances.

The conference had its usual heavy line-up of customer project implementation stories, such as an interesting one by Volvo, who are doing some innovative work with software in their cars, at least at the prototype stage. For example, in one case the car uses a proximity alert to send signals to any cyclist wearing a suitably equipped helmet. In another example the car can seek out spare parking spaces in a suitably equipped car park. A Volvo now has 150 computers in it, generating a lot of data that has to be managed as well as creating new opportunities. Tesla is perhaps the most extreme example so far of cars becoming software-driven, in its case literally allowing remote software upgrades in the same way as happens with desktop computers (though hopefully car manufacturers will do a tad more testing than Microsoft in this regard). The most entertaining speech that I saw was by a Swedish academic, Hans Rosling, who advises UNICEF and the WHO and who gave a brilliant talk about the world’s population trends using extremely advanced visualisation aids, an excellent example of how to display big data in a meaningful way.


Posted April 23, 2015 11:24 AM

Yesterday Informatica announced that it was being bought, not by a software firm but by private equity firm Permira. At $5.3 billion, this values the data integration vendor at over five times the billion dollar revenue that Informatica saw in 2014, compared with a recent industry average of 4.4 times. This piece of financial engineering will not change Informatica's operational strategy. Rather, it is a reflection of a time when capital is plentiful and private equity firms are feeling bullish about the software sector. Tibco and Dell have followed a similar route. Company managers will not have to worry about quarterly earnings briefings to pesky financial analysts, and will instead be accountable only to their new owners. However, private equity firms seek a return on their investment, usually leveraging plenty of debt into such deals (debt is tax efficient compared with equity), and can be demanding of their acquisitions. From a customer viewpoint there is little to be concerned about. One exit for the investors will be a future trade sale or a return to the stock market, so this deal does not in itself change the picture for Informatica in terms of possible acquisition by a bigger software company one day.
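The valuation multiple here is simply deal value divided by annual revenue. The short sketch below reproduces the arithmetic with the post's round figures: a 5.3 billion deal value and roughly 1 billion of 2014 revenue, both assumed to be in US dollars, against the quoted industry average of 4.4. The function and constant names are illustrative only.

```python
def revenue_multiple(deal_value_bn: float, revenue_bn: float) -> float:
    """Deal value divided by annual revenue (same currency and units for both)."""
    if revenue_bn <= 0:
        raise ValueError("revenue must be positive")
    return deal_value_bn / revenue_bn


# Round figures taken from the post, both treated as US dollars (an assumption).
DEAL_VALUE_BN = 5.3
REVENUE_2014_BN = 1.0
INDUSTRY_AVERAGE_MULTIPLE = 4.4

multiple = revenue_multiple(DEAL_VALUE_BN, REVENUE_2014_BN)
premium = multiple / INDUSTRY_AVERAGE_MULTIPLE - 1  # fraction above the industry average
print(f"Multiple: {multiple:.1f}x, {premium:.0%} above the industry average")
```

This prints `Multiple: 5.3x, 20% above the industry average`. Because revenue was a little over the round billion, the true multiple is slightly lower; "over five times" holds as long as revenue stays below 5.3 / 5 = 1.06 billion.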


Posted April 8, 2015 9:55 AM

At a conference in Lausanne in June 2014 SAS shared their current business performance and strategy. The privately held company (with just two individual shareholders) had revenues of just over $3 billion, with 5% growth. Their subscription-only license model has meant that SAS has been profitable and growing for 38 years in a row. 47% of revenue comes from the Americas, 41% from Europe and 12% from Asia Pacific. They sell to a broad range of industries, but the largest in terms of revenue are banking at 25% and government at 14%. SAS is an unusually software-oriented company, with just 15% of revenue coming from services. Last year SAS was voted the second best company globally to work for (behind Google), and attrition is an unusually low 3.5%.

In terms of growth, fraud and security intelligence was the fastest growing area, followed by supply chain, business intelligence/visualisation and cloud-based software. Data management software revenue grew at just 7%, one of the lowest growth rates in the product portfolio. Cloud deployment is still relatively small compared with on-premise but is growing rapidly, and is expected to exceed $100 million in revenue this year.

SAS has a large number of products (over 250), but gave some general update information on broad product direction. Its LASR product, introduced last year, provides in-memory analytics. They do not use an in-memory database, as they do not want to be bound to SQL. One customer example given was a retailer with 2,500 stores and 100,000 SKUs that needed to decide what merchandise to stock their stores with, and how to price locally. They used to analyse this in an eight-hour window at an aggregate level, but can now do the analysis in one hour at an individual store level, allowing more targeted store planning. The source data can be from traditional sources or from Hadoop. SAS have been working with a university to improve the user interface, starting from the UI and trying to design to that, rather than producing a software product and then adding a user interface as an afterthought.

In Hadoop, there are multiple initiatives from both major and minor suppliers to apply assorted versions of SQL to Hadoop. This is driven by the large number of people in the market with SQL skills compared with the relatively tiny number who can program fluently using MapReduce. Workload management remains a major challenge in the Hadoop environment, so a lot of activity has been going on to integrate the SAS environment with Hadoop. Connection is possible via HiveQL. Moreover, SAS processing is being pushed down into Hadoop with MapReduce rather than extracting the data. A SAS engine is placed on each cluster to achieve this. This includes data quality routines such as address validation, which can be applied directly to Hadoop data with no need to export it. A demo was shown using the SAS Studio product to take some JSON files, do some cleansing, and then use Visual Analytics and In-Memory Statistics to analyse a block of 60,000 Yelp recommendations, blending this with another recommendation data set.


Posted June 6, 2014 11:53 PM

At Informatica World in Las Vegas recently the company made a number of announcements. The label “Intelligent Data Platform” was used to encompass the existing technology plus some new elements around metadata. The key new element was software that helps end-users to provision data directly, in an attempt to unblock the traditional bottleneck between IT and business users. The software allows a business user to search for available data using business terms, and then presents the best matches, including the source systems. The system can capture the actions the user takes in selecting data in the form of workflow steps, which can later be automated by IT if appropriate. This certainly demonstrated well, and seems to have had some happy early adopters.

Separately, Informatica announced a data security product, Secure@Source, that will be an early application of the Intelligent Data Platform. A demo included an attractive-looking “heat map” showing data sensitivity, proliferation and levels of risk, based on prior data integration (DI) mappings between sources and targets defined in PowerCenter. The obvious issue here is whether Informatica’s sales force understands the specialised security market, and whether customers will see it as a natural brand in an area with which many do not currently associate it, although to be fair it already has offerings in persistent and dynamic data masking (PDM, DDM) and test data management.

There was plenty of discussion around Big Data at the conference, and partners such as Cloudera spoke on the subject as well as staff from the company. Certainly the scale of data now can be vast, with Facebook apparently having 500 petabytes to manage. The company has several initiatives in this area linking to Hadoop. Certainly all that data will have to be managed somehow, so companies with core strengths in integration and data quality ought in principle to be able to carve out a place in the Big Data world, which at present still seems very formative and immature in general. The Vibe Data Stream product is clearly aimed at this new type of data, such as that generated by sensors.

Financially, Informatica seems back on track after the issues of 2012, and seems to have had a good quarter. One intriguing thing is just how significant MDM now is to Informatica: a whole day at the conference was devoted to MDM, and although the company does not break out software sales by product line, it was clear that MDM is both its fastest growing segment and now a significant chunk of its new license revenues. The $130 million acquisition of Siperian may in retrospect seem to be money well spent. Version 10 is the next major release, due out in late 2014.

The company is clearly investing heavily at present, with 17% of its spend going into R&D at the moment. As the company seeks to maintain growth against a backdrop where the core integration market is maturing, the main challenge for the company would seem to me to be whether its sales staff, used to selling integration software to IT folks, can adapt to selling the new products effectively, some of which are aimed at business people.


Posted May 22, 2014 9:47 AM

At a recent analyst briefing Pitney Bowes executives shared a number of aspects of the current business. The company is an intriguing one, with deep heritage in the US postal business. The company dates back to 1920, and was the inventor of the franking machine. With 16,000 employees it is a major corporation. In more recent times Pitney Bowes has built up quite a large software business, largely through acquisition of both data quality software (Group 1) and GIS software (MapInfo). They do not break out their revenues by business line publicly, but their software business is a lot larger than most people realize. After a period of share underperformance from 2010 to 2012, the stock price has more than doubled since 2012 under its new CEO (Marc Lautenbach joined Pitney Bowes in January 2013).

On the software side, Pitney Bowes has had a good reputation for address validation, bolstered by its ability to enrich location data through its full-fledged GIS capabilities, something that other data quality vendors do not have. However, this potential synergy has not always borne fruit, with a generic sales force not always able to put these value propositions together. The company has a reputation as something of a sleeping giant. A recent change of sales leadership has resulted in changed commission structures and more specialist sales staff being recruited, with some ambitious software sales targets being set for 2014.

A sign that some of this positioning may actually be bearing fruit has been seen in two recent deals in particular, both in the area of geocoding and reverse geocoding, i.e. working out exact latitude and longitude from an address, or vice versa. This capability is important in the world of social media as people increasingly interact via mobile devices rather than desktops. Recently both Facebook and Twitter have signed major license deals to use the geocoding capabilities of Pitney Bowes. For example, when a tweet has a location tag, this uses the Pitney Bowes software. Such high-profile endorsements, along with other recent deals such as one with INRIX (a crowd-sourced traffic data company), are potentially very significant. Being able to sign up Facebook and Twitter adds a great deal of credibility to any company claiming location capabilities.
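To make the two directions concrete: forward geocoding maps an address to coordinates, while reverse geocoding finds the nearest known address to a coordinate pair. The Python sketch below illustrates both over a tiny in-memory gazetteer, using the haversine great-circle distance for the nearest-neighbour search. It is not Pitney Bowes' implementation: the three London entries carry approximate coordinates chosen for illustration, and a production geocoder would interpolate positions along street segments and use a spatial index rather than a linear scan.

```python
import math

# Illustrative gazetteer: normalised address -> approximate (latitude, longitude).
GAZETTEER = {
    "10 downing street, london": (51.5034, -0.1276),
    "221b baker street, london": (51.5238, -0.1586),
    "buckingham palace, london": (51.5014, -0.1419),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geocode(address):
    """Forward geocoding: address -> (lat, lon), or None if the address is unknown."""
    return GAZETTEER.get(address.strip().lower())


def reverse_geocode(lat, lon):
    """Reverse geocoding: return the gazetteer address closest to (lat, lon)."""
    return min(GAZETTEER, key=lambda addr: haversine_km((lat, lon), GAZETTEER[addr]))
```

A location tag of roughly `(51.5035, -0.1275)` would resolve to `"10 downing street, london"`, about a kilometre closer than the next candidate.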

One interesting case study at the conference was that of a UK local authority. A recent construction boom in the south east of the UK, combined with a shortage of affordable accommodation, has resulted in a broad influx of migrant workers from abroad, frequently illegal. These building workers are often accommodated in squalid “beds in sheds” at the bottom of residential gardens. These extra residents are not included in official figures and pay no property tax, yet create a burden on local authority services. This particular local authority used reconnaissance planes with infrared cameras to compare human activity at night in the borough with the notional housing. They used land registry data to compare official properties with the number of people actually living in residential areas, often finding considerable numbers of people living in places where no houses officially exist, and who were therefore avoiding local taxes. They were able to use this data, combined via the Pitney Bowes GIS software, to direct police to follow up on concentrations of illegal accommodation, with apparently significant results.

In software terms, the latest GIS release supports 64 bit processing, allowing real time drill down: a demo was shown of zooming from a picture of the earth from orbit directly down to Mount Fuji (as an example) with no redraw delay; this brings the GIS software in this aspect into line with its main competitor ESRI.

Within data quality, Pitney Bowes has street-level geocoding support for 122 countries, an unusually deep level of coverage. The company claims that all of the top 25 US property insurers use its software. Recently they have added an MDM solution based on a graph database, a logical extension given their considerable customer base in data quality. The unusual database underpinning should in principle allow some interesting analytics, e.g. around connections to a customer such as which products they use, but also the sphere of influence of a customer, a type of analysis that a graph database should be able to accommodate more easily than a relational one. However, progress in the market here has been rather pedestrian thus far, and a challenge for 2014 will be to see whether they can develop early MDM sales into a solid cadre of reference customers. Certainly it is always challenging for a new piece of software, even an innovative one, to be marketed against relatively mature and entrenched solutions from large vendors. The coming year will show whether the sales force changes and greater marketing investment around this MDM solution will result in significantly enhanced market share. In our most recent Landscape research, Pitney Bowes customers were amongst the top 7 happiest data quality customers.
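The "sphere of influence" idea maps naturally onto a breadth-first traversal: start at one customer and collect everything reachable within a few hops. In a graph database that is a single traversal, whereas a relational schema typically needs repeated self-joins or a recursive query. The sketch below uses a plain Python adjacency list with hypothetical customers and products to show the traversal itself; it is not the Pitney Bowes MDM data model or query language.

```python
from collections import deque

# Hypothetical customer graph: each node lists the customers or products it is linked to.
GRAPH = {
    "alice": ["bob", "product:mortgage"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "product:mortgage": ["alice"],
}


def sphere_of_influence(graph, start, max_hops):
    """Breadth-first search returning {node: hop distance} for nodes within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distances:  # first visit is the shortest path in BFS
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    del distances[start]  # the starting customer is not part of its own sphere
    return distances
```

With `max_hops=2`, `sphere_of_influence(GRAPH, "alice", 2)` returns bob and the mortgage product at one hop and carol at two. Filtering keys that start with `product:` gives the "which products they use" view, and raising `max_hops` widens the sphere.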


Posted May 8, 2014 3:41 PM

I recently attended a Teradata conference in Prague. In our regular Landscape research The Information Difference consistently finds that Teradata has some of the happiest customers of any data warehouse vendor. For the last four years in a row their customers have been in the top two spots in our survey for overall highest satisfaction. Moreover, this is based on a large sample of customers. This hard survey data is backed up by anecdotal discussion at their events.

At the recent conference Teradata made three significant announcements. At present their architecture encompasses three technical platforms: the traditional relational database, the analytical database they acquired via Aster, and Hadoop, where they have partnered with Hortonworks. Their approach is to layer their software around these platforms, allowing customers to deploy on whichever combination is most appropriate. The Teradata QueryGrid allows a single SQL query to be orchestrated across systems without moving the data. Certainly as a concept this will be appealing to many customers.
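The principle behind querying across systems without moving the data is pushdown: each platform filters and partially aggregates its own rows, and only the small intermediate results travel to be merged. The following Python sketch uses two in-memory SQLite databases from the standard library as stand-ins for, say, the relational warehouse and a Hadoop cluster. It illustrates the idea only and does not reflect QueryGrid's actual syntax or APIs; the `sales` table and both function names are hypothetical.

```python
import sqlite3


def make_system(rows):
    """Create an in-memory SQLite database standing in for one platform's local data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def federated_total_by_region(systems, min_amount):
    """Push the filter and partial SUM to each system, then merge the per-region subtotals."""
    totals = {}
    for conn in systems:
        partial = conn.execute(
            "SELECT region, SUM(amount) FROM sales WHERE amount >= ? GROUP BY region",
            (min_amount,),
        )
        for region, subtotal in partial:  # only one row per region leaves each system
            totals[region] = totals.get(region, 0.0) + subtotal
    return totals


warehouse = make_system([("EMEA", 100.0), ("APAC", 50.0), ("EMEA", 5.0)])
hadoop = make_system([("EMEA", 20.0), ("AMER", 70.0)])
```

Calling `federated_total_by_region([warehouse, hadoop], 10.0)` yields totals of 120.0 for EMEA, 50.0 for APAC and 70.0 for AMER. Merging partial results this way works because SUM is decomposable; an average, for example, would need each system to return both a sum and a count.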

It also announced the Active Enterprise Data Warehouse 6750 platform, aimed at the highest-end use cases, which it claims can handle up to 61 petabytes of data. Certainly Teradata has dozens of customers in its “petabyte club”, so its ongoing investment here will be welcome to those with ultra-high volumes of data. The core database itself received an upgrade in the form of Teradata Database 15, which allows users to run analytical queries across multiple systems as well as run non-SQL languages within the database, and supports JSON data (a lower-overhead alternative to XML). This last is aimed at the increasingly important area of sensor data and embedded processor data.

Overall, Teradata continues to be a major player at the high end of the data warehouse market. It has actively embraced newer technologies, e.g. the multi-processing columnar approach of Aster and, more recently, Hadoop, going well beyond paying lip service to the newer analytic approaches. Customers with especially demanding workloads should certainly consider its capabilities.


Posted April 27, 2014 12:57 AM