This BeyeNETWORK article features an interview with Chris Twogood, vice president of product and services marketing at Teradata, conducted by Phil Bowermaster, an independent analyst and consultant specializing in big data and data warehousing.

Teradata is doing so much these days that I thought it would be good to step back and talk about some of the industry trends driving these big changes. Maybe we could start with this whole idea of an evolution toward an analytical ecosystem. Can you tell us what that means?

Chris Twogood:
It's really interesting. The market at large has come to the conclusion that no single system is going to do everything. You really need a cohesive analytic ecosystem that ties together different file systems and different analytic engines and brings Hadoop technology together with discovery platform technology and data warehousing technology. The challenge, though, is that it introduces complexity. So how do you minimize the cost and the complexity in that kind of environment? Teradata has made some interesting announcements, specifically one that helps minimize that complexity. With QueryGrid we integrate Teradata to Teradata systems, Teradata to Aster, and Teradata to Hadoop through our extended partnership with Cloudera on top of our partnership with Hortonworks. We also announced QueryGrid Teradata-to-Oracle connectivity. So you can have this cohesive environment, and the business user doesn't have to know where the data is sitting. It's transparent: business users work through the architecture, and Teradata does all of the heavy lifting. The market is moving toward an analytical ecosystem, and Teradata is making it easier to do that.

The reality is that there is not one product or one technology that is going to solve all the problems, so you have to be, to some extent, in the business of making all of those different parts work together.

Chris Twogood:
Absolutely, and we call that orchestration. It's not about federation, because you want to be able to leverage the processing power of each of those systems, so we push processing down to open source Hadoop and then deliver the result sets back to Teradata. We'll orchestrate that kind of analytic framework.

Let's talk about open source. A lot of the solutions being incorporated into an overall ecosystem are open source. What role does open source play now and into the future, and how is Teradata addressing that?

Chris Twogood:
If you look at all the investments in big data startups and all the different names out there, there is always a latest solution that claims it will solve all of your specific problems in the marketplace. But I think part of the challenge is how we as an industry create clarity, rather than confusion, around how these technologies work together. Thus, there are a number of different things that Teradata is doing. One of them is our partnership with Cloudera. Traditionally you've heard Cloudera say, for example, that Hadoop is going to replace the data warehouse. And I think a market maturity has come around that says that doesn't make sense. They each have a play in the ecosystem.
Teradata also just announced a new service called the Data Integration Optimization Service that helps companies understand the best place to do their data integration. Is it better to do it on Hadoop? Is it better to optimize it within your warehouse? Is it better to do it on another server?
The other key thing is our acquisition of Think Big Analytics. Think Big is exclusively focused on leveraging open source, whether that's Cassandra, Hadoop, Storm, Spark or MongoDB, and integrating it into a broader ecosystem. As you build out an analytic ecosystem, you need to look at how to integrate these different technologies to drive real value. Teradata is doing a number of things, like the acquisition of Think Big Analytics, our Data Integration Optimization Service and our Cloudera partnership, to help provide that clarity.
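The orchestration idea Chris describes, pushing processing down to each engine and moving only result sets rather than raw tables, can be sketched in a few lines. This is a hypothetical illustration, not QueryGrid's actual API; all function names here are invented.

```python
# Hypothetical sketch contrasting federation (pull raw rows, process centrally)
# with orchestration (push the filter/aggregate down to the remote engine and
# move only the small result set back). Names are illustrative, not Teradata APIs.

def remote_rows():
    # stand-in for a large table living on a remote Hadoop cluster
    return [{"region": r, "sales": s}
            for r, s in [("east", 10), ("west", 5), ("east", 7), ("west", 3)]]

def federation_total(region):
    # federation: ship every row across the network, then filter locally
    rows = remote_rows()                 # the full table moves
    return sum(r["sales"] for r in rows if r["region"] == region)

def orchestration_total(region):
    # orchestration: the remote engine filters and aggregates in place,
    # so only one small result row moves back
    return sum(r["sales"] for r in remote_rows() if r["region"] == region)

assert federation_total("east") == orchestration_total("east") == 17
```

Both approaches return the same answer; the difference is how much data crosses the network, which is why pushdown matters at scale.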
One of the things this is doing for Teradata is putting you in a position where you really have to be a trusted advisor to the customer. You have to look at each customer's particular situation and put together the right mix of technologies to solve those problems.

Chris Twogood:
If you look at our heritage as Teradata, we have always been about helping companies gain value from data. While we have great technology that empowers that, over half of what we do as a company is consulting services. It's about helping people understand how to get value, and that extends to all these emerging technologies. I think Teradata is a trusted advisor for people to understand how to get value and how to integrate the technologies to help drive that value.
Speaking of emerging technologies, one of the big buzzwords we hear lately is the "data lake" architecture. What's Teradata's take on that, and how does what you're doing factor that in?

Chris Twogood:
We really view data lakes as an emerging architectural pattern. Not everybody is deploying them, but the promise of a data lake is actually quite interesting: bring in all of your data, process it, have it sitting there in its original fidelity, and then feed downstream analytic foundations for serving it up to business users. Now the problem with a data lake is that the more data you put in it, the more it disappears. That's because Hadoop as an architecture so far hasn't done a great job of managing metadata. In fact, I heard someone say the other day that if you take a glass of water, walk over to the lake and pour it in, the minute you pour it in, the water is gone. You cannot recreate that glass of water again. We recently announced Teradata Loom as a result of our acquisition of Revelytix. Teradata Loom provides integrated metadata, data lineage and data wrangling, all in a single self-service UI, so you can understand the metadata within a Hadoop cluster, understand your data lake and get real value out of it.
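The metadata problem Chris describes can be illustrated with a toy lineage catalog. This is only a conceptual sketch, not Teradata Loom's implementation; the `Catalog` class and dataset names are invented for illustration.

```python
# Toy illustration of why lineage metadata keeps a data lake navigable:
# each dataset records where it came from, so a file can be traced back to
# its original sources instead of "disappearing into the lake".

class Catalog:
    def __init__(self):
        # dataset name -> {"sources": [...], "transform": description}
        self.entries = {}

    def register(self, name, sources=(), transform="raw ingest"):
        self.entries[name] = {"sources": list(sources), "transform": transform}

    def lineage(self, name):
        # walk upstream recursively, listing every ancestor dataset
        entry = self.entries.get(name)
        if entry is None:
            return []
        ancestors = []
        for src in entry["sources"]:
            ancestors.append(src)
            ancestors.extend(self.lineage(src))
        return ancestors

catalog = Catalog()
catalog.register("clickstream_raw")
catalog.register("clickstream_clean", ["clickstream_raw"], "dedupe + sessionize")
catalog.register("daily_summary", ["clickstream_clean"], "aggregate by day")

assert catalog.lineage("daily_summary") == ["clickstream_clean", "clickstream_raw"]
```

Without such a catalog, the derived file `daily_summary` is just bytes in the lake; with it, the "glass of water" can be traced back to its sources.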
Having that element and understanding of metadata is also really important for being able to orchestrate along with other analytic systems. So it's a key value-add in the market, and, by the way, it's available for download combined with the Hortonworks or Cloudera sandbox, so developers can try it out for free and see its value.

So you get the advantage of the data lake architecture, but you don't have that one-way-street problem, right?

Chris Twogood:
Exactly, and you've heard people in the marketplace say it could become a data swamp if you don't manage it appropriately.

Let's talk about analytics in big data environments. One of the things I keep hearing is that there are a lot of challenges in applying traditional analytic models to big data. What do you think? Is it time for a new kind of analytics for big data?

Chris Twogood:
I think people have gotten to the point where they're capturing lots of data, but now they are trying to figure out how to look through all of it. They are trying to determine how to do it in a way that enables them to parse through the data, look for patterns and uncover unique types of insights. As part of our drive toward introducing new algorithms to help with big data, we announced our Connection Analytics capability. Connection Analytics provides a lot of new algorithms that sit on top of a native graph engine as well as SQL and our MapReduce engine. It enables us to analyze the relationships between different entities. The entities could be people to people, people to products, products to processes, or machines to people. It's not about analyzing the discrete entities themselves, but about the information that flows between them. There are very interesting algorithms involved, including loopy belief propagation, personalized SALSA and Shapley values. All of these help drill into the data and uncover insight that wasn't available before, and they do it in a very automated, machine-learning style of architecture. So it's very interesting, and I think it will drive a whole new level of Connection Analytics.

It's not just new insights; it's new kinds of insights.

Chris Twogood:
Absolutely. You might see things that help you with churn, social network analysis, viral marketing or fraud detection. There are all kinds of interesting use cases around that.

It's interesting what you don't know that you don't know until you can start making connections that you couldn't make before.

Chris Twogood:
Exactly.

There is one more trend I'd like to talk about, and that is in-memory. There seems to be greater demand for in-memory performance, and obviously Teradata has been playing in this space for some time. What can you tell us about the future of in-memory technology?

Chris Twogood:
I think the demand for analytics and the demand for data are growing faster than ever before. And when they grow, your systems have to be able to perform at future scale. How do you provide the performance required to meet that demand? What it means is that, as vendors in this space, we have to reduce the number of bottlenecks that sit between the different tiers in an architecture. The challenge has always been that your disks are the slowest component in your architecture. Then you have memory, and then you have CPU. When they interoperate and use I/O, that degrades performance. When in-memory first came out, people were asking, "Can I take a lot of the stuff that was on disk and put it in memory?" What Teradata has done is drive even more advanced engineering around how to process data in memory, doing advanced pipelining and using in-memory-friendly structures like columns so that they're easily consumed by the CPU. We've taken it a step further and moved it into the CPU, using new technology from Intel around vectorization to reduce the movement between memory and the CPU, because memory is almost becoming the new bottleneck. Now we have to move up to the CPU.
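The benefit of in-memory-friendly columnar structures that Chris mentions can be shown with a small sketch. This is not Teradata's engine; it only illustrates, using Python's standard-library `array` as a stand-in for a contiguous in-memory column, why packing one column's values together suits vectorized (SIMD-style) CPU scans better than row-at-a-time access.

```python
from array import array  # stdlib: a contiguous, typed buffer of values

# Row layout: one Python object per record; summing "amount" means
# touching every record and picking out one field each time.
rows = [{"id": i, "amount": float(i)} for i in range(1000)]
row_total = sum(r["amount"] for r in rows)

# Column layout: all amounts packed contiguously in one typed buffer,
# the shape a vectorized CPU scan (or columnar engine) wants to consume.
amount_col = array("d", (float(i) for i in range(1000)))
col_total = sum(amount_col)

# Same answer either way; the difference is memory layout and scan cost.
assert row_total == col_total == 499500.0
```

Both layouts produce the same total; the columnar one simply keeps the values the CPU needs next adjacent in memory, which is what pipelining and vectorization exploit.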
So we're going from in-memory storage to in-memory computing. Fascinating. Thank you for taking the time to share these insights into Teradata's advancements to help organizations gain value from their data.
SOURCE: Big Data Analytics Requires Orchestration, not Federation: A Q&A with Chris Twogood of Teradata