BeyeNETWORK UK Blogs. Copyright BeyeNETWORK 2005 - 2019
http://www.b-eye-network.co.uk/rss/content.php

Negative energy prices and artificial intelligence

Since renewable energy has started to become popular, an odd problem has appeared in wholesale energy markets: negative prices.

In other words, power plants sometimes pay their customers to take energy off their hands. Those affected are usually older, less flexible plants that can't shut down without incurring costs.
One solution to this problem is batteries. The idea is to store the energy when it is overabundant, and use it later when it is expensive. This is sometimes called "peak shaving".
Batteries are a great idea, but not the only solution. Another is to simply find an application that is energy hungry and can be run intermittently.

One possible application for soaking up excess energy is desalination. For example, a desert region near an ocean could build solar plants to desalinate water during the day only. The question is whether building a desalination plant that only runs 12 hours a day is worth the savings in energy. 
Another way to make use of energy that might go to waste is using it to power computers that perform analytics. The energy demand of data centers is growing quickly.

One candidate is Bitcoin. Bitcoin mining consumes huge amounts of energy, so it is an obvious use for negatively priced power. In fact there are already a lot of bitcoin miners in Western China, where solar and wind installations have outstripped grid upgrades. In these areas renewable energy is often curtailed because the grid can't keep up, so the energy is basically free to the miners.

Extremely cheap bitcoin mining arguably undermines the whole concept, but here is a more productive idea: training artificial intelligence. For example, have a look at Leela Zero, a clone of Google DeepMind's AlphaGo Zero:
https://github.com/gcp/leela-zero
The entire source code is free, and it's not a lot of code. But that free code is just the learning model, and it's based on well-known principles. It would probably be just as good as AlphaGo Zero once trained, but the developers figure training it would take them 1,700 years -- unless of course they could harness other resources. This is partly because they don't have access to Google's specialized TPU hardware. Whatever the reason, training it is going to burn through a lot of energy.
This would be a great application for negatively priced energy. Game playing is more of a stunt than a commercial application, but when they are paying you to use the energy, why not? And as time passes, more useful AI applications will need training.
So it comes down to whether the business model of peak shaving with batteries makes more economic sense than banks of custom chips training neural networks for AI in batches. The advantage of batteries is that you can sell the energy later for more, but storage is not terribly efficient, and using the energy directly is a better idea. Cheap computer hardware and a growing demand for AI may fit this niche very well.
This puts a whole new twist on the idea that big tech companies are investing in renewables. These companies make extensive use of AI, which is trained in batch processes.
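To make the battery-versus-compute comparison concrete, here is a toy back-of-the-envelope calculation in Python. Every figure in it (the prices, the round-trip efficiency, the value of a megawatt-hour of training compute) is an invented assumption for illustration, not real market data.

surplus_mwh = 10.0            # energy available at negative prices
negative_price = -20.0        # $/MWh: the producer pays this much to whoever takes it
later_price = 60.0            # $/MWh once demand returns
battery_efficiency = 0.85     # assumed round-trip efficiency of storage
compute_value_per_mwh = 80.0  # assumed value of the AI training done with one MWh

# Option 1: get paid to absorb the surplus, store it, and resell what survives storage.
battery_revenue = (surplus_mwh * -negative_price
                   + surplus_mwh * battery_efficiency * later_price)

# Option 2: get paid to absorb the surplus and feed it directly into training hardware.
direct_use_value = (surplus_mwh * -negative_price
                    + surplus_mwh * compute_value_per_mwh)

print(f"battery arbitrage: ${battery_revenue:,.0f}")
print(f"direct use for AI training: ${direct_use_value:,.0f}")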


http://www.b-eye-network.co.uk/blogs/finucane/archives/2018/01/negative_energy_prices_and_art.php Thu, 4 Jan 2018 08:33:00 MST
Understanding Artificial Neural Networks

Artificial neural networks are computer programs that learn a subject matter of their own accord. So an artificial neural network is a method of machine learning. Most software is created by programmers painstakingly detailing exactly how the program is expected to behave. But in machine learning systems, the programmers create a learning algorithm and feed it sample data, allowing the software to learn to solve a specific problem by itself.

Artificial neural networks were inspired by animal brains. They are a network of interconnected nodes that represent neurons, and the thinking is spread throughout the network. 

But information doesn't fly around in all directions in the network. Instead it flows in one direction through multiple layers of nodes, from an input layer to an output layer. Each layer gets inputs from the previous layer and then sends calculation results to the next layer. In an image classification system, the initial input would be the pixels of the image, and the final output would be the list of classes.

The processing in each layer is simple: each node gets numbers from multiple nodes in the previous layer and adds them up. If the sum is big enough, it sends a signal to the nodes in the next layer. Otherwise it does nothing. But there is a trick: the connections between the nodes are weighted. So if node A sends a 1 to nodes B and C, it might arrive at B as 0.5 and at C as 3, depending on the weights of the connections.
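A minimal sketch of that node-level computation, assuming a simple threshold rule and reusing the weights from the example above (all the numbers are purely illustrative):

import numpy as np

def node_output(inputs, weights, threshold=0.0):
    # Each node sums its weighted inputs and "fires" only if the sum clears the threshold.
    total = float(np.dot(inputs, weights))
    return total if total > threshold else 0.0

# Node A sends a 1; the connection weights decide what arrives at B and C.
signal_from_a = 1.0
arrives_at_b = signal_from_a * 0.5   # weight on the A-to-B connection is 0.5
arrives_at_c = signal_from_a * 3.0   # weight on the A-to-C connection is 3.0

# Node B combines what arrives from A with input from another node, then decides whether to fire.
print(node_output(inputs=[arrives_at_b, 1.2], weights=[1.0, 1.0], threshold=1.5))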

The system learns by adjusting the weights of the connections between the nodes. To stay with visual classification: it gets a picture and guesses which class it belongs to, for example "cat" or "fire truck". If it guesses wrong, the weights are adjusted. This is repeated until the system can identify pictures reliably.

To make all this work, the programmer has to design the network correctly. This is more an art than a science, and in many cases, copying someone else's design and tweaking it is the best bet.

In practice, neural network calculations boil down to lots and lots of matrix math operations, as well as the threshold operation the neurons use to decide whether to fire. It's fairly easy to imagine all this as a bunch of interconnected nodes sending each other signals, but fairly painful to implement in code.

The reason it is so hard is that there can be many layers that are hard to tell apart, making it easy to get confused about which is doing what. The programmer also has to keep in mind how to orient the matrices the right way to make the math work, and other technical details. 
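In matrix form, a forward pass through a couple of layers looks roughly like the sketch below. The layer sizes are arbitrary assumptions; the point is that each layer is just a matrix multiplication followed by the firing threshold, and the matrices have to be oriented so the dimensions line up.

import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)           # input layer: 4 values, e.g. pixel intensities
W1 = rng.random((4, 3))     # weights from the 4 input nodes to 3 hidden nodes
W2 = rng.random((3, 2))     # weights from the 3 hidden nodes to 2 output nodes

hidden = np.maximum(x @ W1, 0.0)       # weighted sums for the hidden layer, then the threshold
output = np.maximum(hidden @ W2, 0.0)  # same again for the output layer
print(output)                          # one number per output class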

It is possible to do all this from scratch in a programming language like Python, and that is recommended for beginner systems. But fortunately there is a better way to build advanced systems: in recent years a number of libraries such as TensorFlow have become available that greatly simplify the task. These libraries take a bit of fiddling to understand at first, and learning how to deal with them is key to learning how to create neural networks. But they are a huge improvement over hand-coded systems. Not only do they greatly reduce programming effort, they also provide better performance.
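As an illustration of how much such a library hides, here is a minimal sketch of a small classifier in TensorFlow's Keras API. The layer sizes, input shape and training settings are assumptions chosen for the example, not a recommendation.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # e.g. a flattened 28x28 greyscale image
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer of 64 nodes
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(images, labels, epochs=5)  # the library adjusts all the weights for you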

http://www.b-eye-network.co.uk/blogs/finucane/archives/2018/01/understanding_artificial_neura.php Wed, 3 Jan 2018 15:21:00 MST
Psst – Wanna buy a Data Quality Vendor?

Founded in 1993, Trillium Software has been the largest independent data quality vendor for some years, nestling since the late 1990s as a subsidiary of US marketing services company Harte Hanks. The latter was once a newspaper company dating back to 1928, but switched to direct marketing in the late 1990s. It had overall revenues of $495 million in 2015. There was clearly a link between data quality and direct marketing, since name and address validation is an important feature of marketing campaigns. However the business model of a software company is different from that of a marketing firm, so ultimately there was always going to be a certain awkwardness in Trillium living under the Harte Hanks umbrella.

On June 7th 2016 the parent company announced that it had hired an advisor to look at "strategic alternatives" for Trillium, including the possibility of selling the company, though the company's announcement made clear that a sale was not a certainty. Trillium has around 200 employees and a large existing customer base, so will have a steady income stream from maintenance revenues. The data quality industry is not the fastest growing sector of enterprise software, but is well established and quite fragmented. As well as offerings from Informatica, IBM, SAP and Oracle (all of which were based on acquisitions) there are dozens of smaller data quality vendors, many of them having grown up around the name and address matching issue that is well suited to at least a partially automated solution. While some vendors like Experian have focused traditionally on this problem, other vendors such as Trillium have developed much broader data quality offerings, with functions such as data profiling, cleansing, merge/matching, enrichment and even data governance.

There is a close relationship between data quality and the somewhat faster growing sector of master data management (MDM), so MDM vendors might seem in principle to be natural acquirers of data quality vendors. However MDM itself has somewhat consolidated in recent years, and the big players in it like Informatica, Oracle and IBM all market platforms that combine data integration, MDM and data quality (though in practice the degree of true integration is distinctly more variable than it appears on PowerPoint). Trillium might be too big a company to be swallowed up by the relatively small independents that remain in the MDM space. It will be interesting to see what emerges from this exercise. Certainly it makes sense for Trillium to stand on its own two feet rather than living within a marketing company, but on the other hand Harte Hanks may have missed the boat. A few years ago large vendors were clamouring to acquire MDM and related technologies, but now most companies that need a data quality offering have either built or bought one. The financial adviser in charge of the review may have to be somewhat creative in who it looks at as a possible acquirer.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2016/06/psst_wanna_buy_a_data_quality.php Thu, 9 Jun 2016 13:22:32 MST
Informatica MDM Moves To The Cloud

I recently attended the Informatica World event in San Francisco, which drew over 3,000 customers and partners. One key announcement from an MDM perspective was the availability of Informatica MDM for the cloud, called MDM Cloud Edition. Previously, Informatica's only cloud MDM offering was a Salesforce-specific application that came via the 2012 acquisition of a company called Data Scout. This is the first time that the main Informatica MDM offering can be deployed in the cloud, including on Amazon AWS. It is an important step, as moving MDM to the cloud is a slow but inevitable bandwagon, and recently start-ups like Reltio, designed from scratch as cloud offerings, have been able to offer cloud MDM with little real competition. The Informatica data quality technology will apparently be fully cloud-ready by the end of 2016.

The company also launched a product called Intelligent Streaming, which connects lots of data sources and distributes the data for you; for example, a demo showed data from several sources being streamed to a compute engine using Spark, or Hadoop if you prefer, without the need to write code. This approach shields some of the underlying complexity of the Big Data environment from developers. Live Data Map is part of the Informatica infrastructure and is a way to visualise data sources, whether on premise or in the cloud. It also does scheduling in a more sophisticated way than before, using machine learning techniques.

There were plenty of external speakers, both customers and partners. Nick Millman from Accenture gave a talk about trends in data management, and referred back to his first assignment at a "global energy company" (actually Shell, where I first met him), in which the replication of an executive dashboard database involved him flying from London to The Hague with a physical tape to load up onto a server in Rijswijk. Unilever gave a particularly good talk about their recent global product information management project, in which the (business rather than IT) speaker described MDM as "character building" – hard to argue there.

There were new executives on display, in particular Jim Davis as head of marketing (ex SAS) and Lou Attanasio as the new head of sales (ex IBM).
With Informatica having recently gone private, it will be comforting for their customers that the company is investing as much as ever in its core technology, and certainly in MDM the company reckons it has more developers than Oracle, IBM and SAP combined, though such claims are hard to verify. However there certainly seems to be plenty of R&D activity going on related to MDM judging by the detailed sessions. Examples of additional new developments were accelerators and applications for pharmaceuticals, healthcare and insurance.

Informatica continues to have one of the leading MDM technologies at a time when some of its large competitors appear to be losing momentum in the marketplace for assorted reasons, so from a customer perspective the considerable on-going R&D effort is reassuring. Its next major R&D effort will be to successfully blend the two current major MDM platforms that they have (acquired from Siperian and Heiler), something that their large competitors have singularly failed to achieve thus far with their own acquired MDM technologies.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2016/06/informatica_mdm_moves_to_the_c.php Mon, 6 Jun 2016 13:34:18 MST
Alphago probably isn't learning from Lee Sedol

There has been quite a bit of discussion about whether Alphago can learn from the games it plays against Lee Sedol. I think not. At least, not directly.
The heart of the program is the "policy network", a convolutional neural network (CNN), an architecture that was designed for image processing. CNNs return a probability that a given image belongs to each of a predefined set of classifications, like "cat", "horse", etc. CNNs work astonishingly well, but have the weakness that they can only be used with a fixed-size image to estimate a fixed set of classifications.
The policy network views go positions as 19×19 images and returns the probabilities that a human player would make each of 361 possible moves. These probabilities drive the Monte Carlo tree search for good moves that has been used for some time in computer go. The policy network is initially trained on 30 million positions (or moves).
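For illustration, a policy-network-style CNN can be sketched in a few lines of Keras. This is a rough toy, not the actual AlphaGo architecture: the real network is far deeper and feeds in many feature planes per position, but the shape of the idea is the same, a 19×19 board in, 361 move probabilities out.

import tensorflow as tf

policy_net = tf.keras.Sequential([
    tf.keras.Input(shape=(19, 19, 1)),                 # one toy feature plane per board point
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(361, activation="softmax"),  # a probability for each of the 361 points
])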
CNN (aka "deep learning") behavior is pretty well understood. The number of samples required for learning depends on the complexity of the model. A model of this complexity probably requires tens of thousands of example positions before it changes much.
The number of samples required to train any machine learning program depends on the complexity of the strategy, not on the number of possible positions. For example, Gomoku ("five in a row", also called gobang) on a 19×19 board would take many fewer examples to train than go would, even though the number of possible positions is also very large.
Another point: Any machine learning algorithm will eventually hit a training limit, after which it won't be able to improve itself by more training. After that, a new algorithm based on a new model of game play would be required to improve the play. It is interesting that the Alphago team seems to be actively seeking ideas in this area. Maybe that is because they are starting to hit a limit, but maybe it's just because they are looking into the future.
So Alphago probably can't improve its play measurably by playing any single player five times, no matter how strong. That would be "overfitting". The team will be learning from the comments of the pro players and modifying the program to improve it instead.
Interesting tidbit: Alphago said the chance of a human playing move 37 in game 2 was 1 in 10,000. So the policy network doesn't decide everything.


http://www.b-eye-network.co.uk/blogs/finucane/archives/2016/03/alphago_probably_isnt_learning.php Sun, 13 Mar 2016 13:13:00 MST
Alphago is a learning machine more than a go machine

The key part of Alphago is a convolutional neural network. These are usually used for recognizing cat pictures and other visual tasks, and progress in the last five years has been incredible.
Alphago went from the level of a novice pro last October to world champion level for this match. It did so by playing itself over and over again.
Chess programs are well understood because they are programmed by humans. Alphago uses an algorithm to pick a winning move in a given go position, but the heart of the program is a learning program that finds that algorithm, not the algorithm itself.
Go programs made steady progress for a decade with improved tree pruning methods, which reduce the total number of positions the program has to evaluate. The cleverest method is Monte Carlo pruning, which simply prunes at random. 
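The Monte Carlo idea itself is simple enough to sketch. The version below is the crude "flat" form: evaluate each candidate move by playing many games out at random and keep the move that wins most often. The Game interface used here (copy, legal_moves, play, is_over, winner, side_to_move) is hypothetical, and real programs refine this with tree search and smarter playouts.

import random

def random_playout(position, player):
    # Play random legal moves to the end of the game; report whether `player` won.
    g = position.copy()
    while not g.is_over():
        g.play(random.choice(g.legal_moves()))
    return g.winner() == player

def best_move(position, playouts_per_move=100):
    player = position.side_to_move()
    scores = {}
    for move in position.legal_moves():
        after = position.copy()
        after.play(move)
        # Count how often random continuations from this move end in a win for us.
        scores[move] = sum(random_playout(after, player) for _ in range(playouts_per_move))
    return max(scores, key=scores.get)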


http://www.b-eye-network.co.uk/blogs/finucane/archives/2016/03/alphago_is_a_learning_machine.php Sun, 13 Mar 2016 06:00:00 MST
Informatica V10 emerges

Informatica just announced their Big Data Management solution V10, the latest update to their flagship suite of technology. The key objective is to enable customers to design data architectures that can accommodate both traditional database sources and newer Big Data "lakes" without needing to swim too deeply in the world of MapReduce or Spark.

In particular, the Live Data Map offering is interesting: a tool that builds a metadata catalog as automatically as it can. Crucially, this is updated continuously rather than being a one-off batch exercise, the bane of previous metadata efforts, which can quickly get out of date. It analyses not just database system tables but also semantics and usage, so it promises to chart a path through the complexity of today's data management landscape without the need for whiteboards and data model diagrams.

V10 extends the company’s already fairly comprehensive ability to plug into a wide range of data sources, with over 100 pre-built transformations and over 200 connectors. By providing a layer of interface above the systems management level, a customer can gain a level of insulation from the rapidly changing world of Big Data, with its bewildering menagerie of technologies, some of which disappear from current fashion almost as soon as you have figured out where they fit. Presenting a common interface across traditional and new data sources enables organisations to minimise wasted skills investment.

As well as quite new features such as Live Data Map, there is an array of incremental updates to the established technology elements of the Informatica suite, such as improved collaboration capability within the data quality suite, and the ability of the data integration hub to span both cloud and on-premise data flows. A major emphasis of the latest release is performance improvement, with much faster data import and data cleansing.

With Informatica having recently gone private, it will be comforting for their customers that the company is investing as much as ever in its core technology, as well as adding potentially very useful new elements. The data management landscape is increasingly fragmented and complex these days, so hard-pressed data architects need all the help that they can get.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2015/11/informatica_v10_emerges.php Mon, 16 Nov 2015 15:08:44 MST
Leaving Las Vegas

The Informatica World 2015 event in Las Vegas was held as the company was in the process of being taken off the stock market and into private ownership by private equity firm Permira and a Canadian pension fund. The company was still in its quiet period so was unable to offer any real detail about this. However my perception is that one key reason for the change may be that the company executives can see that there is a growing industry momentum towards cloud computing. This is a challenge to all major vendors with large installed bases, because the subscription pricing model associated with the cloud presents a considerable challenge as to how vendors will actually make money compared to their current on-premise business model. A quick look at the finances of publicly held cloud-only companies suggests that even these specialists have yet to really figure it out, with a sea of red ink in the accounts of most. If Informatica is to embrace this change then it is likely that its profitability will suffer, and private investors may offer a more patient perspective than Wall Street, which is notoriously focused on short-term earnings. It would seem to me that there is unlikely to be any real change of emphasis around MDM from Informatica, given that it seems to be their fastest growing business line.

On the specifics of the conference, there were announcements around the company's major products, including its recent foray into data security. The most intriguing was the prospect of a yet-to-be-delivered product called “live data map”. The idea is to allow semantic discovery within corporate data, and to allow end-users to vote on how reliable particular corporate data elements are, rather as consumers vote for movies on IMDB or rate each other on eBay. This approach may be particularly useful as companies have to deal with “data lakes”, where data will have little or none of the validation that would (in theory) be applied to it in current corporate systems. The idea is tantalising, but this was a statement of direction rather than a product that was ready for market.

The thing that I found most useful was the array of customer presentations, over a hundred in all. BP gave an interesting talk about data quality in the upstream oil industry, which has typically not been a big focus for data quality vendors (there is no name and address validation in the upstream). Data governance was a common theme in several presentations, clearly key to the success of both master data and data quality projects. There was a particularly impressive presentation by GE Aviation about their master data project, which had to deal with very complex aeroplane engine data.

Overall, Informatica’s going private should not have any negative impact on customers, at least unless its executives end up taking their eye off the ball due to the inevitable distractions associated with new ownership.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2015/05/leaving_las_vegas.php Sat, 16 May 2015 11:25:47 MST
The Teradata Universe

The Teradata Universe conference in Amsterdam in April 2015 was particularly popular, with a record 1,200 attendees this year. Teradata always scores unusually high in our customer satisfaction surveys, and a recurring theme is its ease of maintenance compared to other databases. At this conference the main announcement continued this theme with the expansion of its QueryGrid, allowing a common administrative platform across a range of technologies. QueryGrid can now manage all three major Hadoop distributions, MapR, Cloudera and Hortonworks, as well as Teradata's own Aster and Teradata platforms. In addition the company announced a new appliance, the high-end 2800, as well as a new feature they call the software-defined warehouse. This allows multiple Teradata data warehouses to be managed as one logical warehouse, including allowing security management across multiple instances.

The conference had its usual heavy line-up of customer project implementation stories, such as an interesting one by Volvo, who are doing some innovative work with software in their cars, at least at the prototype stage. For example, in one case the car sends a proximity alert to any cyclist wearing a suitably equipped helmet. In another example the car can seek out spare parking spaces in a suitably equipped car park. A Volvo now has 150 computers in it, generating a lot of data that has to be managed as well as creating new opportunities. Tesla is perhaps the most extreme example so far of cars becoming software-driven, in their case literally allowing remote software upgrades in the same way that occurs with desktop computers (though hopefully car manufacturers will do a tad more testing than Microsoft in this regard). The most entertaining speech that I saw was by a Swedish academic, Hans Rosling, who advises UNICEF and the WHO and who gave a brilliant talk about the world's population trends using extremely advanced visualisation aids, an excellent example of how to display big data in a meaningful way.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2015/04/the_teradata_universe.php Thu, 23 Apr 2015 11:24:04 MST
The Private Side of Informatica

Yesterday Informatica announced that it was being bought, not by a software firm but by private equity company Permira. At $5.3 billion, this values the data integration vendor at over five times the billion dollars of revenue that Informatica saw in 2014, compared to a recent industry average of around 4.4 times. This piece of financial engineering will not change the operational strategy for Informatica. Rather it is a reflection of a time when capital is plentiful and private equity firms are feeling bullish about the software sector; Tibco and Dell have followed a similar route. Company managers will no longer have to worry about quarterly earnings briefings to pesky financial analysts, and will instead be accountable only to their new owners. However, private equity firms seek a return on their investment, usually leveraging plenty of debt into such deals (debt is tax efficient compared to equity), and can be demanding of their acquisitions. From a customer viewpoint there is little to be concerned about. One exit for the investors will be a future trade sale or a return to the stock market, so this deal does not in itself change the picture for Informatica in terms of possible acquisition by a bigger software company one day.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2015/04/the_private_side_of_informatic.php Wed, 8 Apr 2015 09:55:28 MST
Snowflake is a New SQL Database Server for the Cloud
One of these new kids on the block is the Snowflake Elastic Data Warehouse by Snowflake Computing. It's not available yet (we still have to wait until the first half of 2015), but information is available and beta versions can be downloaded.

Defining and classifying Snowflake with one term is not that easy. Not even with two terms. To start, it's a SQL database server that supports a rich SQL dialect. It's not specifically designed for big data environments (the term doesn't even appear on the website), but for developing large data warehouses. In this respect, it competes with other so-called analytical SQL database servers.

But the most distinguishing factor is undoubtedly that it's architected from the ground up to fully exploit the cloud. This means two things. First, it's not an existing SQL database server that has been ported to the cloud; its internal architecture is designed specifically for the cloud. All the lines of code are new; no existing open source database server has been used and adapted. This makes Snowflake highly scalable and truly elastic, which is exactly why organizations turn to the cloud.

Second, it also means that the product can really be used as a service. It only requires a minimal amount of DBA work. So, the term service doesn't only mean that it offers a service-based API, such as REST or JDBC, but that the product has been designed to operate hassle-free. Almost all the tuning and optimization is done automatically.
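To give a feel for the "service" experience, here is a minimal sketch using the Python connector that Snowflake later shipped (it postdates the beta discussed here): you connect with credentials and run plain SQL, with no cluster to install or tune. The account name and credentials below are placeholders.

import snowflake.connector  # Snowflake's own Python connector

conn = snowflake.connector.connect(
    account="my_account",    # placeholder account identifier
    user="my_user",
    password="my_password",
)
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")  # ordinary SQL, no tuning or DBA work required
print(cur.fetchone())
conn.close()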

In case you want to know, no, the name has no relationship with the data modeling concept called snowflake schema. The name snowflake has been selected because many of the founders and developers have a strong relationship with skiing and snow.

Snowflake is a product to keep an eye on. I am looking forward to its general availability. Let's see if there is room for another database server. If it's sufficiently unique, there may well be.



http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2014/10/snowflake_is_a.php Wed, 29 Oct 2014 02:20:51 MST
Pneuron is a Platform for Distributed Analytics

Pneuron. Initially you would say it's a jack of all trades, a Swiss army knife, but it isn't.

Pneuron is a platform that offers distributed data and application integration, data preparation, and analytical processing. With its workflow-like environment, a process can be defined to extract data from databases and applications, to perform analytics natively or to invoke different types of analytical applications and data integration tools, and to deliver final results to any number of destinations, or to simply persist the results so that other tools can easily access them.

Pneuron's secret is its ability to design and deploy distributed processing networks, which are based on (p)neurons (hence the product name). Each pneuron represents a task, such as data extraction, data preparation, or data analysis. Pneurons can run across a network of machines, and are, if possible, executed in parallel. It reuses the investment that companies have already made in ERP applications, ETL tools, and existing BI systems. It remains agnostic to and coordinates the use of all those prior investments.

Still, Pneuron remains hard to classify. It's quite unique of its kind. But whatever the category is, Pneuron is worth checking out (www.pneuron.com).



http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2014/10/pneuron_is_a_pl.php Tue, 28 Oct 2014 10:03:17 MST
QueryGrid is New Data Federation Technology by Teradata

Teradata announced QueryGrid at their Partners event in Nashville, Tennessee. QueryGrid allows developers using the Teradata database engine to transparently access data stored in Hadoop, Oracle, and the Teradata Aster Database. Users won't really notice that the data is not stored in Teradata's own database, but in one of the other data stores.

The same applies to developers using the Teradata Aster database. With QueryGrid they can access and manipulate data stored in Hadoop and the Teradata Database.

With QueryGrid, for both Teradata's database servers, access to big data stored in Hadoop becomes even more transparent than with its forerunner SQL-H. QueryGrid allows Teradata and Aster developers to seamlessly work with big data stored in Hadoop without the need to learn the complex Hadoop APIs.

QueryGrid is a data federator, so data from multiple data stores can be joined together. However, it's not a traditional data federator. Most data federators sit between the applications and the data stores being federated, and it is the federator itself that the applications access. QueryGrid instead sits between the Teradata or Aster database on one side and Hadoop, Oracle, and the Teradata and Aster databases on the other. So applications do not access QueryGrid directly.

QueryGrid supports all the standard features one expects from a data federator. What's special about QueryGrid is that it's deeply integrated with Teradata and Aster. For example, developers using Teradata can specify one of the pre-built analytical functions supported by the Aster database, such as sessionization and connection analytics. The Teradata Database will recognize the use of this special function, knows it's supported by Aster, and automatically passes the processing of the function to Aster. In addition, if the data to be processed is not stored in Aster, but, for example, in Teradata, the relevant data is transported to Aster so that the function can be executed. This means that, due to QueryGrid, functionality of one of the Teradata database servers becomes available for the other.

QueryGrid is definitely a valuable addition for organizations that want to develop big data systems by deploying the right data storage technology for the right data.





http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2014/10/querygrid_is_ne.php Tue, 28 Oct 2014 09:59:40 MST
SAS Update

At a conference in Lausanne in June 2014 SAS shared their current business performance and strategy. The privately held company (with just two individual shareholders) had revenues of just over $3 billion, with 5% growth. Their subscription-only license model has meant that SAS has been profitable and growing for 38 years in a row. 47% of revenue comes from the Americas, 41% from Europe and 12% from Asia Pacific. They sell to a broad range of industries, but the largest in terms of revenue are banking at 25% and government at 14%. SAS is an unusually software-oriented company, with just 15% of revenue coming from services. Last year SAS was voted the second best company globally to work for (behind Google), and attrition is an unusually low 3.5%.

In terms of growth, fraud and security intelligence was the fastest growing area, followed by supply chain, business intelligence/visualisation and cloud-based software. Data management software revenue grew at just 7%, one of the lowest rates of growth in the product portfolio. Cloud deployment is still relatively small compared to on-premise but growing rapidly, and is expected to exceed $100 million in revenue this year.

SAS has a large number of products (over 250), but gave some general update information on broad product direction. Its LASR product, introduced last year, provides in-memory analytics. They do not use an in-memory database, as they do not want to be bound to SQL. One customer example given was a retailer with 2,500 stores and 100,000 SKUs that needed to decide what merchandise to stock their stores with, and how to price locally. They used to analyse this in an eight-hour window at an aggregate level, but can now do the analysis in one hour at an individual store level, allowing more targeted store planning. The source data can be from traditional sources or from Hadoop. SAS have been working with a university to improve the user interface, starting from the UI and trying to design to that, rather than producing a software product and then adding a user interface as an afterthought.

In Hadoop, there are multiple initiatives from both major and minor suppliers to apply assorted versions of SQL to Hadoop. This is driven by the mass of SQL skills in the market compared to the relatively tiny number of people who can fluently program using MapReduce. Workload management remains a major challenge in the Hadoop environment, so a lot of activity has been going on to integrate the SAS environment with Hadoop. Connection is possible via HiveQL. Moreover, SAS processing is being pushed down to Hadoop with MapReduce rather than extracting the data; a SAS engine is placed on each cluster to achieve this. This includes data quality routines like address validation, directly applicable to Hadoop data with no need to export data from Hadoop. A demo was shown using the SAS Studio product to take some JSON files, do some cleansing, and then use Visual Analytics and In-Memory Statistics to analyze a block of 60,000 Yelp recommendations, blending this with another recommendation data set.



http://www.b-eye-network.co.uk/blogs/hayler/archives/2014/06/sas_update.php Fri, 6 Jun 2014 23:53:33 MST