Blog: Rick van der Lans http://www.b-eye-network.co.uk/blogs/vanderlans/ Welcome to my blog, where I will talk about a variety of topics related to data warehousing, business intelligence, application integration and database technology. Currently my special interests include virtual data warehousing, mashups and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl. Copyright 2010. Mon, 21 Jun 2010 23:42:00 +0000

In-Database Analytics with Aster Data's SQL-MapReduce Most analytical tools process a large portion of the analytical logic themselves. For example, the logic to perform a regression analysis, which determines the relationship between a dependent variable and one or more independent variables, is executed by the tool. The role of the database server is minimal: it is used only for retrieving all the required data from the database.

Because most of the analytic processing takes place on the machine where the tool runs, it's very likely that too much data is transmitted from the database server to the application, which is bad for performance. Additionally, the processing is not taking place on the most powerful machine.

With in-database analytics, the analytical processing is primarily done by the database server itself. The remaining task of the analytical tool is to present results on the screen and do some minimal processing. This approach has several performance advantages. First, because the database server (almost certainly) runs on a more powerful machine, the analytical logic is processed more quickly. Secondly, because most of the analytical processing is executed very close to where the data is stored, the I/O is optimal. And thirdly, because only the result set is transmitted back to the tool, minimal time is wasted on transmitting data from the database server to the tool.
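
To make the difference concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The sales table and both function names are made up for illustration, and SQLite merely stands in for a real database server. The first function pulls every row into the tool and aggregates there; the second lets the database do the work and returns only the result set.

```python
import sqlite3

# Hypothetical sales table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 20.0), ("south", 40.0)],
)


def average_in_tool(conn):
    """Tool-side analytics: fetch every row, then aggregate in the application."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        count, total = totals.get(region, (0, 0.0))
        totals[region] = (count + 1, total + amount)
    return {region: total / count for region, (count, total) in totals.items()}


def average_in_database(conn):
    """In-database analytics: the server aggregates; only the result set travels."""
    rows = conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region")
    return dict(rows)


tool_result = average_in_tool(conn)
db_result = average_in_database(conn)
```

Both functions return the same averages, but the first one transmits all rows while the second one transmits one row per region.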

But moving the analytical processing from the application to the database server by itself does not automatically lead to a considerable performance improvement. A serious performance improvement is realized when the analytical logic is executed in parallel by the database server.

A solution based on SQL-MapReduce makes it possible to push most of the analytical processing to the database server, and most of that processing will be executed in parallel. My technical whitepaper, Using SQL-MapReduce for Advanced Analytical Queries, which describes Aster Data's implementation of SQL-MapReduce, explains in detail how this works.
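
The general map/reduce idea behind this can be sketched in a few lines of Python. Note that this is not Aster Data's SQL-MapReduce syntax, which is described in the whitepaper; the function names and the sample data are my own assumptions. The sketch fits a simple regression line: each partition of the data is reduced to a handful of partial sums (the map step), and those partial sums are then added together (the reduce step). Threads stand in here for the worker nodes of a parallel database server.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(rows):
    """Map step: reduce one partition of (x, y) pairs to partial sums."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add two tuples of partial sums element by element."""
    return tuple(p + q for p, q in zip(a, b))


def fit_line(partitions):
    """Fit y = slope * x + intercept from partitions processed side by side."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    n, sx, sy, sxx, sxy = reduce(combine, partials)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Two hypothetical partitions of a table holding points on y = 2x + 1.
partitions = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
```

The point of the design is that only five numbers per partition have to be combined, no matter how many rows each partition holds.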

 

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2010/06/in-database_ana.php Mon, 21 Jun 2010 23:42:00 +0000
Self-Service Business Intelligence: What a Strange term! Quite a hip and new term in the world of business intelligence is self-service business intelligence. If you visit this website regularly, you must have come across it. But is the term self-service not a contradiction in terms?

 

To me, the term service means that someone or something offers me a service, and that implies that I do less and the service provider does all or most of the work. For example, if I drive my car through a car wash, my car is automatically cleaned. That's the service being provided. Or, if I step into a hotel, loaded with luggage, a porter will probably take over my bags and bring them to my room. OK, I have carried them for hundreds of miles and he only does the last 100 yards, but it's still a service the hotel provides. That's basically the idea of service.

 

Now let's go back to the term self-service. The term self placed in front of the term service means you will do it yourself. In the context of self-service business intelligence, it means that the user can develop his own reports. But doing it yourself means you're not receiving a service. So, self-service means that no one offers you a service; you do all the work yourself.

 

For example, if a hotel positions itself as a self-service hotel, it would offer the service that you can carry your own luggage all the way up to your room. Comparably, a self-service car wash would provide the service that you can wash your car yourself. That's not service!

 

So combining the terms self and service makes no sense, because the opposite of service is doing it yourself. Maybe we should rename self-service business intelligence to do-it-yourself BI, or no-service BI.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2010/03/self-service_bu.php Tue, 30 Mar 2010 08:10:30 +0000
Call for Speakers: Data Warehousing and Business Intelligence Conference in London Who is interested in speaking at the Data Warehouse & Business Intelligence European Conference in London this coming November? If you are, please fill in this call for speakers.

 

Last year, this event was a big success: more than 200 delegates showed up. Evaluations showed that the attendees were very pleased with the selected speakers (Bill Inmon, Barry Devlin, Neil Raden, Frank Buytendijk, Daniel Linstedt, and many more), the topics, and the setup of the conference.

 

The 2010 edition is aimed at all aspects of data warehousing and business intelligence, including: trends, design guidelines, product overviews and comparisons, best practices, and new evolving technologies. And like last year, the conference is organized together with the highly successful Data Management and Information Quality Conference.

 

With this year's call for speakers we are trying to attract proposals for sessions on traditional and future data warehousing and business intelligence aspects. Delegates have expressed a preference for case studies rather than theoretical or abstract topics. We would particularly like practitioners in the field to respond to this call for speakers, and we encourage new speakers to apply. Success stories, that is, case studies where data warehousing and business intelligence have produced real bottom-line benefits, are very much appreciated.

 

Example topics for proposals are:

 

  • Business analytics
  • BI in the cloud
  • Data modeling for data warehouses
  • The maturity of data warehouse appliances
  • Star schema, snowflake and data vault models
  • Selling business intelligence to the business
  • The relationship between master data management and data warehousing
  • Guidelines for using ETL tools
  • Developing virtual data warehouses with federation servers
  • The BI mashup
  • The need for Master Data Management in a data warehouse environment
  • BAM (Business Activity Monitoring) and KPI (Key Performance Indicators)
  • New database technology for implementing data warehouses
  • Who needs real-time data warehouses?
  • Business Optimization through BPEL, BAM and SOA
  • BI scorecarding
  • Customer analytics and insight
  • Text mining and text analytics
  • Open source BI
  • Corporate Performance Management

 

I look forward to your response to the call for speakers, and I hope to see you in London this coming November.

 

Rick van der Lans

Chairman of the Data Warehouse & Business Intelligence European Conference

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2010/03/call_for_speake.php Mon, 15 Mar 2010 03:02:43 +0000
Visit of SQLStream Quite recently, I visited SQLStream. Like the majority of database server vendors, SQLStream is located in California; in San Francisco, to be more precise. Of course, their primary product (also called SQLStream) supports the database language SQL, and they try to follow the SQL standard as much as possible. So far, nothing new under the sun. You would almost think that this is yet another new vendor trying to dethrone Oracle, Microsoft, and IBM. However, that would be an incorrect assumption.


As the name implies, SQLStream is a so-called streaming database server, comparable to IBM InfoSphere Streams and StreamBase Server. The main difference between SQLStream on the one hand and most other products on the other is that the former is a pure SQL-based product. The statements used to query streams follow the SQL standard. Most other streaming products use proprietary languages, such as Spade, or use extensions to SQL.

 
For those who haven't studied this topic in detail yet, a streaming database server allows us to formulate queries on streams of data. Examples of streams are log files of certain systems, messages that are entered, or web logs. Even before this data is stored in tables, we can already access and analyze it. Someone once explained streaming database servers as follows: queries executed in the context of a classic database server are like asking how many fish live in this pond, whilst queries executed in the context of a streaming database server are like asking how many fish swim by in a fast-flowing river during a certain period of time.
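
The river question can be sketched in Python as a count per fixed time window over a stream of events. This is only an illustration of the concept, not SQLStream's SQL syntax; the function name and the sample events are my own assumptions, and events are assumed to arrive in timestamp order.

```python
def count_per_window(events, window_seconds):
    """Yield (window_start, count) for each time window as the stream flows by.

    `events` is an iterable of (timestamp_in_seconds, payload) pairs,
    assumed to arrive in timestamp order. Windows without events are skipped.
    """
    current_window = None
    count = 0
    for timestamp, _payload in events:
        window = int(timestamp // window_seconds) * window_seconds
        if current_window is None:
            current_window = window
        if window != current_window:
            # The previous window is complete: emit it before moving on.
            yield current_window, count
            current_window, count = window, 0
        count += 1
    if current_window is not None:
        yield current_window, count


# Hypothetical stream: the moment (in seconds) at which each fish swims by.
fish = [(0.5, "trout"), (3.2, "pike"), (9.9, "carp"), (10.1, "trout"), (25.0, "eel")]
counts = list(count_per_window(fish, window_seconds=10))
```

Because the function is a generator, each window's count is available as soon as the window closes, without the events ever being stored in a table first.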


SQLStream offers all the features above. In addition, views are used to define streams, and these streaming views can serve as input for other streaming views. Through join and union operators, data from different sources can be integrated. In fact, SQLStream supports many of the features normally found in an ETL tool, except that SQLStream uses streams and SQL. Data streams are integrated live the moment they arrive. The result of an integrated stream can be sent to an application or data warehouse. See the following link that contains an explanation on how SQLStream can be used together with SQL Power.
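
As a rough analogy in Python (again, not SQLStream's own syntax), streaming views can be thought of as generators stacked on top of each other, with a union that merges sources as their events arrive. All names and the sample log events below are hypothetical.

```python
import heapq


def union_streams(*streams):
    """Merge several timestamp-ordered streams into one ordered stream."""
    return heapq.merge(*streams, key=lambda event: event[0])


def errors_only(stream):
    """A 'streaming view': passes on only events whose level is ERROR."""
    return (event for event in stream if event[2] == "ERROR")


def as_messages(stream):
    """A second view, defined on top of another view."""
    return (f"{source}@{timestamp}: {level}" for timestamp, source, level in stream)


# Two hypothetical sources of (timestamp, source, level) events.
web_log = iter([(1, "web", "INFO"), (4, "web", "ERROR")])
app_log = iter([(2, "app", "ERROR"), (3, "app", "INFO")])

integrated = as_messages(errors_only(union_streams(web_log, app_log)))
messages = list(integrated)
```

Nothing is computed until the final consumer asks for the next message, which mirrors the idea that the integrated result can be handed straight to an application or a data warehouse loader.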

In short, SQLStream is absolutely worth studying.

Note: The owners of SQLStream are also the founders of Eigenbase.org. This organization supplies a toolset with which database servers can be developed. As can be expected, SQLStream is also developed with this toolset.

 

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2010/03/visit_of_sqlstr.php Tue, 09 Mar 2010 05:18:34 +0000
After 40 Years of Practice, Developing IT Systems Is Still Hard Last week, my family and I went to a basketball game in Phoenix. The Phoenix Suns were playing against the Philadelphia 76ers. It was a great game; the Suns won 106-95. However, before I was able to get into the stadium I had the following experience.

 

A few days prior to this game, I bought the tickets through Ticketmaster.com. To buy those tickets, I had to enter my credit card information. Normally, this doesn't cause any problems, because credit card companies operate internationally, know that some of their customers are based outside the US, and know that those addresses might have different formats and structures.

 

What I had to enter was, as you might expect, my name, credit card number, expiration date, and a security number. In addition, I had to enter my address information so that they could verify a few things. So, dutifully I entered my address including the zip code. Entering the address components went well until I got to the zip code. The zip code was not accepted because the system expected five digits and I tried to enter four digits and two letters, which is the format of the Dutch zip code. It didn't accept the letters. They had probably switched on a simple check: digits only please.
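
What a country-aware check could look like is sketched below in Python. The two patterns and the function name are my own illustration, not Ticketmaster's or any credit card company's actual rules, and real address verification requires far more than a format check.

```python
import re

# Illustrative patterns only; these cover just two countries.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),        # 12345 or 12345-6789
    "NL": re.compile(r"[1-9]\d{3} ?[A-Z]{2}"),  # 1234 AB or 1234AB
}


def is_valid_postal_code(country, code):
    """Check the code against the format of the given country.

    Countries without a known pattern are accepted rather than rejected,
    so that a missing rule never blocks a legitimate customer.
    """
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        return True
    return pattern.fullmatch(code.strip().upper()) is not None
```

With a check like this, is_valid_postal_code("NL", "1234 AB") is accepted, while the same value entered for a US address is rejected.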

 

Now, this caused a problem, because to get through the verification process I had to enter the correct zip code, but to buy a ticket I had to enter an incorrect one. Eventually I made the decision to enter the zip code of the hotel I was staying at. And, to my surprise, it worked. I got my tickets and printed them.

 

Unfortunately, although Ticketmaster.com had accepted my address information, the credit card company's IT system had not. I discovered that when I entered the stadium and showed them my tickets. They were not accepted. Guess why? The zip code didn't match the rest of the address and it didn't correspond to the correct address.

 

How is it possible that in the year 2010 we still have problems with this simple type of data entry? Didn't they get the right definition of zip code from the credit card companies, or don't they check whether the zip code matches the rest of the address? Is their system not aware that the formats of zip codes can be different in other countries? And how is it possible that they first inform me that the credit card company has accepted the credit card information, and later on indicate that it hasn't? We have about forty years of experience in developing IT systems, and we still make errors such as this.

 

In the end, I did get in: I just bought new tickets at the ticket office and, guess what, I got exactly the same seats. I still wonder whether it would have been accepted if I had also entered the full address of the hotel.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2010/03/after_40_years.php Tue, 02 Mar 2010 05:09:20 +0000
Ingres enters the Data Warehouse Market with VectorWise This week I had an interesting meeting with Peter Boncz, founder and director of VectorWise. VectorWise is a small startup based in The Netherlands (see www.vectorwise.com). In fact, they are a spin-off of a Dutch organization called CWI (the National Research Center for Mathematics and Computer Science). This is the organization that researched and developed MonetDB, an incredibly fast open source database server that uses state-of-the-art database technology. For MonetDB, CWI also created a spin-off company with the same name as the product; see www.monetdb.com.

 

VectorWise is not a database server in itself; it is more like a smart storage engine. Therefore, they needed another database server to complete the product. And they picked the open source database server we have all known for a long, long time: Ingres. Over the last year, both companies have worked hard to integrate the two products.

 

But what's so special about VectorWise? If we forget about the hundreds of details, they are trying to do the same thing that appliance vendors such as Netezza, Aster Data, and Greenplum have tried to do: develop a database server environment that is capable of running typical warehouse queries very fast without the need for extensive, time-consuming tuning and optimization. In other words, VectorWise is also aiming for out-of-the-box query performance. But this is where the comparison between VectorWise and the other products ends. Most of the other products need special hardware and/or clustered machines to be able to offer this level of performance. VectorWise doesn't. It can use and exploit clustered machines, but the magic is that it can even get great performance on single-core machines. It would go too far to explain in a blog how it all works, but the product has been designed to exploit today's CPUs, for example, by using not only the internal memory, which is what most other database servers do, but also the CPU cache. And that makes a serious difference. Therefore, VectorWise will speed up queries without the need for special hardware. It can even do a great job on some of your existing single-, dual-, or quad-core low-end servers stored somewhere in the basement.
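
To give a feel for what working with the CPU cache in mind can mean, here is a small Python sketch of one general technique: processing a column in small batches so that the data being worked on can stay in the cache. This is my own illustration, not a description of VectorWise internals; the batch size, the function names, and the sample columns are all assumptions, and Python itself will not show the speed-up that a database engine written in a lower-level language would get.

```python
from array import array

BATCH_SIZE = 1024  # assumed batch size, chosen small enough to stay cache-resident


def batches(column, size=BATCH_SIZE):
    """Yield consecutive slices (batches) of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def total_above(prices, quantities, threshold):
    """Sum price * quantity over rows where price > threshold, batch by batch."""
    total = 0.0
    for price_batch, qty_batch in zip(batches(prices), batches(quantities)):
        # Step 1: select the qualifying positions within this batch.
        selected = [i for i, p in enumerate(price_batch) if p > threshold]
        # Step 2: compute on the selected positions while the batch is still hot.
        total += sum(price_batch[i] * qty_batch[i] for i in selected)
    return total


# Two hypothetical columns of a 5,000-row table.
prices = array("d", [float(i % 100) for i in range(5000)])
quantities = array("d", [2.0] * 5000)
result = total_above(prices, quantities, 50.0)
```

The result is the same as a row-at-a-time loop would give; only the order in which the data is touched differs.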

 

Most of the other vendors of database appliances and analytical databases aim at the largest warehouses and largest customers - the top 500. VectorWise, because it will be open source and because it can run on low-end machines, will also be very suitable for and attractive to the midsize market.

 

By merging VectorWise with Ingres, you get the new technology of the former and the sales and marketing channels of the latter, a company that has been around for some time, and that has a very stable and extensive installed base.

 

For current Ingres customers, switching from Ingres to Ingres-VectorWise will be a very small change, because on the outside, on the side where the queries come in, nothing changes. That also means that porting existing Business Objects or Cognos reports to this new product will be straightforward.

 

If you're interested in new database technology for data warehousing, this is the product to study. They expect Ingres-VectorWise to be released in the spring of 2010. It all looks very promising, but as we all know, the proof of the pudding is in the eating. How well will Ingres-VectorWise perform in a real-life situation? Hopefully, we can check that very soon.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2009/10/ingres_enters_t.php Thu, 15 Oct 2009 05:32:33 +0000
Oracle buys Sun I just received the news that Oracle will buy Sun. The first thing that came to my mind was: what will they do with MySQL? And how will the MySQL community react to this? Oracle has always been seen as the main competitor. This must have an impact on the database market. Let's see what happens.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2009/04/oracle_buys_sun.php Mon, 20 Apr 2009 06:13:36 +0000
The Independent Analyst Platform In July last year, we organized the first edition of the Independent Analyst Platform (IAP). This event was attended by some of the most well-known independent BI analysts. Around that time, you might have seen many intriguing blog entries from various analysts who were writing about the sessions and publishing them in real time. I must admit that some of the blogs focused on the weather; Phoenix, Arizona in July can be a little warm, although, as they say, it is a dry heat!

 

The idea of the event is to bring analysts and vendors together. Vendors get a chance to talk about their new technologies, their products, and ideas. And the analysts have the opportunity to ask questions. And trust me, 20+ analysts can ask a lot of questions, a lot of tough questions.

 

This coming July, we will organize the second edition; see www.independentanalystplatform.com. Many of the same analysts will be present again, plus a few new names. Various vendors have already signed up and are brave enough to present in front of this critical crowd.

 

I am looking forward to this event again. I don't think this event will go unnoticed by regular readers of the BeyeNetwork blogs. Again, on those days, you will see a tsunami of blog entries. But, as opposed to last year, this time you have been warned. Stay connected to the BeyeNetwork on July 7, 8, and 9 for all the content that will be published.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2009/04/the_independent.php Fri, 17 Apr 2009 05:07:15 +0000
Once upon a time A few years ago I wrote an article with the title Once upon a time. This article was published in a Dutch magazine, so most people never got to read it. Because I think the article is still useful, still relevant to most IT projects, I decided to publish it here in my blog. Note that the article does not relate to business intelligence or data warehousing specifically, but to IT in general. Nevertheless, here it is. I hope you enjoy it.

 

Once upon a time, there was a gallant prince who, to perform an important task, needed to multiply two numbers regularly. However, he often lost his calculator due to his careless character. One fine day he came upon a brilliant idea. For years he had used his desktop PC to mail the princess. He asked his personal lackey to write a computer program to help him multiply two numbers. The program would run on his PC, and obviously, that PC was too big to lose.

 

Years ago, this lackey had attended a class on Pascal, so he knew what to do. He immediately threw himself into this job. After an hour of playing around he had finished the program. He called it program X, and it was made up of only four lines of code. When the prince saw the program, he was thrilled. The only thing he needed to do was to enter X followed by the two numbers. Obviously, the prince lived happily ever after. Well, he lived happily, until the wicked IT manager, working for the Royal Family, came by. The IT manager had heard that the prince was using a program that was essential for his duties. When the prince agreed that that was the case, the IT manager explained how dangerous such a stand-alone program could be. Who is responsible for maintenance, who takes care of the backups? He decided to take over control. The prince was taken off guard and agreed.

 

That same day, the IT manager studied the details of the program, and decided he had to rewrite the program in Java, because that was the default development language for all applications in use by the Royal Family. Several checks had to be implemented to determine, for example, whether the user was really entering digits and not letters. The decision was made to define a Java class for entering digits so that the check had to be included once, and the program would have a higher level of code reuse. Quite quickly, the program was no longer a simple program of four lines, but grew to fifty lines of code.

 

After a couple of weeks, the department responsible for software management studied the new version of the program and came to the conclusion that the objects in the program did not conform to the naming rules. Code was edited and the program's name was changed to PROG_MAN_X. The prince was not too happy about this. He asked his lackey to develop a program that allowed him to enter just X; that new program would call PROG_MAN_X under the hood. However, he was not allowed to tell anyone that this new program existed. Everyone in the palace was happy again.

 

Then the security department heard about the program. They looked at it and recommended several improvements. First, a log-in screen had to be developed in which the prince could enter his user identification number and password. Imagine that someone from outside the palace could get his hands on this most important computer program. A log was also indispensable; every time a user started the program, the database needed to register that this user had used the program at that moment. The motto was: 'logging is always good for auditing.'

 

To create a link to a database, a specialist needed to create an object relational 'mapping.' This mapping could store the Java objects in the database server. The effect was that the program grew to three hundred lines of code.

 

Because a database was involved now, the decision was made to run the next version within a Java application server by using Enterprise JavaBeans. The result was four hundred lines of code.

 

The entire IT department attended an intensive class on extreme programming where the topic of refactoring was discussed extensively. The IT manager was impressed with what he had learned, so he decided that the 'business-critical' program PROG_MAN_X also needed to be refactored. Eventually, this took several weeks, but in the end everyone was pleased with the result. To no one's surprise, the program grew to five hundred lines of code.

 

Several years later, a new IT manager was appointed, and he decided to switch to standard packages. Slowly, the program PROG_MAN_X was put aside, and an application was bought. Because this was the first standard package for this IT department, the implementation took several months.

 

Meanwhile, the prince had asked his lackey to write that original small program again. And he has been using it for years now, but that is their secret.

 

Currently, we are hearing rumors that the Royal Family has decided to adopt a Service Oriented Architecture, and that there are plans to create a SOAP service interface on the program. But, we cannot confirm these rumors yet.

 

Ah well, why would you do something the easy way when it can also be done the hard way?

 

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2009/04/once_upon_a_tim.php Thu, 16 Apr 2009 02:16:57 +0000
The first one Let me begin my first blog for the BeyeNETWORK by saying: WOW! At the beginning of March of this year, my first article for the BeyeNETWORK was published. The article focused on the flaws of the classic data warehouse architectures. It was Part 1 of a series of articles on the Data Delivery Platform. I had expected it would attract some attention, but not this much. Right now, it is the most read article on the BeyeNETWORK. Therefore, WOW!

In a way, the popularity of the article makes sense. We have been working on classic warehouse architectures for so long. If something new is introduced, it always leads to reactions: some positive, some negative. But whatever the reaction, what's important is that it will make people rethink their classic architectures. Architectural decisions that were made more than ten years ago might not be the right decisions anymore. Requirements have changed, technology has changed, and new technology leads to new opportunities. Just remember the steam engine and the Internet itself. Both technologies gave us new opportunities.

On April 1 the second part of this series will be published. In this article, I introduce the Data Delivery Platform. Let's see what this article will do.

http://www.b-eye-network.co.uk/blogs/vanderlans/archives/2009/03/the_first_one.php Mon, 30 Mar 2009 02:33:23 +0000