Blog: Rick van der Lans

Rick van der Lans

Welcome to my blog where I will talk about a variety of topics related to data warehousing, business intelligence, application integration and database technology. Currently my special interests include virtual data warehousing, mashups and service-oriented architectures. If there are any topics you'd like me to address, send them to me at rick@r20.nl

About the author

Rick is an independent consultant, speaker and author, specializing in data warehousing, business intelligence (BI), application integration and database technology. He is managing director and founder of R20/Consultancy. He is an internationally acclaimed speaker who has lectured worldwide for the last 25 years. He is the chairman of the successful annual European Business Intelligence and Data Warehouse Conference held in London and the annual BI event in The Netherlands. Currently, he is promoting a new architecture for data warehousing called the Data Delivery Platform. He is the author of several books on computing, including the popular Introduction to SQL. Some of his books are available in English, Chinese, Dutch, Italian and German. He is also the author of the successful books SQL for MySQL Developers and The SQL Guide to Ingres. Rick may be contacted by sending an email to info@r20.nl.

Editor's note: More articles, news and resources are available in Rick van der Lans's BeyeNETWORK expert channel. Be sure to visit today.

Quite recently, I visited SQLStream. Like the majority of database server vendors, SQLStream is located in California; in San Francisco, to be more precise. Of course, their primary product (also called SQLStream) supports the database language SQL, and they try to follow the SQL standard as much as possible. So far, nothing new under the sun. You would almost think that this is yet another of the many new vendors trying to dethrone Oracle, Microsoft, and IBM. However, that would be an incorrect assumption.


As the name implies, SQLStream is a so-called streaming database server, comparable to IBM InfoSphere Streams and StreamBase Server. The main difference between SQLStream on the one hand and most other products on the other is that the former is a pure SQL-based product: the statements used to query streams follow the SQL standard. Most other streaming products use proprietary languages, such as Spade, or use extensions.

 
For those who haven't studied this topic in detail yet, a streaming database server allows us to formulate queries on streams of data. Examples of streams are log files of certain systems, messages that are entered, or web logs. Even before this data is stored in tables, we can already access and analyze it. Someone once explained streaming database servers as follows: queries executed in the context of a classic database server are like asking how many fish live in this pond, whilst queries executed in the context of a streaming database server are like asking how many fish swim by in a fast-flowing river during a certain period of time.
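To make the pond-versus-river contrast a bit more concrete, here is a minimal Python sketch. It is not SQLStream syntax and says nothing about how the product works internally; the event layout, the function names, and the ten-second window are all assumptions chosen purely for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

# Hypothetical event layout: (timestamp in seconds, name of the fish).
Event = Tuple[float, str]


def count_in_pond(pond: List[str]) -> int:
    """Classic query: count what is stored at this moment."""
    return len(pond)


def count_swimming_by(
    river: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """Streaming query: each time an event arrives, report how many
    events were seen during the last `window_seconds` seconds."""
    window: Deque[float] = deque()
    for timestamp, _fish in river:
        window.append(timestamp)
        # Drop events that are too old to fall inside the window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, len(window)


pond = ["trout", "pike", "carp"]
river = [(0.0, "trout"), (1.0, "pike"), (2.5, "carp"), (11.0, "eel")]

pond_count = count_in_pond(pond)
river_counts = list(count_swimming_by(river, window_seconds=10.0))
```

The pond query looks at stored data once, whereas the river query produces a new answer every time an event arrives and never needs the full history, only the events that are still inside the window.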


SQLStream offers all the features above. In addition, views are used to define streams, and these streaming views can serve as input for other streaming views. Through join and union operators, data from different sources can be integrated. In fact, SQLStream supports many of the features normally found in an ETL tool, except that SQLStream uses streams and SQL. Data streams are integrated live, the moment they arrive. The result of an integrated stream can be sent to an application or data warehouse. See the following link for an explanation of how SQLStream can be used together with SQL Power.
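The idea of views stacked on views can be sketched with Python generators. This is only an analogy under stated assumptions, not SQLStream's SQL: the event layout, the names `union_view`, `errors_view`, and `deliver`, and the sample messages are all hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event layout: (timestamp, source, message).
Event = Tuple[float, str, str]


def union_view(*streams: Iterable[Event]) -> Iterator[Event]:
    """Union of several time-ordered streams, kept in timestamp order."""
    return heapq.merge(*streams, key=lambda event: event[0])


def errors_view(stream: Iterable[Event]) -> Iterator[Event]:
    """A view defined on top of another view: keep only error messages."""
    return (event for event in stream if event[2].startswith("ERROR"))


def deliver(stream: Iterable[Event], sink: List[Event]) -> None:
    """Send each integrated event to a target (here simply a list)."""
    for event in stream:
        sink.append(event)


web_log = [(1.0, "web", "GET /"), (4.0, "web", "ERROR 500")]
app_log = [(2.0, "app", "ERROR timeout"), (3.0, "app", "started")]

warehouse: List[Event] = []
deliver(errors_view(union_view(web_log, app_log)), warehouse)
```

Because every step is lazy, each event flows through the whole chain as it arrives, which is roughly the ETL-on-streams behavior described above.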

In short, SQLStream is absolutely worth studying.

Note: The owners of SQLStream are also the founders of Eigenbase.org. This organization supplies a toolset with which database servers can be developed. As can be expected, SQLStream is also developed with this toolset.

 


Posted March 9, 2010 5:18 AM

Last week, my family and I attended a basketball game in Phoenix. The Phoenix Suns were playing against the Philadelphia 76ers. It was a great game; the Suns won 106-95. However, before I was able to get into the stadium I had the following experience.

 

A few days prior to this game, I bought the tickets through Ticketmaster.com. To buy those tickets, I had to enter my credit card information. Normally, this doesn't cause any problems, because credit card companies operate internationally, know that some of their customers are based outside the US, and know that those addresses might have different formats and structures.

 

What I had to enter was, as you might expect, my name, credit card number, expiration date, and a security number. In addition, I had to enter my address information so that they could verify a few things. So, dutifully I entered my address including the zip code. Entering the address components went well until I got to the zip code. The zip code was not accepted because the system expected five digits and I tried to enter four digits and two letters, which is the format of the Dutch zip code. But it didn't accept the letters. They had probably switched on a simple check: digits only please.

 

Now, this caused a problem, because to get through the verification process I had to enter the correct zip code, but to buy a ticket I had to enter an incorrect zip code. Eventually I decided to enter the zip code of the hotel I was staying at. And, to my surprise, it worked. I got my tickets and printed them.

 

Unfortunately, although Ticketmaster.com had accepted my address information, the credit card company's IT system had not. I discovered that when I entered the stadium and showed them my tickets. They were not accepted. Guess why? The zip code didn't match the rest of the address, and it didn't correspond to the address on file for the card.

 

How is it possible that in the year 2010 we still have problems with this simple type of data entry? Didn't they get the right definition of zip code from the credit card companies, or don't they check whether the zip code matches the rest of the address? Is their system not aware that the formats of zip codes can be different in other countries? And how is it possible that they first inform me that the credit card company has accepted the credit card information, and later on indicate that it hasn't? We have about forty years of experience in developing IT systems, and we still make errors such as this.
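A check that is aware of national formats does not have to be complicated. The following Python sketch is only an illustration and not how Ticketmaster or any credit card company validates addresses; the function name, the table of patterns, and the choice to accept unknown countries are assumptions, and real postal rules have more exceptions than these two patterns capture.

```python
import re

# Assumed, simplified formats: US ZIP is five digits with an optional
# four-digit extension; a Dutch postal code is four digits and two letters.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),
}


def is_valid_postal_code(code: str, country: str) -> bool:
    """Validate a postal code against the format of its own country."""
    normalized = code.strip().upper()
    pattern = POSTAL_CODE_PATTERNS.get(country.strip().upper())
    if pattern is None:
        # Unknown country: accept any non-empty value instead of
        # forcing the customer to enter something incorrect.
        return bool(normalized)
    return pattern.fullmatch(normalized) is not None


dutch_code_as_dutch = is_valid_postal_code("1234 AB", "NL")
dutch_code_as_us = is_valid_postal_code("1234 AB", "US")
```

The point of the design is that the country is part of the check: the same value that a digits-only rule rejects is perfectly valid once the system knows which country's format to apply.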

 

In the end, I did get in. I simply bought new tickets at the ticket office and, guess what, I got exactly the same seats. I still wonder whether it would have been accepted if I had also entered the full address of the hotel.


Posted March 2, 2010 5:09 AM

This week I had an interesting meeting with Peter Boncz, founder and director of VectorWise. VectorWise is a small startup based in The Netherlands (see www.vectorwise.com). In fact, they are a spin-off of a Dutch organization called CWI (the National Research Center for Mathematics and Computer Science). This is the organization that researched and developed MonetDB, an incredibly fast open source database server that uses state-of-the-art database technology. For MonetDB, CWI also created a spin-off company with the same name as the product; see www.monetdb.com.

 

VectorWise is not a database server in itself; it is more like a smart storage engine. Therefore, they needed another database server to complete the product. And they picked the open source database server we have all known for a long, long time: Ingres. Over the last year, both companies have worked hard to integrate the two products.

 

But what's so special about VectorWise? If we forget about the hundreds of details, they are trying to do the same thing that appliance vendors such as Netezza, Aster Data, and Greenplum have tried to do: develop a database server environment that is capable of running typical warehouse queries very fast without the need for extensive, time-consuming tuning and optimization. In other words, VectorWise is also aiming for out-of-the-box query performance. But this is where the comparison between VectorWise and the other products ends. Most of the other products need special hardware and/or clustered machines to be able to offer this level of performance. VectorWise doesn't. It can use and exploit clustered machines, but the magic is that it can even get great performance on uni-core machines. It would go too far to explain in a blog how it all works, but the product has been designed to exploit the CPUs of today, for example, by not only using the internal memory, which is what most other database servers do, but also by using the CPU cache. And that makes a serious difference. Therefore, VectorWise will improve query performance without the need for special hardware. It can even do a great job on some of your existing uni-, dual-, or quad-core low-end servers stored somewhere in the basement.

 

Most of the other vendors of database appliances and analytical databases aim at the largest warehouses and largest customers - the top 500. VectorWise, because it will be open source and because it can run on low-end machines, will also be very suitable for and attractive to the midsize market.

 

By merging VectorWise with Ingres, you get the new technology of the former and the sales and marketing channels of the latter, a company that has been around for some time, and that has a very stable and extensive installed base.

 

For current Ingres customers, switching from Ingres to Ingres-VectorWise will be a very small change, because on the outside, on the side where the queries come in, nothing changes. That also means that porting existing Business Objects or Cognos reports to this new product will be straightforward.

 

If you're interested in new database technology for data warehousing, this is the product to study. They expect Ingres-VectorWise to be released in the spring of 2010. It all looks very promising, but as we all know, the proof of the pudding is in the eating. How well will Ingres-VectorWise perform in a real-life situation? Hopefully, we can check that very soon.


Posted October 15, 2009 5:32 AM

I just received the news that Oracle will buy Sun. The first thing that came to my mind was: what will they do with MySQL? And how will the MySQL community react to this? Oracle has always been seen as the main competitor. This must have an impact on the database market. Let's see what happens.


Posted April 20, 2009 6:13 AM

In July of last year, we organized the first edition of the Independent Analyst Platform (IAP). This event was attended by some of the most well-known independent BI analysts. Around that time, you might have seen many intriguing blog entries from various analysts who were writing about the sessions and publishing them in real time. I must admit, though, that some of the blogs focused on the weather: Phoenix, Arizona, in July can be a little warm although, as they say, it is a dry heat!

 

The idea of the event is to bring analysts and vendors together. Vendors get a chance to talk about their new technologies, their products, and ideas. And the analysts have the opportunity to ask questions. And trust me, 20+ analysts can ask a lot of questions, a lot of tough questions.

 

This coming July, we will organize the second edition; see www.independentanalystplatform.com. Many of the same analysts will be present again, plus a few new names. Various vendors have already signed up and are brave enough to present in front of this critical crowd.

 

I am looking forward to this event again. I don't think this event will go unnoticed by regular readers of the BeyeNetwork blogs. Again, on those days, you will see a tsunami of blog entries. But, as opposed to last year, this time you have been warned. Stay connected to the BeyeNetwork on July 7, 8, and 9 for all the content that will be published.


Posted April 17, 2009 5:07 AM

A few years ago I wrote an article with the title Once upon a time. This article was published in a Dutch magazine, so most people never got to read it. Because I think the article is still useful, still relevant to most IT projects, I decided to publish it here in my blog. Note that the article does not relate to business intelligence or data warehousing specifically, but to IT in general. Nevertheless, here it is. I hope you enjoy it.

 

Once upon a time, there was a gallant prince who, to perform an important task, needed to multiply two numbers regularly. However, he often lost his calculator due to his careless character. One fine day he came upon a brilliant idea. For years he had used his desktop PC to mail the princess. He asked his personal lackey to write a computer program to help him multiply two numbers. The program would run on his PC, and obviously, that PC was too big to lose.

 

Years ago, this lackey had attended a class on Pascal, so he knew what to do. He immediately threw himself into this job. After an hour of playing around he had finished the program. He called it program X, and it was made up of only four lines of code. When the prince saw the program, he was thrilled. The only thing he needed to do was to enter X followed by the two numbers. Obviously, the prince lived happily ever after. Well, he lived happily, until the wicked IT manager, working for the Royal Family, came by. The IT manager had heard that the prince was using a program that was essential for his duties. When the prince agreed that that was the case, the IT manager explained how dangerous such a stand-alone program could be. Who is responsible for maintenance, who takes care of the backups? He decided to take over control. The prince was taken off guard and agreed.

 

That same day, the IT manager studied the details of the program, and decided he had to rewrite the program in Java, because that was the default development language for all applications in use by the Royal Family. Several checks had to be implemented to determine, for example, whether the user was really entering digits and not letters. The decision was made to define a Java class for entering digits so that the check had to be included only once, and the program would have a higher level of code reuse. Quite quickly, the program was no longer a simple program of four lines, but grew to fifty lines of code.

 

After a couple of weeks, the department responsible for software management studied the new version of the program and came to the conclusion that the objects in the program did not conform to the naming rules. The code was edited and the program's name was changed to PROG_MAN_X. The prince was not too happy about this. He asked his lackey to develop a program that allowed him to enter just X; that new program would call PROG_MAN_X under the hood. However, he was not allowed to tell anyone that this new program existed. Everyone in the palace was happy again.

 

Then the security department heard about the program. They looked at it and recommended several improvements. First, a log-in screen had to be developed in which the prince could enter his user identification number and password. Imagine that someone from outside the palace could get his hands on this most important computer program. A log was also indispensable; every time a user started the program, the database needed to register that this user had used the program at that moment. The motto was: 'logging is always good for auditing.'

 

To create a link to a database, a specialist needed to create an object-relational 'mapping.' This mapping could store the Java objects in the database server. The effect was that the program grew to three hundred lines of code.

 

Because a database was involved now, the decision was made to run the next version within a Java application server by using Enterprise JavaBeans. The result was four hundred lines of code.

 

The entire IT department attended an intensive class on extreme programming where the topic of refactoring was discussed extensively. The IT manager was impressed with what he had learned, so he decided that the 'business-critical' program PROG_MAN_X also needed to be refactored. Eventually, this took several weeks, but in the end everyone was pleased with the result. To no one's surprise, the program grew to five hundred lines of code.

 

Several years later, a new IT manager was appointed, and he decided to switch to standard packages. Slowly, the program PROG_MAN_X was put aside, and an application was bought. Because this was the first standard package for this IT department, the implementation took several months.

 

Meanwhile, the prince had asked his lackey to write that original small program again. And he has been using it for years now, but that is their secret.

 

Currently, we are hearing rumors that the Royal Family has decided to adopt a Service Oriented Architecture, and that there are plans to create a SOAP service interface on the program. But, we cannot confirm these rumors yet.

 

Ah well, why would you do something the easy way, when it can also be done the hard way?

 


Posted April 16, 2009 2:16 AM

Let me begin my first blog for the BeyeNETWORK by saying: WOW! At the beginning of March of this year, my first article for the BeyeNETWORK was published. The article focused on the flaws of the classic data warehouse architectures. It was Part 1 of a series of articles on the Data Delivery Platform. I had expected it would attract some attention, but not this much. Right now, it is the most read article on the BeyeNETWORK. Therefore, WOW!

In a way, the popularity of the article makes sense. We have been working on classic warehouse architectures for so long. If something new is introduced, it always leads to reactions: some positive, some negative. But whatever the reaction, what's important is that it will make people rethink their classic architectures. Architectural decisions that were made more than ten years ago might not be the right decisions anymore. Requirements have changed, and technology has changed, and new technology leads to new opportunities. Just remember the steam engine and the Internet itself. Both technologies gave us new opportunities.

On April 1 the second part of this series will be published. In this article, I introduce the Data Delivery Platform. Let's see what this article will do.


Posted March 30, 2009 2:33 AM