BeyeNETWORK UK Blogs. Copyright BeyeNETWORK 2005 - 2009
http://www.b-eye-network.co.uk/rss/content.php

Data Integration tools saving the planet?
I was driving home last week listening to the radio, when an article came on discussing alternative energy sources. A stray right brain neuron must have fired and I began wondering how what we do in DI impacts on the environment. After all, almost every other technology has some kind of environmental grading. Certainly my fridge and my car do, and every company is aware of the impact of their hardware consumption, but there seems to be little consideration as to whether software can have an impact on carbon footprint.

It is generally recognised that code (correctly) created using DI tools is more performant than hand-cranked code. This means that less hardware is used to perform the same actions. Less hardware means a lower carbon footprint, which means more environmentally friendly.

This is even more profound when one considers the MPP technologies that can dramatically increase performance and scalability. If one can negate the need for new hardware or even remove existing infrastructure then the impact is not only positive from an environmental perspective, but also from a cost perspective.

And so I have arrived at the conclusion that DI tools are at the positive end of the green scale and perhaps performance should be given a higher priority when evaluating software.

Extending this argument slightly, should software vendors be touting their green credentials? Should there perhaps be benchmarks that could provide an "energy rating" for software?

If this did happen then one would hope we would see a return to leaner code and more efficient programming practices, meaning that we are all less likely to hear the words "just throw some more tin at it".



http://www.b-eye-network.co.uk/blogs/barton/archives/2009/06/data_integration_tools_saving.php Tue, 30 Jun 2009 12:10:15 MST
Back to basics; 'EIS revisited'
Let's name a few: BI, Neuro BI, Ambient BI, Sentient BI, Process driven BI, Process Intelligence, Pervasive BI, BI 2.0, BI as a Service, BI for SOA, SOA for BI, Mobile BI, Google BI, Personal BI, BI-Tools-where-you-do-not-need-a-DWH-and-ETL-Buy-it-now, Mashups-the-new-BI-Desktop (oh my god), BI-in-the-box, Business Analytics, BI-in-the-cloud (love this one!), BI-virtualization, Agile-BI, Operational-BI, Decision Intelligence, Decision Management, Event driven analytics, Complex Event Processing, BAM, Collaborative Decision Making......

Do not get me wrong; I certainly do not dismiss them all.....just most of them :)

Most of these 'attractions' are highly sponsored, look super-duper on the outside and if you don't sit in them you will lose out big time (so they say). However, when you finally decide to ride them you feel that they are not that stable, not really safe and there is hardly any enjoyment when you exit them (but you can tell your neighbour you dared to ride it!). To put it in other words: they are not really grounded in theory and the relevance for practice is extremely hard to find.

There are however some attractions in this Decision Support 'Neverland' that are very dusty, with spiderwebs all over the place, but still extremely solid. If we overhauled them with new architectural insights and technology, they could be a smash hit. These attractions are named DSS, EIS, ESS....

I feel that we - as an industry - failed miserably in continuing on the path that was made for us by people like John Dearden, John Rockart, David Delong, Ralph Sprague, Hugh Watson, Steven Alter, Daft and Lengel, Peter Keen, Michael Scott Morton, Herbert Simon, Henry Mintzberg and many more.

Dan Power is one of those brave souls who is standing at the ticket booth - selling tickets for his 'attraction'. I recommend reading his latest article as well as the blog post written by Wouter van Aerle.

Finally, I wanna contribute to the Decision Support 'Neverland': BI goes Retro


http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/back_to_basics.php Tue, 30 Jun 2009 06:54:54 MST
Analyst Blogging: Let's Have Some Etiquette! link for details.

This is the second blog this month that I have read where an analyst makes an attack, not only on the vendor, but also on one of its employees. The other blog (and an associated article) was by Stephen Few entitled "Business is Personal - Let's Stop Pretending It Isn't." See this link for details.

The good thing about social computing is that it provides a fast way of sharing and collaborating about industry developments. However, these technologies have the same problem as email and instant messaging: they enable people to react immediately to something that upsets or annoys them. With blogging, unlike email and instant messaging, everyone gets to see the results!

As analysts our job is to write balanced reviews of industry developments that provide useful information to the reader. My concern is that some analysts are behaving as though they are on cable television or writing for the tabloids. I believe we can critique a product without attacking a company, its products or its employees. Personal attacks by analysts are unprofessional, even if the company fights back against a review they take exception to. What do you think?

http://www.b-eye-network.co.uk/blogs/white/archives/2009/06/analyst_bloggin.php Thu, 25 Jun 2009 13:52:42 MST
The ParAccel TPC-H Benchmark Controversy this link.

Not everyone agreed with Merv's balanced review. Curt Monash commented that "The TPC-H benchmark is a blight upon the industry." See his blog entry at this link.

This blog entry resulted in some 41 (somewhat heated) responses. At one point Curt made some negative comments about ParAccel's VP of Marketing, Kim Stanick, which in turn led to accusations that his blog entry was influenced by personal feelings.

I have two comments to make about this controversy. The first concerns the TPC-H benchmark and the second is about an increasing lack of social networking etiquette by analysts.

TPC benchmarks have always been controversial. People often argue that they do not represent real-life workloads. What this really means is that your mileage may vary. These benchmarks are expensive to run and vendors throw every piece of technology at the benchmark in order to get good results. Some vendors are rumored to have even added special features to their products to improve the results. The upside of the benchmarks is that they are audited and reasonably well documented.

The use of TPC benchmarks has slowed over recent years. This is not only because they are expensive to run, but also because they have less marketing impact than in the past. In general, they have been of more use to hardware vendors because they demonstrate hardware scalability and provide hardware price/performance numbers. Oracle was perhaps an exception here because they liked to run full-page advertisements saying they were the fastest database system in existence.

TPC benchmarks do have some value to both the vendor and the customer. The benefits to the vendor are increased visibility and credibility. Merv Adrian described this as a "rite of passage." It helps the vendor get on the short list. For the customer these benchmarks show the solution to be credible and scalable. All products work well in PowerPoint, but the TPC benchmarks demonstrate that the solution is more than just vaporware.

I think most customers are knowledgeable enough to realize that the benchmark may not match their own workloads or scale as well in their own environments. This is where the proof of concept (POC) benchmark comes in. The POC enables the customer to evaluate the product using their own workloads.

TPC benchmarks are not perfect, but they do provide some helpful information in the decision-making process.

I will address the issue of blog etiquette in a separate blog entry.




http://www.b-eye-network.co.uk/blogs/white/archives/2009/06/the_paraccel_tp.php Thu, 25 Jun 2009 13:43:10 MST
Information Richness and Business Intelligence
Management Science:

ORGANIZATIONAL INFORMATION REQUIREMENTS, MEDIA RICHNESS AND STRUCTURAL DESIGN.

A very interesting article that combines the processing of information by decision makers, the use of structural mechanisms and organizational design. The latter one - organizational design - I will not discuss thoroughly in this blog. Just remember that the choice of structural mechanism to overcome uncertainty/equivocality will impact the organizational design.



http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/information_ric.php Thu, 25 Jun 2009 02:16:29 MST
A response to 'The flaws of the classic data warehouse' (3)
This post is a third reaction to the first article in a series of three, written by a highly respected thought leader in the field and publisher on the B-Eye-Network: Rick van der Lans. The papers are titled 'The Flaws of the Classic Data Warehouse Architecture'.

This blog post is a reaction to the first part. It deals with the flaws of the classic data warehouse architecture (CDWA).

Rick signals five flaws which will lead in article two and three to a new architecture. This post is addressing the third flaw.

- My reaction to flaw #1 can be read here
- My reaction to flaw #2 can be read here

Flaw 3 according to Rick
Rick signals the need to do analytics on external data and on datastores that are unstructured. I quote Rick: 'Most vendors and analysts propose to handle these two forms of data as follows: if you want to analyze external or unstructured data, copy it into the CDW to make it available for analytics and reporting'. Rick wonders why. Unstructured data can be 'handled' at the source and external data can be handled by mashup tools.

My reaction to flaw 3
Where are these vendors and analysts that propose to copy unstructured data into the CDW? I do not know them....really I don't. And if they exist - I agree with Rick; don't do it. Especially for the unstructured data I think other architectural choices are more optimal at the moment. But where is the flaw in the CDWA architecture? The CDWA was not meant for unstructured data and still is not. I still do not see the flaw...

But for the external data, I really believe that in the years to come there is still a solid business case for getting this data into your data warehouse. Sure enough - especially for more situational BI - mashups offer very fast time to market for new informational products. Still, I believe the security issue is not to be underestimated, nor is the need to perform analytics on combinations of internal and - multiple sources of - external data.

Mashups also need solid architectures.......

So:
- I challenge the notion that the vendors and analysts in data warehousing massively propose to put unstructured data in the DWH. The CDWA was not meant for that purpose. Not a flaw.
- There are solid business cases for getting your external data into your DWH. The CDWA is still a valid approach. Not a flaw.
- Mashups - as is new (BI) technology in general - surely offer new features and promising functionality.


http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/a_response_to_t_1.php Mon, 22 Jun 2009 05:03:03 MST
Data warehousing is failure prone......or is it not?

I often criticize vendors and others for not being thorough enough. Now it's time to criticize science a bit as well as those consultants and analysts that are 'riding the wave' of negative sentiment surrounding data warehousing.

In several papers I am reading at the moment (from MIS Quarterly, Information & Management, Decision Support Systems and many other journals) I encounter something similar.

Please read the following quotes:

"....the road to DW success has been littered with failures [43,63,80]"
"....nearly half of all DW initiatives end up as failures [38]"
"According to a press release 2005 by Gartner: through 2007 more then 50% of data warehouse projects will have limited acceptance, or will be outright failures."

The above quotes I got from one paper in one paragraph in Decision Support Systems (paper is from 2008 - very recent), which is an important journal in our field of expertise. Since I encounter these statements over and over again, I decided to follow up on the citations.

Let me begin by dismissing the quote referring to Gartner's press release. That's not really a sound scientific basis.....So we are left with two more quotes.

[43] C. Hall, Corporate use of data warehousing and Enterprise Analytic Technologies, Arlington, Massachusetts, 2003 URL: www.cutter.com
Ronald: can not validate the information. You need to buy a report......

[63] S. Kotler, When enterprise hit open road: move beyond the silos and let the ideas roll, Teradata Magazine 3 (3) 2003
Ronald: I read this article...now it becomes shocking. Let me quote this article:

"According to a recent article in Information Week, "41% of all companies surveyed by the Cutter Consortium, an IT consultant and market analysis firm, have failed data warehouse projects, and only 15% call their data warehousing efforts to date a major success."

Ok....this citation is actually referencing the first one [43] (I think...can't access it). We got a loop here....

[80] M.Quaddus, A.Intrapairot, Management Policies and the diffusion of DWH: a case study using dynamic based decision support systems, Decision Support Systems 31 (2) 2001, 223-240
Ronald: ok, this is becoming complex. I quote this paper:

"..quite a few DW projects end up in failure even before full implementation owing to lack of immediate substantial economic returns on massive investment [24,25,67]"

So a citation is used in a paper that refers to a claim in another paper but with different citations (you guys still with me here?). So we end up with three more citations which are of course even further back in time. Let's examine these:

[24] R. Hackathorn, Data warehousing energises your enterprise, Datamation 41 (2) 1995. 38-42
- I could not find this article, so I was not able to validate the claim that was being made. It sounds to me like some sort of column though. But again - can not validate.

[25] C. Horrock, Making the Warehouse Work, 1996, available from http://www.computerworld.comrsear . . . -htm1r9606r960624DW1SL96dw10.html.
- I could not find this article, so I was not able to validate the claim that was being made.

[67] The Siam Commercial Bank's Staff, Data Warehouse Questionnaire and Interview, (6 January-28 February 1998). Personal Communication.
- I could not find this 'article', so I was not able to validate the claim that was being made.

[38] L. Greenfield, The Data Warehousing Information Center www.dwinfocenter.org, December 19, 2003
Ronald: This is a link to a whole site.......oh my, how on earth can you make a reference to a whole site??


To summarize things: I was not able to establish any (empirical) evidence that would support the claim made in the DSS paper (and in several other papers as well) - which is by the way quite a recent paper (2008). Somehow we are being made to believe that data warehousing is failure prone. Increasingly I encounter consultants and analysts that fuel this negative sentiment surrounding data warehousing. I challenge anyone to deliver some real (empirical) evidence. As for now - I suggest we all use caution in communicating that data warehouse undertakings tend to fail a lot.

On the subject, there is one interesting paper from TDWI, written by Hugh J. Watson (I think in 2006), that seems to be relevant and hits the nail on the head by saying - and I quote:
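To make the trail explicit, here is a minimal Python sketch of the citation chains followed in this post. The label "DSS-2008" is made up for the 2008 Decision Support Systems paper, the link from [63] to [43] is only my guess (I could not access [43]), and the empty "verified" set reflects that I could not check any of the sources. It is an illustration of the reasoning, not a bibliometric tool.

```python
# Citation chains as traced in this post. Keys are reference labels,
# values are the references each source leans on for the failure claim.
# "DSS-2008" is a made-up label for the 2008 Decision Support Systems paper.
CITES = {
    "DSS-2008": ["43", "63", "80", "38"],
    "63": ["43"],              # assumption: the Cutter survey quoted in [63] is [43]
    "80": ["24", "25", "67"],
}

# Sources where the claim could actually be checked. None, so far.
VERIFIED = set()


def trace(source, cites, seen=None):
    """Return the end points (sources that cite nothing further) reachable from source."""
    seen = set() if seen is None else seen
    if source in seen:         # already visited: avoid walking in circles
        return set()
    seen.add(source)
    targets = cites.get(source, [])
    if not targets:
        return {source}
    ends = set()
    for target in targets:
        ends |= trace(target, cites, seen)
    return ends


end_points = trace("DSS-2008", CITES)
unverified = sorted(end_points - VERIFIED)
print(unverified)
```

Every chain ends in a source that is in none of my "verified" piles, which is the whole point: the claim is passed along, never grounded.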

"The data suggests that whether data warehouses are failure-prone depends on one's definition of "failure." Varying with the architecture implemented, there is approximately a 32-47 percent chance that a warehouse will be behind schedule, and a 30-43 percent chance that it will be over budget. However, this does not mean that the warehouse will not succeed. By a more global measure of success, only 10-20 percent of warehouses are potentially in trouble, while the others are either up-and-coming systems or runaway successes"

Ronald: And yes, Watson is using an empirical basis.

http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/data_warehousin.php Mon, 22 Jun 2009 03:24:11 MST
Cognos and the midmarket
One take is that it is at least partly a step towards alignment with the policies of IBM's service organization. IBM Services has had a mid-market offering for years now. And my impression is that Cognos is focusing a lot of energy on aligning itself with IBM Services as a sales channel. The alignment towards IBM Services also shows in Cognos's increased focus on prepackaged business content.

However, the midmarket idea is in line with what appears to be an internal shift at Cognos back towards the end user. Cognos 8 shifted the product line towards the needs of the enterprise. The shift has been going on for some time, probably as a reaction to customers who prefer to stick with Cognos 7. "The pendulum is swinging back" was the message in Berlin, and recent versions of Cognos 8 BI have revived the PowerPlay product family.

Targeting the midmarket is not exactly the same thing as targeting the business user. Targeting business users can also mean creating departmental solutions in large companies. However, there certainly is an overlap. The new product is intended to be easy to install and administer and to cover planning, reporting and analysis in one package. It will have Web access and an Excel front-end. I'm guessing it's also going to be relatively cheap.

Cognos is adamant that it is building a new purpose-built solution, not simply bundling existing tools. But it also says the new product is "based on existing proven technology like IBM Cognos TM1 and IBM Cognos 8". It will be interesting to find out exactly what that means.

Issues they should be looking at are:
  • Compatibility with existing products. Customers using TM1 might want to upgrade to this product. If they can't they might start worrying about the future of the existing product.
  • Organizational issues. Will IBM Cognos have a sales team willing to make the effort to sell this product?
  • Easy installation. Self service is a key goal in the project, and installation needs to be automatic.
  • Automatic metadata exchange. TM1 models need to be automatically visible in the Cognos 8 bits of the offering, even at the cost of flexibility.


http://www.b-eye-network.co.uk/blogs/finucane/archives/2009/06/cognos_and_the_midmarket.php Mon, 15 Jun 2009 00:45:00 MST
A response to 'The flaws of the classic data warehouse' (2)
This post is a second reaction to the first article in a series of three, written by a highly respected thought leader in the field and publisher on the B-Eye-Network: Rick van der Lans. The papers are titled 'The Flaws of the Classic Data Warehouse Architecture'.

This blog post is a reaction to the first part. It deals with the flaws of the classic data warehouse architecture (CDWA).

Rick signals five flaws which will lead in article two and three to a new architecture. This post is addressing the second flaw.

- My reaction to flaw #1 can be read here.

Flaw 2 according to Rick
The CDWA stores a lot of redundant data. The more redundant the data, the less flexible the architecture is. We could simplify our data warehouse architectures considerably by getting rid of most of the redundant data. Hopefully, the new database technology on the market, such as data warehouse appliances and column-based database technologies, will decrease the need to store so much redundant data. Rick commented on this flaw in his closing keynote at a BI event we had last week, stating basically that DWH professionals did an extremely lousy job over the last decades in building these redundancy monsters. As in his article, he strengthened this argument with research done by Nigel Pendse claiming that the average BI application only needed a fraction of the stored (redundant) data.

My reaction to flaw 2
First of all, I agree that new technologies can limit the volume of redundant data considerably.

But to say that in the last decades the data warehouse professional did an extremely lousy job because of the huge redundancy they created in their data warehouses...well, that's just plain stupid and for the people that are applauding this statement I would like to say: 'I bet you never actually built a data warehouse'.

BI populism.....that's what it is.

As for the flexibility argument: more redundant data kills flexibility. Hmm...it's a bit of a bs-argument, because flexibility is not only affected by redundant data. If I had built my data warehouses in the last decades without redundant data I would have ended up with huge complex transformation rules and a big strain on processing capacity. Both issues would have killed the flexibility big time and I am leaving aside the degradation of performance, degradation in ease of use, degradation in maintainability and the degradation of the testability of the system. But I agree - I would not have redundant data...I would not have any quality of service either....but who cares.

BI populism.....that's what it is.

But is the CDWA architecture flawed by this redundancy problem? I do not think so at all. We would still need a datastore of some kind (Rick seems to acknowledge that by advocating the use of appliances), and we would still have several layers after this datastore, preparing the data for several different functionalities (reporting, mining, advanced analytics, datasharing to third parties, etc.). Let's take the datamart layer: will it disappear? I don't think so. The question is whether it needs to be materialized. And that's where new technology will be extremely valuable. It seems that Rick is translating the word 'Architecture' with 'Technical Architecture' as a 1:1 relationship.

The hub-spoke architecture of the CDWA model is still extremely valid. Of course, technology within this architecture will evolve and will enable us to deliver an even better quality of service.
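The question of whether the datamart layer needs to be materialized can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the table and column names (sales, mart_sales_mat, mart_sales_virt) are made up for illustration and say nothing about any particular appliance or product. The same datamart is defined twice: once as a physical copy (redundant data) and once as a view that is computed on read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table feeding the datamart layer.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("south", 50), ("north", 25)],
)

MART_QUERY = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Materialized datamart: a physical, redundant copy that must be reloaded.
conn.execute("CREATE TABLE mart_sales_mat AS " + MART_QUERY)

# Virtual datamart: same logical layer, no stored copy, computed when queried.
conn.execute("CREATE VIEW mart_sales_virt AS " + MART_QUERY)

# New data arrives in the warehouse after the datamart was loaded.
conn.execute("INSERT INTO sales VALUES ('south', 10)")

mat = dict(conn.execute("SELECT region, total FROM mart_sales_mat"))
virt = dict(conn.execute("SELECT region, total FROM mart_sales_virt"))

print(mat)   # the stored copy does not see the new row
print(virt)  # the view does
```

In both cases the datamart layer exists in the architecture; only the technical choice differs. The view trades storage and reload logic for query-time processing, which is exactly where faster database technology would help.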



http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/a_response_to_t.php Sun, 14 Jun 2009 03:57:06 MST
More Information on Data warehousing in the Cloud and SaaS BI
Claudia Imhoff and I recently published a research report on the BeyeNETWORK entitled "Pay as You Go: Software-as-a-Service Business Intelligence and Data Management." The report was sponsored by Blinklogic, Host Analytics, PivotLink and SAP BusinessObjects who offer SaaS BI solutions. It was also sponsored by Kognitio who (like Aster, GreenPlum and Vertica mentioned in my previous blog) have a data warehousing in the cloud offering. The report discusses SaaS BI and data warehousing and reviews the pros and cons of using this type of deployment model.

The report can be found on beyeresearch.com.


http://www.b-eye-network.co.uk/blogs/white/archives/2009/06/more_informatio.php Thu, 11 Jun 2009 16:58:09 MST
A response to 'The flaws of the classic data warehouse' (1)
It is only by means of good and respectful discussion that knowledge and insight will evolve. This post should be regarded as such. Furthermore, I understood from a good friend that Rick meant to be controversial with these papers.....

This post is a first reaction to the first article in a series of three, written by a highly respected thought leader in the field and publisher on the B-Eye-Network: Rick van der Lans. The papers are titled 'The Flaws of the Classic Data Warehouse Architecture'.

This blog post is a reaction to the first part. It deals with the flaws of the classic data warehouse architecture (CDWA) according to Rick. If you wanna know what exactly constitutes a CDWA - I would suggest reading this first part.

Rick signals five flaws which will lead in article two and three to a new architecture. This post is addressing the first flaw. In upcoming postings on this blog I will also address the other four and I will also respond to the solution he is proposing.

Flaw 1 according to Rick
The CDWA does not support the concept of Operational Business Intelligence. This conclusion is drawn from the fact that the CDWA can not include 100% up-to-date information. Rick concludes that we have to remove storage layers and minimize the copy steps.

My reaction to flaw 1
A metaphor; I am driving my car and suddenly I say 'damn; I wanna fly'. Looking at my car, I can not seem to find the 'fly' button and I therefore conclude that my car is flawed.

Although a bit of a corny metaphor, it reflects the core of my criticism. Apparently there is a new requirement called Operational Business Intelligence* that can not be served by the existing architecture. Is the existing architecture then flawed? I do not think so. Does the existing architecture fit the needs of the organisation? I do not think so. So flaw 1 in my opinion is not a flaw; it might simply be a poor fit between requirement and architecture.

Let's take this corny metaphor one step further. Suppose there is a genuine need for me to fly (e.g. 100% up-to-date information for decision-like processes*). Is it then considered common sense to build wings on my car and put in a jet engine? I wouldn't ......I would just buy a plane ticket and get to an airfield or maybe I would use a substitute to achieve my objectives....the train.

To conclude: requirements are evolving and architecture needs to follow. The data warehouse architecture depicted as a hub-spoke model is still valid for its intended use (although the design is evolving). New requirements can lead to new choices in architecture (and subsequently in design).


Although I do not agree on the flaw issue, I do agree that new requirements can require new architecture which - in the end - is exactly what Rick is proposing (although I do not agree completely on this new architecture - but let's keep that in mind for a next posting).


* as you can see I am avoiding the tedious discussion regarding the term Operational Business Intelligence. I am also avoiding the so-called 'fact' that organizations all need 100% up-to-date information for decision-like processes.





http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/a_response_to_the_flaws_of_the_classic_data_warehouse_1.php Wed, 10 Jun 2009 13:20:18 MST
Disruptive innovation - to be or not to be
'So what', I hear you say. Well, there is of course not much of a problem when a vendor defines their own technology or product as being 'disruptive'. I know where it comes from and I understand the vendor's wish to increase its turnover by claiming to sell a disruptive technology/product/etc.

But when analysts do it, I am getting more suspicious and sometimes extremely annoyed. It is the analyst that needs to be neutral, a bit restrained and of course critical. The analyst needs to put this 'disruptive' stuff a bit in perspective for the reader.

Let's try to get some sort of definition of the word 'disruptive'. So I did some research and ended up with Kalle Lyytinen's paper from 2003 in MIS Quarterly called:
"The disruptive nature of Information Technology Innovations: The Case of Internet Computing in Systems Development Organizations"
In my opinion a very good paper. And by the way, in his study he shows that Internet Computing has radically impacted the IT innovations of firms both in terms of development processes and services. Maybe not at all surprising, but if you compare this type of innovation with (let's take an arbitrary example that is often defined as disruptive*) DW appliances......

Lyytinen defines disruptive innovation as:
They radically deviate from an established trajectory of performance improvement, or redefine what performance means in a given industry (Christensen and Bower 1996). They are radical (Zaltman et al. 1977) in that they significantly depart from existing alternatives and are shaped by novel, cognitive frames that need to be deployed to make sense of the innovation (Bijker 1987). Consequently, disruptive innovations are truly transformative (Abernathy and Clark 1985). To become widely adopted, disruptive architectural innovations demand provisioning of complementary assets in the form of additional innovations that make the original innovation useful over its diffusion trajectory (Abernathy and Clark 1985; Teece 1986). By doing so, disruptive innovations destroy existing competencies (Schumpeter 1934) and break down existing rules of competition.

Are appliances or new technology for data storage and data management really disruptive? Or are they just the natural flow of continuing innovation? I think the latter.

Let's be cautious in using big words like 'disruptive'.......



* Did a quick search on 'Disruptive' in B-eye-Network and found 128 hits, most of them appliances or other revolutionary database products/technologies.



http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/disruptive_inno.php Wed, 10 Jun 2009 02:11:37 MST
Data Warehousing in the Cloud Gains Momentum
I have interviewed all three vendors over the past week and while there are some common characteristics in the approaches being taken by the three vendors to cloud computing, there are also some differences.

Common characteristics include:
  • Software only analytic DBMS solutions running on commodity hardware
  • Massively parallel processing
  • Focus on elastic scaling, high availability through software, and easy administration
  • Acceptance of alternative database models such as MapReduce
  • Very large databases supporting near-real-time user-facing applications, scientific applications, and new types of business solution

The emphasis of Greenplum is on a platform that enables organizations to create and manage data warehouses and data marts using a common pool of physical, virtual or public cloud infrastructure resources. The concept here is that multiple data warehouses and data marts are a fact of life and the best approach is to put these multiple data stores onto a common and flexible analytical processing platform that provides easy administration and fast deployment using good-enough data. Greenplum sees this approach being used initially on private clouds, with the use of public clouds growing over time.

Aster's emphasis is on extending analytical processing to the large audience of Java, C++ and C# programmers who don't know SQL. They see these developers creating custom analytical MapReduce functions for use by BI developers and analysts who can use these functions in SQL statements without any programming involved.

Although MapReduce has typically been used by Java programmers, there is also a large audience of Microsoft .NET developers who potentially could use MapReduce. A recent report by Forrester, for example, shows 64% of organizations use Java and 43% use C#. The objective of Aster is to extend the use of MapReduce from web-centric organizations into large enterprises by improving its programming, availability and administration capabilities over and above open source MapReduce solutions such as Hadoop.
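
To make the pattern concrete, here is a minimal sketch of map/reduce in plain Python. It is not Aster's or Hadoop's API: the function names (`map_clicks`, `reduce_counts`, `run_map_reduce`) and the sample click records are hypothetical, chosen only to illustrate the kind of developer-written function pair that could then be exposed to analysts as a single callable.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

# Hypothetical sample data: (user_id, page) click records.
CLICKS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/products"),
    ("alice", "/checkout"),
    ("bob", "/products"),
]


def map_clicks(record: Tuple[str, str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (key, value) pair per input record."""
    user_id, _page = record
    yield user_id, 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Single-process stand-in for the grouping a parallel engine would distribute."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


clicks_per_user = run_map_reduce(CLICKS, map_clicks, reduce_counts)
print(clicks_per_user)  # {'alice': 3, 'bob': 2}
```

In an MPP product the grouping step would be spread across nodes; the sketch keeps it in one process so the map and reduce roles stay visible.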

Vertica sees its data warehouse cloud computing environment being used for proof of concept projects, spill over capacity for enterprise projects and for software-as-a-service (SaaS) applications. Like Greenplum it supports virtualization. Its Analytic Database v3.0 for the Cloud adds support for more cloud platforms including Amazon Machine Images and early support for the Sun Compute Cloud. It also adds several cloud-friendly administration features based on open source solutions such as Webmin and Ganglia.

It is important for organizations to understand where cloud computing and new approaches such as MapReduce fit into the enterprise data warehousing environment. Over the course of the next few months my monthly newsletter on the BeyeNETWORK will look at these topics in more detail and review the pros and cons of these new approaches.



]]>
http://www.b-eye-network.co.uk/blogs/white/archives/2009/06/data_warehousin.php Tue, 9 Jun 2009 00:00:01 MST http://www.b-eye-network.co.uk/blogs/white/archives/2009/06/data_warehousin.php
Who owns the data?

Always a tricky question, and every organisation answers it differently. It is, however, an important one. Those who own the data are responsible for its quality, right? Not a light-hearted question if you consider the compliance pressure these days and the need for clear responsibilities regarding the data. Those who own the data are responsible for their data wherever it is used within the organisation. Is that last statement really true?

In the article I published in September 2008 I strongly advise registering authentic factual data in the Central Data Warehouse. Business rules should be implemented downstream, after the Central Data Warehouse.

Who owns the data in the Central Data Warehouse? Is it the BICC? No, they are not the owner. Since 'we' store authentic factual data in the Central Data Warehouse (we do not change, enrich or integrate* it!), the owner of the data should still be the same as the owner of the source. Let me put it more simply:

The people that create the authentic data also own the data, wherever it goes.

Classic architecture
Let me just highlight the importance of architecture here. What would happen if I built a classic, old-style hub-and-spoke data warehouse, where a central data warehouse is developed in which the data is neatly integrated, cleansed, etc.? In other words, data is changed on the way into the Central Data Warehouse. Or maybe even changed on the way into the staging area!! Well, taking into account the above rule, the data warehouse team, which creates/changes the data, now becomes the owner.

[Figure: Classic EDW]

The result of these more classic data warehouse architectures is that there are two owners of the data, and that ownership is decoupled between the data created by the application and the data coming into the staging area. This is the classic case in the majority of organisations. It creates massive problems in governing your data warehouse, especially in change management. What happens if a change occurs in the source data? Is the source owner responsible for cascading the change to the data warehouse? Well, he and the data warehouse team probably have some kind of SLA stipulating that the source owner signals the data warehouse team that a change is imminent... and then the data warehouse team needs to go to work. What happens if you have 50 sources, or even 100? What happens if they are big changes? Chaos, and the sustainability of your data warehouse is at great risk.

New Architecture
So what does it look like in the new generation Enterprise Data Warehouses?

[Figure: New generation EDW]
Now, what happens if a change occurs in the data? The owner of the data does an impact analysis on all the interfaces he owns and is responsible for. In the new generation EDW he is also responsible for engineering the change up until the central data warehouse!!! This is a hugely more manageable governance model for change management in an Enterprise Data Warehouse.
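
As a rough illustration of such an impact analysis, the sketch below walks a lineage map to list everything downstream of a changed source. The data set names and the `LINEAGE` structure are hypothetical; a real organisation would read this from its metadata administration rather than a hard-coded dictionary.

```python
from collections import deque

# Hypothetical lineage: each key feeds the data sets listed as its value.
LINEAGE = {
    "crm.orders": ["staging.orders"],
    "staging.orders": ["cdw.orders"],
    "cdw.orders": ["mart.turnover", "mart.customer_value"],
    "erp.invoices": ["staging.invoices"],
    "staging.invoices": ["cdw.invoices"],
    "cdw.invoices": ["mart.turnover"],
}


def impacted_by(changed, lineage):
    """Return every downstream data set reachable from `changed`, breadth-first."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(impacted_by("crm.orders", LINEAGE))
# ['staging.orders', 'cdw.orders', 'mart.turnover', 'mart.customer_value']
```

The walk lists the whole downstream chain; under the governance model described here, the source owner would engineer the change for the entries up to the central data warehouse and hand the datamart entries to those who own the rules applied there.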

Is the source owner also the owner of data coming into the datamarts? Yes, but this is a tricky one. Integration and cleansing take place downstream, between the Central Data Warehouse and the datamarts. In the second article Lidwine van As and I published in Database Magazine (November 2008) we state that this part of the EDW is driven by demand (whereas data getting into the Central Data Warehouse is pushed by supply); in other words, those who demand set the requirements: the business rules they want to apply to the factual data.

In datamarts, data is changed according to rules defined by a user. Is the initial owner of the data still accountable for this data? Well... yes, they are. But the rules being applied on the way into the datamarts are not their responsibility. If somebody comes up with a highly fantastic rule for calculating turnover, I would say that Finance should have the ownership of this formula. But I am embarking on a whole other subject here. The subject of definition ownership... let's not go there.

IT Artifact and boundary; Data Warehouse and Business Intelligence
As you can see in the above pictures, the boundary of the IT system has changed. The boundary of the IT system is not the system itself. The boundary is determined by the propagation of its data. The data warehouse is to be regarded as an integral part of the IS environment. You could also say that the data warehouse is evolving into a functional interface on top of the operational-transactional systems. This functional interface rationalizes the data structure of these systems and can thereby serve other functionalities like Business Intelligence. It does not have to be Business Intelligence per se! It can also be used for data sharing with third parties (e.g. co-making), data quality projects, operational control, accounting (remember: it's factual, auditable data), etc.

This shift in system boundary also acknowledges that building sustainable data structures differs from building successful Business Intelligence systems. The two differ in competencies and skills, in organisational design (Account management, Exploitation, Development, Maintenance), technical architecture (tools, version management, security, performance, etc.) and cultural aspects.

Business metadata
A small side note on the metadata part, the business metadata in particular (definitions, domain values, etc.). I see a lot of EDW architectures where the responsibility for the registration, administration and publication of the business metadata is put on the shoulders of the DWH team. Given the above graphic of the new generation EDW and the argument of this blog post, this is not the right approach. Those who create the authentic data should also take care of the business metadata. It's their job! And yes, this extends the Enterprise Data Warehouse big time! This entails Enterprise Data Management: all data and probably also all services within the organisation. Metadata Administration should be an existing function in any organisation (remember the scale - I am not talking about midsize companies here!). The Data Warehouse team, however, does have a responsibility for registering, administering and publishing the business metadata for the datamarts.

To summarize:

  1. Those who create the data, also own the data - all the way through the enterprise;
  2. If this data is changed/cleansed, they still own the raw data as well as the enriched/cleansed data. But
    • they cannot be held responsible for the business rule or the definition. Ownership of these rules/definitions can transfer to a definition owner or to the user who is asking for the data.
    • they cannot be held accountable for the DWH 'service'. This 'service' is owned by the DWH team or any-which-way-you-wanna-call-this-organisational-unit
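
The summary can be sketched as a small data model. Everything here is illustrative: the `DataSet` class, the `derive` helper and the organisational names are hypothetical, and the only point is that the data owner cascades unchanged while rule and service ownership are recorded separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSet:
    name: str
    data_owner: str                      # whoever created the authentic data
    rule_owner: Optional[str] = None     # owner of the applied business rule, if any
    service_owner: Optional[str] = None  # who runs the platform that serves it


def derive(source: DataSet, name: str, service_owner: str,
           rule_owner: Optional[str] = None) -> DataSet:
    """Derive a downstream data set; data ownership cascades unchanged."""
    return DataSet(name, source.data_owner, rule_owner, service_owner)


# Hypothetical chain: source system -> central data warehouse -> datamart.
orders = DataSet("crm.orders", data_owner="Sales Operations")
cdw_orders = derive(orders, "cdw.orders", service_owner="DWH team")
turnover_mart = derive(cdw_orders, "mart.turnover",
                       service_owner="DWH team", rule_owner="Finance")

for ds in (orders, cdw_orders, turnover_mart):
    print(ds.name, "| data:", ds.data_owner,
          "| rule:", ds.rule_owner, "| service:", ds.service_owner)
```

Keeping the three kinds of ownership as separate fields mirrors the distinction drawn above: the datamart still traces back to the people who created the data, while the turnover rule belongs to whoever defined it.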

This is a blog post - so I am allowed not to be thorough, scientifically correct, etc. So I am leaving out a lot of nuances, restrictions and pre-requisites. Let me just give you a few. There are some major Enterprise Architecture principles that need to be considered here.
1 - Data must be decoupled from its application/process (huge one!!!)
2 - Ownership of data cascades all the way through its use within the organisation
3 - The data warehouse hub MUST register authentic, factual data
4 - The data warehouse hub is to be designed in such a way that it supports a federated deployment (worth a whole new blog post) - without this one; forget it.
5 - Interfaces between source data and staging must be standardized
6 - Metadata Administration must be implemented enterprise wide
7 - Release management for the CDW and datamarts needs to be set up
8 - ...

Most of these pre-requisites I have written down in two articles which can be downloaded here.

Just a small brain dump I wanted to share with you guys; just give me your 2 cents on this one.


* a small level of data integration is necessary



]]>
http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/who_owns_the_da.php Mon, 8 Jun 2009 00:28:06 MST http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/who_owns_the_da.php
MIS is a mirage
In my quest for sustainable knowledge I am constantly searching for those little pieces of 'gold' in the literature. John Dearden's publication in the Harvard Business Review in 1970 is a great example. It's called MIS is a mirage.

Please remember that this article was written in 1970 - so try to go back in time when you read this article or this blog. It's fun!!

Some Quotes:
"Of all the ridiculous things that have been foisted on the long-suffering executive in the name of science and progress, the real-time management information system is the silliest"

In trying to describe MIS as a term Dearden writes: "It is difficult to even describe MIS in a satisfactory way because this conceptual entity is embedded in a mish-mash of fuzzy thinking and incomprehensible jargon."

I wonder, if I replace the word MIS with BI..... whether John Dearden's remark is still valid.

Back in those days MIS was apparently defined as some sort of holistic computer-based, centralized entity that can solve all management information problems.....

"In short, the proponents promise, experts can design a MIS that is more effective, more efficient, more consistent, and more dynamic than the haphazard aggregate of individual systems a company would otherwise employ."

As Dearden continues (freely translated): any manager would be a total idiot not to go for this amazing technology. Well, going back to 2009, I still encounter vendors and consultants that sell 'Mike's amazing kitchen machine - that can do virtually anything you want'.

Of course Dearden is right in claiming that MIS - as defined in those days - was a total fallacy. And believe me - he's not holding back in his paper.

One nice side note is that in this article Dearden talks about 'The Systems Approach' (without referencing it btw). It stems from Churchman's book 'The Systems Approach' (1968), which is quite another piece of 'gold' in the science literature. Although Dearden patronises it a bit, over the next three to four decades systems thinking did some real good in theory-building.

To end this posting... Dearden finishes his paper with 'back-to-earth' insights and approaches to the informational challenges for executives. Most of them are still valid, and most of them have nothing to do with the technical dimension but much more with the people and organisational dimensions.

Nothing new - is it? But isn't that amazing if you consider that the article is from 1970?





]]>
http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/mis_is_a_mirage.php Fri, 5 Jun 2009 00:32:31 MST http://www.b-eye-network.co.uk/blogs/damhof/archives/2009/06/mis_is_a_mirage.php