Many different approaches to data integration are now available, yet by far the most popular approach is still extract, transform and load (ETL).
However, the pace of business change and the requirement for agility demand that organizations support multiple styles of data integration. Three leading options present themselves, and I will describe the differences among them below.
Physical Movement and Consolidation
Probably the most commonly used approach is physical data movement, which replicates data from one database to another. There are two major genres of physical data movement: extract, transform and load (ETL) and change data capture (CDC). ETL is typically run on a schedule and is used for bulk data movement, usually in batch. CDC is event driven and delivers real-time incremental replication. Example products in these areas are Informatica (ETL) and Oracle GoldenGate (CDC).
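To make the ETL pattern above concrete, here is a minimal, hypothetical sketch of a single extract-transform-load pass between two SQLite databases. The table names, columns and conversion rule are invented for illustration; real ETL products such as Informatica add scheduling, bulk-load optimization, lineage and error handling on top of this basic shape.

```python
import sqlite3

# Stand-ins for an operational source database and a warehouse target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 1250), (2, 990), (3, 40000)])

target.execute("CREATE TABLE dw_orders (id INTEGER, amount_dollars REAL)")

def run_etl_batch():
    # Extract: pull the full table (schedule-driven and bulk in practice).
    rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()
    # Transform: convert cents to dollars for the warehouse model.
    transformed = [(oid, cents / 100.0) for oid, cents in rows]
    # Load: write the transformed rows into the warehouse table.
    target.executemany("INSERT INTO dw_orders VALUES (?, ?)", transformed)
    target.commit()

run_etl_batch()
print(target.execute(
    "SELECT COUNT(*), SUM(amount_dollars) FROM dw_orders").fetchone())
# (3, 422.4)
```

CDC differs from this sketch chiefly in trigger and granularity: instead of a scheduled full extract, each committed change event in the source is captured and propagated incrementally to the target in near real time.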
Message-Based Synchronization and Propagation
Whilst ETL and CDC are database-to-database integration approaches, the next approach, message-based synchronization and data propagation, is used for application-to-application integration. Once again there are two main genres: enterprise application integration (EAI) and enterprise service bus (ESB) approaches; both are used primarily for event-driven business process automation. A leading product example in this area is the ESB from TIBCO.
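The event-driven, application-to-application style can be sketched with a toy in-process publish/subscribe bus. Everything here is hypothetical and simplified: the `MiniBus` class and the CRM/billing stores are invented stand-ins, whereas a product ESB such as TIBCO's adds durable queues, routing, transformation and transactional delivery.

```python
from collections import defaultdict

class MiniBus:
    """Toy message bus: applications publish events, subscribers react."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver the message to every handler registered for the topic.
        for handler in self._subscribers[topic]:
            handler(message)

bus = MiniBus()
crm_customers = {}     # stand-in for a CRM application's own store
billing_accounts = {}  # stand-in for a billing application's own store

# Each application keeps itself in sync by reacting to business events,
# rather than reading the other application's database directly.
bus.subscribe("customer.created",
              lambda m: crm_customers.update({m["id"]: m["name"]}))
bus.subscribe("customer.created",
              lambda m: billing_accounts.update({m["id"]: 0}))

bus.publish("customer.created", {"id": 42, "name": "Acme Ltd"})
print(crm_customers, billing_accounts)  # {42: 'Acme Ltd'} {42: 0}
```

The design point this illustrates is decoupling: the publisher does not know which applications consume the event, which is what makes the style suitable for business process automation across many systems.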
Abstraction / Virtual Consolidation (aka Federation)
The third major style of integration is data virtualization (DV). The key here is that the data source (usually a database) and the target or consuming application (usually a business application) are isolated from each other. Information is delivered on demand to the business application when the user needs it. The consuming application can consume the data as a database table, a star schema, an XML message or in many other forms; the form of the underlying source data is hidden from it. The main rationale for data virtualization within an overall data integration strategy is to overcome complexity, increase agility and reduce cost. A leading product example in this area is Composite Software.
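The on-demand, source-isolating behaviour of DV can be sketched as a logical view that federates two differently shaped sources at query time. The sources, the `virtual_customers` view and the data are all invented for illustration; a DV product such as Composite Software adds query optimization, caching, security and support for many more source types.

```python
import csv
import io
import sqlite3

# Two physically separate sources in different forms: a SQL table and a
# CSV feed. Neither is copied; both are read on demand.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE eu_customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO eu_customers VALUES (1, 'Alpha GmbH')")

us_feed = "id,name\n2,Beta Inc\n"

def virtual_customers():
    """Logical 'customers' view: fetched and unified at query time,
    so the consumer never sees the form of the underlying sources."""
    rows = [tuple(r) for r in db.execute("SELECT id, name FROM eu_customers")]
    rows += [(int(r["id"]), r["name"])
             for r in csv.DictReader(io.StringIO(us_feed))]
    return rows

print(virtual_customers())  # [(1, 'Alpha GmbH'), (2, 'Beta Inc')]
```

Because no physical copy is maintained, the consuming application always sees current source data, which is the agility and cost argument made above.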
Extract, Transform and Load or Data Virtualization?
The suitability of each data integration approach needs to be considered case by case. Here are six key considerations to ponder:
- Data replication
  - Will the data be replicated in both the data warehouse (DW) and the operational system?
  - Will data need to be updated in one or both locations?
  - If data is physically in two locations, beware of regulatory and compliance issues associated with having additional copies of the data (e.g., SOX, HIPAA, Basel II, FDA, etc.).
- Data governance
  - Is the data only to be managed in the originating operational system?
  - What is the certainty that the data warehouse will be a reporting data warehouse only (versus an operational DW)?
- Currency of the data (i.e., does it need to be up to the minute?)
  - How up to date are the data requirements of the data warehouse?
  - Is there a need to see the operational data?
- Time to solution (i.e., how quickly is the solution required?)
  - Is there an immediate requirement?
  - Are the users and usage confirmed?
- Life expectancy of the source system(s)
  - Are any of the source systems likely to be retired?
  - Will new systems be commissioned?
  - Are new sources of data likely to be required?
- Need for historical / summary / aggregate data
  - How much historical data is required in the DW solution?
  - How much aggregated / summary data is required in the DW solution?
Leading analyst firms such as Gartner recommend adding data virtualization to your integration tool kit and, for optimal results, using the style of data integration that fits each job.
SOURCE: How Data Virtualization Helps Data Integration Strategies