The Extended Corporate Information Factory Supports the Smart Business

Originally published 13 December 2005

The 2005 Extended Corporate Information Factory Architecture
The first article, Supporting the Smart Business: The Extended Corporate Information Factory, summarizes the business and technology changes that made reconfiguring the CIF necessary. Part two of this series focuses on the Extended Corporate Information Factory (CIFe) itself and on the changes within this popular and well-established architecture for business intelligence. Over the years, we have added or renamed certain components of the architecture as needs arose. For example, the data warehouse was originally called the atomic database, and data marts were called departmental databases. We added the popular operational data store (ODS) to the architecture in the mid-1990s. Overall, though, the architecture has remained remarkably stable while accommodating new breakthroughs in its implementation as they became available.

At first glance, the Extended Corporate Information Factory may appear to be a complete departure from the “old” CIF. However, a closer examination shows that we have maintained the basic functionality and principles of the original Corporate Information Factory. These principles include:

  • The program orientation of the business intelligence environment.
  • The sourcing of data from the operational systems environment and external data stores.
  • The storage of an enterprise view of the data, in both the data warehouse and operational data store (ODS).
  • The delivery of that data to the business community through data marts or views into the main storage components tailored to the business users and their applications.
  • The inclusion of metadata management throughout the environment.

Many of the changes deal with a new set of technologies and techniques used in the overall process of creating the CIFe databases (Data Integration and Data Delivery) and in extending the focus of the Extended Corporate Information Factory into the more operational aspects of business intelligence. In addition, the outer ring of entities (Governance, Infrastructure Management, Quality Management, etc.) was added to ensure that the Extended Corporate Information Factory remained an enterprise resource. These entities formalize the best practices that we have learned over the years to support this focus. See Figure 1 for the complete new architecture, the Extended Corporate Information Factory.

Figure 1: The Extended Corporate Information Factory

We have divided this article into these two main categories of CIFe changes for further explanation. Before going further, though, we must revisit the purpose for the CIFe architecture and the benefits it has brought to many business intelligence and data warehousing projects. We offer this latest edition of the Extended Corporate Information Factory freely and without restrictions in the hopes that you will find it useful and informative.

CIFe: A Conceptual Architecture Supporting Business Intelligence for the Smart Business

Many articles, presentations and other pieces of intellectual property have been created about the Corporate Information Factory. In essence, the Extended Corporate Information Factory is a logical or conceptual depiction of a business intelligence environment. How companies physically implement it is their own choice. In all our years of designing and building these architectures, we have never seen two exactly alike. The generic nature of the architecture ensures that all technologies and techniques can be considered in its creation; there is no bias toward any particular technological solution. However, the basic tenets of implementation must not be compromised. These tenets include:

  • Data is captured from the operational systems, integrated to form the “one version of the truth,” and then made available for analysis or other business intelligence activities. There are two main mechanisms for making the data available: virtually, through views or data federation techniques, or physically, through data consolidation and propagation techniques. More information is given in the Data Integration and Data Delivery section.
  • The data warehouse must be a real, physical entity without exception. A virtual data warehouse does not make sense. The data warehouse is a constant in the architecture since it contains the historical, integrated data for all strategic analytics.
  • The data marts must be dependent upon the data warehouse or operational data store (ODS) as their source of data. These data marts may be virtual (e.g., through the use of views into the data warehouse or ODS) or real physical entities. Physical marts are needed where proprietary technologies (e.g., MOLAP cubes) or specialized data formats or subsets of data (e.g., for certain data mining, statistical or exploration technologies) require them.
  • The ODS is the source of current, integrated data and plays an important role in operational business intelligence for many enterprises. This component could make use of both data federation and data consolidation technologies, allowing it to be partially physical and partially virtual. It is the advance in data federation technologies that allows the ODS to contain real-time data for operational decision-making purposes.
  • Metadata is the glue that holds the entire architecture together. Just like any other database component, it must be managed and maintained.
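The distinction between a virtual and a physical data mart, both dependent on the data warehouse, can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the warehouse; the table, view and column names (dw_sales, mart_sales_by_region_v, mart_sales_by_region_p) are hypothetical and are not part of the CIFe itself.

```python
import sqlite3

# Hypothetical mart names used only for this illustration.
MARTS = {"mart_sales_by_region_v", "mart_sales_by_region_p"}


def build_marts(conn: sqlite3.Connection) -> None:
    """Create a tiny warehouse table plus one virtual and one physical mart."""
    cur = conn.cursor()
    # Stand-in for the data warehouse: historical, integrated data.
    cur.execute("CREATE TABLE dw_sales (sale_date TEXT, region TEXT, amount REAL)")
    cur.executemany(
        "INSERT INTO dw_sales VALUES (?, ?, ?)",
        [
            ("2005-01-15", "East", 100.0),
            ("2005-01-20", "West", 250.0),
            ("2005-02-03", "East", 75.0),
        ],
    )
    # Virtual mart: a view into the warehouse; no data is copied.
    cur.execute(
        "CREATE VIEW mart_sales_by_region_v AS "
        "SELECT region, SUM(amount) AS total FROM dw_sales GROUP BY region"
    )
    # Physical mart: a separate table populated from the warehouse.
    cur.execute(
        "CREATE TABLE mart_sales_by_region_p AS "
        "SELECT region, SUM(amount) AS total FROM dw_sales GROUP BY region"
    )
    conn.commit()


def mart_totals(conn: sqlite3.Connection, mart_name: str) -> dict:
    """Return {region: total} from one of the known marts."""
    if mart_name not in MARTS:
        raise ValueError(f"unknown mart: {mart_name}")
    rows = conn.execute(
        f"SELECT region, total FROM {mart_name} ORDER BY region"
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_marts(conn)
print(mart_totals(conn, "mart_sales_by_region_v"))
print(mart_totals(conn, "mart_sales_by_region_p"))
```

In this sketch both marts draw only on the warehouse table. A row added to dw_sales later shows up in the view immediately, whereas the physical mart keeps its old totals until it is reloaded, which is the practical trade-off between the two forms.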

Another result of the conceptual or logical nature of the Extended Corporate Information Factory is that many physical technological components are left out. Some examples of this include:

  • Staging areas (either persistent or temporary) must be understood as part of the overall technical environment for the data integration process. Therefore, no formal component called “Staging Area” is shown in the Extended Corporate Information Factory.
  • Disaster recovery, backup or archive technologies are also not explicitly depicted. Because these are considered to be part of the overall routine database maintenance, they do not need to show up in such a conceptual architecture.
  • There are many different forms of data marts, and there will be more in the future. Therefore, we chose to consolidate these into a single set of generic data marts. There are currently at least five different types of marts. Each has its own supporting technology and its own business purpose or problem requiring its existence. See Figure 2 for examples of the five types of data marts.

Figure 2: Examples of various data marts and mart technologies.

  • Finally, the Extended Corporate Information Factory does not distinguish between structured and unstructured data. While the processes to capture and integrate these important pieces of content may be different, they are not shown as separate entities in a conceptual architecture such as the CIFe. It must be recognized that both are simply forms of data that must be incorporated into a business intelligence environment in some fashion.

Using an architectural diagram like the CIFe has a number of valid and tangible benefits for business intelligence implementers. First, it offers a well-planned roadmap for using data integration and business intelligence technologies. The back end of the roadmap consists of operational systems, the processes to integrate and make data accessible, and the storage units (data warehouse and ODS) that are the backbone for an environment’s maintainability and sustainability. The front end consists of the delivery mechanisms (data delivery, data marts and the various access and display capabilities), which yield the tactical and strategic analytics. These are mandatory for today’s smart businesses.

The CIFe serves as an excellent blueprint for all systems that support and drive business analytics and operations. It ensures the coordinated deployment of CRM analytics, BPM and other business intelligence technologies by mapping or documenting the overall data flows into and out of the various CIFe components and the corresponding process interactions. Using this architecture as a business intelligence blueprint enables seamless integration across various architectural components and promotes the re-use of components, thus reducing overall development costs.

To extend the CIFe throughout the enterprise, the environment requires massive data collection, storage and access. If it is to perform correctly, the CIFe must be created with scalable, interoperable and reliable technologies. Its success is measured by whether the CIFe technologies become so ubiquitous that they are invisible to the customer, the business user and the overall enterprise. This transparency requires flexibility, and flexibility requires an appropriate architecture.

With this introduction, let’s turn our attention to the latest innovations in the 2005 Extended Corporate Information Factory.

Data Integration and Data Delivery
Perhaps the most significant change to the Extended Corporate Information Factory is in how we acquire and access the integrated data used throughout the entire environment. The CIFe combines two formerly separated processes, data acquisition and data delivery, into a single process—Data Integration and Delivery. This new process contains three techniques for acquiring and delivering data: data consolidation, data propagation and data federation (Figure 3). Each of these techniques, in turn, has its own applicable technologies for performing the technique:

  • Data Consolidation is the technique of integrating disparate pieces of data together to create a single record. The main technology used to perform this technique is ETL (extraction, transformation and loading) software. This software was the first form of data integration technology used in business intelligence environments, and has considerably matured in the past decade. It captures data from operational systems, performs matching and screening processes to integrate the data, converts the integrated data to the corporate standard formats and appends, inserts or updates records into the data warehouse or ODS. It is also commonly used to selectively extract data from the warehouse or ODS for creation of analytical data marts. In the second case, the selected data may need to be reformatted to fit the technology used in the mart, and then delivered to the mart environment. Data consolidation is typically an event-driven technique.
  • Data Propagation is the technique of replicating large amounts of data from one source and delivering the replicated data into a target database. It too has been around for a number of years. The technique uses EAI (enterprise application integration) technology to perform this task. Generally, minimal data transformation is needed. The most common uses for data propagation have been between operational systems, e.g., bulk movement of billing information from the billing system to the general ledger system. However, EAI technology is increasingly being used to replicate data from a data warehouse into a mart (if no reformatting or complex derivations are required). This technology can also replicate analytical results from a mart to the operational environment (e.g., replicating customer segment and lifetime value scores into a CRM system or ODS).
  • Data Federation is the technique in which data is virtually combined and then presented to the requestor. This relatively new technique is supported by EII (enterprise information integration) technologies. Data is not physically moved from source to target; rather the data is accessed from a variety of databases, and combined virtually, and then presented as if it were physically integrated. If the request is common, the data may be cached in the EII technology for better performance with minimal impact on the source applications. This software is commonly used to combine historical information from a mart with the current data from an operational system. It may also be used to combine data from multiple ERP installations into a single view of the combined data.

Figure 3: Data integration and data delivery
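The difference between the three techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists and dictionaries stand in for operational systems, the warehouse and a mart, and every name (BILLING, ORDERS, consolidate, propagate, federate) is hypothetical rather than taken from any ETL, EAI or EII product.

```python
from typing import Dict, List

# Hypothetical operational sources with inconsistent keys and formats.
BILLING = [{"cust_id": "C1", "name": "ACME CORP ", "balance": 120.0}]
ORDERS = [{"customer": "c1", "last_order": "2005-11-30"}]


def consolidate(billing: List[dict], orders: List[dict]) -> Dict[str, dict]:
    """Data consolidation (ETL style): match records from several sources,
    convert them to a standard format and build one record per customer."""
    warehouse: Dict[str, dict] = {}
    for row in billing:
        key = row["cust_id"].upper()
        warehouse[key] = {
            "cust_id": key,
            "name": row["name"].strip().title(),
            "balance": row["balance"],
        }
    for row in orders:
        key = row["customer"].upper()
        warehouse.setdefault(key, {"cust_id": key})["last_order"] = row["last_order"]
    return warehouse


def propagate(source_rows: List[dict], target: List[dict]) -> int:
    """Data propagation (EAI style): replicate rows from a source into a
    target with minimal transformation. Returns the number of rows copied."""
    target.extend(dict(row) for row in source_rows)
    return len(source_rows)


def federate(cust_id: str, mart_history: Dict[str, list],
             operational_current: Dict[str, float]) -> dict:
    """Data federation (EII style): combine data from several stores at
    request time and present it as one result; nothing is moved or stored."""
    return {
        "cust_id": cust_id,
        "history": mart_history.get(cust_id, []),
        "current": operational_current.get(cust_id),
    }


warehouse = consolidate(BILLING, ORDERS)
ledger: List[dict] = []
copied = propagate(BILLING, ledger)
view = federate("C1", {"C1": [100.0, 110.0]}, {"C1": 120.0})
print(warehouse["C1"], copied, view)
```

Only consolidate and propagate leave data behind in a target store; federate returns a combined result and keeps nothing, which is why the source describes it as presenting data as if it were physically integrated.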

All three techniques and technologies can be used (in some combination) to create the data integration and data delivery process in the new Extended Corporate Information Factory. This change from the traditional CIF architecture gives the integration and delivery process much more flexibility, in terms of how components are created and maintained. It also gives the implementers more options about what technologies or techniques work best in their particular environments. You must remember, though, that the basic tenets of good CIFe construction cannot be violated.

Enterprise Business Intelligence Best Practices
The outer band of the Extended Corporate Information Factory shows the major components of the environment management function. These components provide a more modern approach to the major activities that must be performed to ensure that the business intelligence environment (1) operates smoothly, (2) operates cost-effectively and (3) increases in value to the organization as the business learns to leverage and expand its application. These components are gleaned from the best practices learned over the past two decades. These best practices ensure that the business intelligence environment is focused on supporting the entire enterprise, while efficiently satisfying the individual needs of each department or subdivision. They also ensure that the environment is sustainable and maintainable over the long haul and that new technologies and techniques can be easily incorporated. There are six major components within the environment management function. They are:

  • Governance: This consists of the people and processes for controlling and coordinating the environment with the individual business intelligence projects. Governance ensures that the various projects adhere to CIFe standards and nomenclature, that the data models are integrated and that the technologies are compatible and appropriate.
  • Infrastructure Management: This consists of the people, processes and technologies for ensuring that the environment operates smoothly and reliably. Activities within this component include version upgrades, incorporation of new technologies, retirement of older technologies, etc.
  • Center of Excellence: This consists of the people, processes and technologies for promoting collaboration and applying best practices. Typical activities for the Center of Excellence are maintaining source system expertise and data integration intelligence, gathering and understanding end-user requirements and preserving tool expertise.
  • Quality Management: This consists of the people, processes and technologies that ensure that data quality meets business expectations. Many organizations today are creating data stewardship functions to manage the overall data quality process. This function must interface with both the business intelligence and operational resources to ensure that data quality processes are fully adopted throughout the enterprise.
  • Application Management: This consists of the people, processes and technologies that create and coordinate application development within the business intelligence environment to provide maximum business value. Understanding the business problem may not be enough to create a successful and satisfying application. It may take a more complete understanding of how the application or technology fits into a person’s overall workflow. It means understanding the bigger picture of how people use the applications to perform their daily tasks. This function ensures that the appropriate technology is used for a specific business problem as well. An example of this is using statistical technologies for statistical problems, multidimensional technology for multidimensional problems, etc.
  • Metadata Management: This consists of the people, processes, technologies and data stores for managing the information about the enterprise’s data resources and activities. This includes not only the technical metadata generated from the data integration and delivery process, but also the metadata generated from administrative processes (who is using the environment, what data is frequently used, what data can be archived) and business metadata containing business definitions and rules.

Summary
It is with a great sense of accomplishment and happiness that we offer the extended version of the Corporate Information Factory. We see the CIFe as bringing together the best practices learned from prior implementations with the latest technological innovations, pushing the frontiers of business intelligence. The last part of this series will describe how the CIFe works with the Smart BI Framework to create the ultimate Smart Business.

  • Claudia Imhoff
    A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures to support these initiatives. Dr. Imhoff has co-authored five books on these subjects and writes articles (totaling more than 150) for technical and business magazines.

    She is also the Founder of the Boulder BI Brain Trust, a consortium of independent analysts and consultants (www.BBBT.us). You can follow them on Twitter at #BBBT


  • Colin White

    Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.

