Mastering Data Management in Health and Life Sciences

Originally published 12 May 2009

It’s often said that only two things go in and out of our healthcare system: patients and data. With the meteoric rise in digitized assets in health and life sciences, and of data warehouses in general, we find ourselves at the epicenter of both challenge and opportunity: to create usable data systems while managing the deluge of data.

Last week, I had the opportunity to attend the SAS® Health and Life Sciences conference, where we heard John D. Halamka, MD, MS, speak about the role of IT in healthcare and the fact that our databases are growing beyond our capacity to use them in a meaningful way. Given that there is no one else to tackle this challenge, let’s assume for a moment that we can and we will. It becomes our job to take what’s important from that data and somehow make it useful. We call that thinking data® – data that is more predictive, more accessible, more usable and more coherent.

At the forefront of that challenge, we have the fundamental problem of data integration. While there are many strategies for doing this technically (data warehouses, data marts, federation, in-memory, column-based), we will focus on contextualizing information – that is, linking the facts (e.g., blood pressure) with context (history, concomitant medications). In this article, we will first outline a few of the opportunities in health and life sciences that could benefit from a good master data management strategy. Next, we’ll talk about some of the challenges that organizations would have to overcome, and we’ll conclude with a brief discussion of the value or ROI that might be achievable if one were successful.

Master Data Management: A Brief Overview

Before we dive into the purview of master data management (MDM) as applied to health and life sciences, let’s take a moment to talk about what MDM is in the first place. Think of MDM as a way to manage reference data so that it can help us understand the context of our transactions. Using Ralph Kimball’s terminology, MDM helps us contextualize our facts with dimensions. MDM helps us by providing processes for how we collect, summarize and cleanse our data to ensure consistency and appropriate governance in the ongoing maintenance and use of this data.
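To make the fact-and-dimension idea concrete, here is a minimal Python sketch that links a single fact (a blood pressure reading) to its context (history and concomitant medications) through a shared patient key. The class names, field names and sample values are hypothetical, chosen only for illustration; a real MDM program would keep this reference data in governed, shared tables rather than in an in-memory dictionary.

```python
from dataclasses import dataclass


# Hypothetical "dimension" record: reference data describing the patient.
@dataclass(frozen=True)
class PatientContext:
    patient_id: int
    history: tuple
    concomitant_medications: tuple


# Hypothetical "fact" record: one measurement taken at the point of care.
@dataclass(frozen=True)
class BloodPressureReading:
    patient_id: int
    systolic: int
    diastolic: int


def contextualize(reading, patient_master):
    """Join a fact to its dimension record using the shared patient key."""
    context = patient_master[reading.patient_id]
    return {
        "patient_id": reading.patient_id,
        "blood_pressure": f"{reading.systolic}/{reading.diastolic}",
        "history": context.history,
        "concomitant_medications": context.concomitant_medications,
    }


# Illustrative data only (not taken from any real patient or study).
patient_master = {101: PatientContext(101, ("hypertension",), ("lisinopril",))}
reading = BloodPressureReading(101, 150, 95)

enriched = contextualize(reading, patient_master)
print(enriched)
```

The reading on its own is just two numbers; it becomes interpretable only once it is joined to the master record that carries the patient’s context.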


From the data perspective, the world of health and life sciences is daunting. With pharma focused on drug discovery and research, operations, manufacturing, and sales and marketing, there are untold examples where organizations have successfully used data and technology to support decision making. In healthcare, we have an even more complex environment of payers, providers, researchers and patients (and their advocates both in the government and in the public sector). The table below highlights just two examples, from clinical research and health outcomes, of the range of uses for primary patient data.

Business Area: Clinical Trial Site Selection, Enrollment and Management
Need for Content:
  • Protocol Context (Phase, Sponsor, Therapeutic Area)
  • People / Roles

Business Area: Health Outcomes
Need for Content:
  • Attending Physician
  • Sending Hospital
  • Admitting Diagnosis
  • Primary Payor
  • Patient's Insurers
  • Surgical Teams
  • Sending-Receiving Hospitals
  • FPA Practice Structure
  • Nursing Unit Beds
  • Drug Therapeutic Classes
  • Admission, Discharge & Transfer
  • NDC Drug Code
  • Physician License Number
  • CPT-4 Code

If we look at the data collected at the point of patient care (see diagram below), we see an amazing accumulation of facts that should be useful in uncovering patterns of care and improving outcomes for patients. As we move away in time from that point of care, the problem is that data is often NOT enriched with new information or linkages, but rather siphoned off into sub-parts that are used by other organizations.

Take, for example, the case of a medical claim. The information sent to the payer is but a snippet of the entire story told about the patient, encounter, care event or action. Health economists at pharmaceutical companies are also at a disadvantage, as they don’t see the contextual information that provides more detail when determining the comparative effectiveness of therapies or the economics of a drug.

While there appears to be a huge amount of effort in healthcare around digitizing electronic medical records, there doesn’t appear to be as much effort going into ensuring that it is useful beyond supporting continuity of care. Getting the right data into a secure environment for the purposes of patient care is, of course, a massively important undertaking, but we cannot lose sight of the fact that we still need to employ good techniques for making this data useful once the patients have been treated and are on their way.

Return on Investment

As an example, let’s take the problem of clinical trial site management. Within a clinical trial, we have information about the trial itself, information about the patient population, information about the sites where these patients will be enrolled into the study and so on. A typical pharmaceutical company runs hundreds of trials each year. Each study may have several thousand patients enrolled across hundreds of sites.

As each clinical trial is often run within a therapeutic area department within the pharmaceutical company and many functions of the trial are outsourced, the process of patient and physician/site selection is a perfect example of a business problem where a master data management strategy would deliver tremendous value.

Prior to implementing a good MDM strategy, a pharmaceutical company typically would “copy” an existing site selection database from a similar study, then “buy” additional sites from data vendors who specialize in knowing things like the correct address, number of physicians, etc. They would then use this data throughout the trial: updating, adding and deleting information all the while.

After implementing an MDM strategy, the pharmaceutical company would have a master site database with linkages to patient population and epidemiological information. It would contain physician level information such as specialties and certifications. Whenever you wanted to start a new study, you could query the data to answer questions such as:
  • Which sites have been involved in similar studies?

  • What drugs do the physicians at these sites prescribe most?

  • In previous trials, did they meet or exceed their enrollment forecasts for patients?

  • What are my top performing sites/physicians?

  • Which physicians are rated worst?

  • Which sites had the most dropouts?
Having a single version of the truth moves the data from merely names and addresses to a source of operational advantage and a strategic asset to the organization.
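As a rough illustration of how a few of the site-selection questions above might be asked of a master site database, the sketch below uses Python’s built-in sqlite3 module with an in-memory database. The table names (site, site_trial), the columns and all sample rows are assumptions made up for this example; they do not describe any particular company’s schema.

```python
import sqlite3

# Hypothetical master site database, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (site_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE site_trial (
    site_id INTEGER REFERENCES site(site_id),
    trial_id TEXT,
    therapeutic_area TEXT,
    enrollment_forecast INTEGER,
    enrolled INTEGER,
    dropouts INTEGER
);
INSERT INTO site VALUES (1, 'Site A'), (2, 'Site B'), (3, 'Site C');
INSERT INTO site_trial VALUES
    (1, 'T-001', 'oncology',   40, 45, 3),
    (2, 'T-001', 'oncology',   30, 18, 9),
    (3, 'T-002', 'cardiology', 25, 25, 1),
    (1, 'T-003', 'oncology',   20, 22, 2);
""")


def sites_with_experience(conn, therapeutic_area):
    """Which sites have been involved in similar studies?"""
    rows = conn.execute(
        "SELECT DISTINCT s.name FROM site s "
        "JOIN site_trial st ON st.site_id = s.site_id "
        "WHERE st.therapeutic_area = ? ORDER BY s.name",
        (therapeutic_area,),
    )
    return [name for (name,) in rows]


def sites_meeting_forecasts(conn):
    """Which sites met or exceeded their enrollment forecast in every trial?"""
    rows = conn.execute(
        "SELECT s.name FROM site s "
        "JOIN site_trial st ON st.site_id = s.site_id "
        "GROUP BY s.site_id, s.name "
        "HAVING MIN(st.enrolled >= st.enrollment_forecast) = 1 "
        "ORDER BY s.name"
    )
    return [name for (name,) in rows]


def sites_by_dropouts(conn):
    """Which sites had the most dropouts, summed across trials?"""
    rows = conn.execute(
        "SELECT s.name, SUM(st.dropouts) AS total FROM site s "
        "JOIN site_trial st ON st.site_id = s.site_id "
        "GROUP BY s.site_id, s.name ORDER BY total DESC, s.name"
    )
    return list(rows)


print(sites_with_experience(conn, "oncology"))
print(sites_meeting_forecasts(conn))
print(sites_by_dropouts(conn))
```

The point is less the SQL than the fact that all three questions are answered from one shared set of site records, rather than from per-study copies that drift apart.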

Prior to implementing the MDM strategy, the pharmaceutical company would often manage these site databases separately. But as we look at the question of value (an essential question for any technology adoption process), we note that the cost of running trials is approaching 30% of pharmaceutical companies’ entire drug development budgets and that 75% of patient studies fail to make their timelines.


Master data management is about creating a consistent version of the truth: data that can be shared across the organization and used for both tactical (operational) and strategic uses. In healthcare, purposefully linking context to patient care will no doubt help us perform analysis to help improve patient safety and outcomes. In pharmaceutical companies, master data management can be used to provide operational support for conducting clinical trials more effectively and efficiently.

To get started, we’ve outlined a few key steps that we have found useful in designing a strategy:
  1. Inventory the data sources throughout the organization (division, department, therapeutic area, service line) that seem redundant and/or useful for master data.

  2. Perform some analysis on the source data and find commonalities and differences among the data sources.

  3. Understand the organizational context. Before you can build an MDM strategy, you have to understand how the data is used in context. Where does it come from? Where does it go? Who uses it? Who “owns” it? This will help ensure that you do the right thing for the entire organization.
Once you’ve gotten a lay of the land, you can then move toward a unifying plan for your data and implement a road map for success that would include having data stewards, operational processes and data governance as well as the technical environment for the master data management program.
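The first two steps above (inventory the sources, then look for commonalities and differences) can be sketched at the level of field names. The Python below is a minimal illustration, not a full data-profiling tool; the source names and field lists are hypothetical.

```python
def compare_sources(sources):
    """Return (fields common to all sources, fields unique to each source).

    `sources` maps a source name to an iterable of its field names.
    """
    field_sets = {name: set(fields) for name, fields in sources.items()}
    if not field_sets:
        return set(), {}

    # Fields present in every source are candidates for master data.
    common = set.intersection(*field_sets.values())

    # Fields found in exactly one source need a decision: keep, map or drop.
    only_in = {}
    for name, fields in field_sets.items():
        others = set().union(
            *(f for other, f in field_sets.items() if other != name)
        )
        only_in[name] = fields - others
    return common, only_in


# Hypothetical inventory of three site lists kept by different groups.
inventory = {
    "oncology_sites": ["site_name", "address", "investigator", "phone"],
    "cardiology_sites": ["site_name", "address", "investigator", "irb_contact"],
    "vendor_list": ["site_name", "address", "physician_count"],
}

common, only_in = compare_sources(inventory)
print(sorted(common))
for source, fields in only_in.items():
    print(source, sorted(fields))
```

Comparing field names is only a starting point; the third step, understanding who owns and uses each source, still has to be answered by people rather than by code.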

  • Greg Nelson

    Greg Nelson is the Founder and Chief Executive Officer of ThotWave Technologies, the health and life sciences business intelligence company. Greg provides professional services to healthcare, biopharma as well as government and academic researchers. Greg has served as the Director of Technology for the largest, privately held CRO, Director of Application Development for the Gallup Organization and a director at the University of Georgia’s computer center. He has published and presented more than 150 professional papers in the United States and Europe.  

    While Greg has been a practitioner for the past 23 years, his academic roots began with a BA in Psychology from the University of California at Santa Cruz, in addition to doctoral level work in Social Psychology and Quantitative methods at the University of Georgia. Greg also holds a Project Management Professional Certificate. Greg can be reached at


