Research Data Management
The Research Data Specialists of the IHR perform the data management that makes our research possible. Research is evidence-based, and evidence is rooted in the rich data collected in a vast array of clinical and administrative data systems underlying clinical practice and the provision of insurance. Data must be curated and transformed in a way that supports each research hypothesis.
Data Specialists are experts at the Extract/Transform/Load (ETL) process and adept at locating and learning the intricacies of new data sources based on the needs of a study. They are responsible for ensuring that data pulls are complete and accurate, and they help inform research study needs by understanding data caveats within our clinical and administrative systems that could bias an analysis.
To create efficiency across large numbers of similar research data requests, a team of research data specialists maintains the Virtual Data Warehouse (VDW), a product created by extracting and pre-processing disparate data from multiple data sources into a research-ready format that supports research data requests.
The Virtual Data Warehouse (VDW)
The VDW has been the primary data source for hundreds of grant studies since its creation over 12 years ago. It is a rich, quality-checked clinical information data mart that combines many complex data sources into an easy-to-use format for research analysts and programmers. Source data include the electronic health record, administrative claims system, state death data, tumor registry, hospital machines, and many more.
The VDW is created by cleaning, standardizing, and combining data from these different systems into 'content areas' (e.g., Enrollment, Demographics, Utilization, Death) that may be easily linked to each other. The table below lists the broad set of data content areas harmonized within the VDW to date. New data content areas are considered each year to support new areas of research.
Data Content Areas Harmonized within the VDW
|Utilization||Encounters, diagnoses, and procedures from both Kaiser and non-Kaiser providers|
|Demographics||Birthdate, gender, race, and ethnicity|
|Enrollment||Member periods of enrollment, enrollment plan types|
|Benefits||Dollar and percentage of copay, deductible, and coinsurance, types of benefits|
|Vital Signs||Height, weight, body mass index, and blood pressure|
|Census||Geocoded neighborhood-level information on education, income, housing, and race|
|Geographically Enriched Member Socio-Demographics||Race probabilities and geographic descriptors|
|Pharmacy||Outpatient dispensing, including those from outside claims|
|Ordered Meds||Outpatient prescribing and associated diagnoses|
|Laboratory||Completed tests and results|
|Social History||Tobacco, alcohol and illegal drug use, sexual behavior, and contraceptive use|
|Death||Death date and state certified cause of death|
|Providers||Specialty and provider type of internal and external providers|
|Problem List||Current status of patient's problem list|
|Language||Spoken or written language(s) of member|
|Pregnancy||Pregnancy outcome episode and mother-baby linkage|
|Tumors||Data documenting confirmed neoplasms; size, histology, stage, etc.|
|Infusion||Ordered and dispensed drugs at infusion center and treatment plan|
|Bone Mineral Density||Calculated BMD, t-score, scan date/time, location, and fracture risk scores|
|Patient-Reported Outcomes||Self-administered questionnaires, including the Brief Pain Inventory (BPI) and the Patient Health Questionnaire (PHQ)|
|Spirometry Results||Completed spirometry tests and results|
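Linking content areas can be illustrated with a small sketch. The table names, column names (`member_id`, `enr_start`, `enr_end`, `measure_date`, `bmi`), and rows below are hypothetical stand-ins, not the actual VDW specification; the example only assumes that content areas share a common member identifier.

```python
import sqlite3

# Hypothetical, simplified tables; real VDW table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demographics (member_id TEXT PRIMARY KEY, birth_date TEXT);
    CREATE TABLE enrollment (member_id TEXT, enr_start TEXT, enr_end TEXT);
    CREATE TABLE vital_signs (member_id TEXT, measure_date TEXT, bmi REAL);

    INSERT INTO demographics VALUES ('A1', '1960-05-01'), ('B2', '1985-11-20');
    INSERT INTO enrollment VALUES
        ('A1', '2015-01-01', '2019-12-31'),
        ('B2', '2018-06-01', '2018-12-31');
    INSERT INTO vital_signs VALUES
        ('A1', '2018-03-15', 27.4),
        ('B2', '2019-02-10', 31.0);
""")

def bmi_while_enrolled(connection):
    """Return BMI measurements taken while the member was enrolled."""
    query = """
        SELECT d.member_id, d.birth_date, v.measure_date, v.bmi
        FROM demographics AS d
        JOIN vital_signs AS v ON v.member_id = d.member_id
        JOIN enrollment AS e ON e.member_id = d.member_id
        WHERE v.measure_date BETWEEN e.enr_start AND e.enr_end
        ORDER BY d.member_id
    """
    return connection.execute(query).fetchall()

# ISO-formatted date strings compare correctly as text, so BETWEEN works here.
rows = bmi_while_enrolled(conn)
```

In this toy data, member B2's measurement falls outside the enrollment period and is excluded, which is the kind of linkage across content areas that a shared identifier makes straightforward.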
What Makes the VDW Unique
What makes the VDW unique within the KPCO data landscape is that it is designed specifically with research needs in mind. The VDW can provide a full picture of any given patient's interactions with our health care system and their health status over the duration of their membership, which allows for its use in answering a large variety of research questions. The VDW can also provide insight into patient attributes and coverage characteristics that may influence the use of our health care system or health outcomes. Data spanning nearly 20 years of historical KPCO membership and utilization allow the IHR to conduct point-in-time/cross-sectional analysis as well as longitudinal analysis.
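The difference between a point-in-time and a longitudinal view can be sketched with enrollment periods. This is a minimal illustration with made-up dates; the `periods` structure and both function names are assumptions for the example, not part of the VDW.

```python
from datetime import date

# Hypothetical enrollment periods for one member: (start, end), both inclusive.
periods = [
    (date(2010, 1, 1), date(2012, 12, 31)),
    (date(2014, 7, 1), date(2014, 12, 31)),
]

def enrolled_on(periods, as_of):
    """Point-in-time view: was the member enrolled on a given date?"""
    return any(start <= as_of <= end for start, end in periods)

def enrolled_days(periods):
    """Longitudinal view: total days enrolled (assumes non-overlapping periods)."""
    return sum((end - start).days + 1 for start, end in periods)
```

A cross-sectional study would call something like `enrolled_on` for a single index date across all members, while a longitudinal study would work with the full set of periods per member.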
Another unique quality of the VDW, and one of its key strengths, is the ongoing collaboration on governance, development, and quality assurance with the Kaiser Permanente Center for Effectiveness and Safety Research (CESR) and the Health Care Systems Research Network (HCSRN). CESR's focus is on comparing how well and how safely different preventive services and treatment approaches work within our clinical practice at Kaiser Permanente. HCSRN is a consortium of 20 research centers, including all 8 KP regions, embedded in health plans across the United States and in Tel Aviv, Israel. All additions and modifications to the VDW model go through these governing bodies to ensure usability and robustness of the data across research projects and organizations.
While the VDW can be thought of as a data warehouse that combines data from all 20 organizations, there is no central store where data from all sites can be queried in a single run. This is known as a 'federated' data model, and it is what makes the VDW 'virtual'. Each member organization creates and maintains a structurally identical data model and retains control over its local data, relying on local programmers with specialized expertise in their source data systems.
Multi-site research is accomplished through 'distributed programming'. Analytic programs are written by a lead site against VDW specifications and then distributed to participating sub-sites, where they can be run against each site's local VDW. Results are reviewed before the analytic output is returned to the lead site. This process preserves privacy for all patients and ensures the quality of the data returned for analytics.
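The distributed programming pattern can be sketched as follows. This is an illustrative toy under stated assumptions, not the HCSRN tooling: the function names, the `age_group` and `has_diabetes` fields, and the sample rows are all hypothetical. The point is that the same program runs unchanged at every site because the tables are structurally identical, and only aggregates leave a site.

```python
from collections import Counter

def site_program(local_rows):
    """Runs at each site; returns only aggregate counts, never patient rows."""
    return Counter(row["age_group"] for row in local_rows if row["has_diabetes"])

def combine(site_results):
    """Runs at the lead site; sums the aggregates returned by each site."""
    total = Counter()
    for counts in site_results:
        total.update(counts)
    return dict(total)

# Hypothetical local data; in practice each table stays at its own site.
site_a = [
    {"age_group": "40-64", "has_diabetes": True},
    {"age_group": "65+", "has_diabetes": True},
    {"age_group": "40-64", "has_diabetes": False},
]
site_b = [
    {"age_group": "40-64", "has_diabetes": True},
    {"age_group": "18-39", "has_diabetes": False},
]

combined = combine([site_program(site_a), site_program(site_b)])
```

Here the two calls to `site_program` stand in for separate runs at separate organizations; the lead site sees only the counts passed to `combine`.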
The standard HCSRN VDW data model is the core of our KPCO VDW. It also serves as a primary source for other common data models managed by the IHR, including:
- Sentinel Common Data Model (SCDM) – Funded by the FDA, the SCDM is a federated data model used for surveillance and analytics of adverse events associated with medical products.
- Colorado Health Observation Regional Data Service (CHORDS) – A federated data model employed by a seven-county regional network to support public health monitoring and evaluation efforts.
- Patient-Centered Outcomes Research Network (PCORNet) Common Data Model – A federated data model to support patient-centered research initiatives.