This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on http://www.researchprotocols.org, as well as this copyright and license information must be included.
The Patient-Centered Outcomes Research Institute (PCORI) created a new national network infrastructure to enable large-scale observational comparative effectiveness research across diverse clinical care settings. As part of testing the feasibility of this effort, each clinical data research network (CDRN) was required to construct cohorts of patients, including one of patients with overweight and obesity.
The aim of this paper is to report on the development of the Patient Outcomes Research to Advance Learning (PORTAL) overweight and obese cohort, which includes patients from 10 health plans located across the United States.
Information was gathered from each plan’s electronic health records (EHR). Eligibility criteria were age 18 years or older, a valid height and weight in 2012 or 2013, and body mass index (BMI) greater than 22.9 kg/m2. Pre-diabetes and diabetes status was defined using the American Diabetes Association (ADA) criteria, using lab values of glycated hemoglobin (HbA1c) or fasting glucose available in the EHR. Hypertension was identified from the International Classification of Diseases (ICD) diagnosis codes. Individuals were classified into BMI categories: healthy weight (23.0-24.9 kg/m2), overweight (25.0-29.9 kg/m2), obese class 1 (30.0-34.9 kg/m2), obese class 2 (35.0-39.9 kg/m2), obese class 3 (40.0-49.9 kg/m2), and obese class 4 (>50.0 kg/m2).
A cohort of 5,293,458 non-pregnant adults was created. Weight status was 20.39% (1,079,289/5,293,458) healthy weight, 40.40% (2,138,520/5,293,458) overweight, 22.78% (1,205,866/5,293,458) obese class 1, 9.86% (521,872/5,293,458) obese class 2, 5.59% (295,786/5,293,458) obese class 3, and 0.98% (52,125/5,293,458) obese class 4. Race/ethnicity was 49.02% (2,594,776/5,293,458) non-Hispanic white, 22.89% (1,211,677/5,293,458) Hispanic, 10.40% (550,608/5,293,458) Asian, 10.83% (573,506/5,293,458) black, and 6.59% (348,830/5,293,458) other. About 34.33% (1,817,438/5,293,458) of individuals met the definition of hypertension, 29.49% (1,560,940/5,293,458) met the criteria for pre-diabetes, and 14.98% (793,069/5,293,458) met the criteria for diabetes. Prevalence of pre-diabetes and diabetes varied across health plans to a greater extent than expected based on hypertension prevalence and BMI status variability.
This large, racially, ethnically, and geographically diverse cohort will be useful for future studies of rare exposures or outcomes and of differences in health care practices.
In 2014, the Patient-Centered Outcomes Research Institute (PCORI) funded 11 Clinical Data Research Networks (CDRN) and 18 Patient-Powered Research Networks to develop a National Patient-Centered Clinical Research Network (PCORnet), with the purpose of building a common infrastructure across the CDRNs to enable highly representative future clinical outcomes research. The goal of PCORnet is to "transform clinical research by engaging patients, care providers, and health systems in collaborative partnerships to improve healthcare and advance medical knowledge." One of the CDRNs is the Patient Outcomes Research to Advance Learning (PORTAL) network. PORTAL combines four health care delivery systems that have about 11 million members enrolled across nine states (CA, CO, GA, HI, MD, MN, OR, VA, WA) and the District of Columbia, reaching into most regions in the United States and offering a diverse patient population.
The PORTAL health care systems have been described previously [
All CDRNs were required to develop three cohorts to demonstrate each network’s ability to identify individuals with a condition of interest and to test the commonality of data elements across sites. They were also required to field a survey of the cohorts to test the ability to reach out to patients. One of the pre-specified cohorts common to all of the PCORnet CDRNs was a cohort of individuals with obesity. The PORTAL overweight and obesity cohort was defined as adult members of our health care systems during 2012 or 2013 who were overweight or obese, defined as having a body mass index (BMI) greater than or equal to 23.0 kg/m2. Although overweight is generally defined as a BMI of 25 kg/m2 or greater, we recognize that the World Health Organization (WHO) recommends lower overweight and obesity cut points for Asians: 23.0-27.4 kg/m2 for overweight and greater than or equal to 27.5 kg/m2 for obesity [
We constructed a cross-sectional cohort of adults enrolled in any of the PORTAL health plans; all of those meeting eligibility criteria are considered cohort members. For all sites except Denver Health, we first identified health plan members with at least 12 months of continuous membership between January 1, 2012 and December 31, 2013, and who were at least 18 years of age on December 31, 2013. Members were further restricted to those who had a weight recorded during 2012 or 2013, had a height recorded in the electronic health record (EHR), and who were not pregnant during 2012-2013. For Denver Health, the initial eligibility criteria included all adults who had a primary care encounter during 2012 or 2013 because Denver Health, as a safety-net organization, does not enroll members.
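The membership-based eligibility rules described above (for all sites except Denver Health) can be sketched as a simple filter. This is only an illustrative sketch: the `Member` record and its field names are hypothetical and do not correspond to actual VDW tables or variables.

```python
from dataclasses import dataclass
from datetime import date

STUDY_END = date(2013, 12, 31)  # age is assessed on this date


@dataclass
class Member:
    # Hypothetical, precomputed fields; not actual VDW variable names.
    birth_date: date
    continuous_months_2012_2013: int
    has_weight_2012_2013: bool
    has_height_in_ehr: bool
    pregnant_2012_2013: bool


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - birth_date.year
    if (on.month, on.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_eligible(m: Member) -> bool:
    """Apply the membership, age, measurement, and pregnancy criteria."""
    return (
        m.continuous_months_2012_2013 >= 12
        and age_on(m.birth_date, STUDY_END) >= 18
        and m.has_weight_2012_2013
        and m.has_height_in_ehr
        and not m.pregnant_2012_2013
    )
```

For Denver Health, the membership condition would be replaced by the presence of a primary care encounter during 2012 or 2013.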
Each health care system has its own methods of capturing electronic health care data, resulting in information that varies widely in content, format, and structure and therefore requires consistent data standards and terminology. We used the Health Care Systems Research Network (formerly HMO Research Network) Virtual Data Warehouse (VDW) for data extraction. The VDW is a federated database in which all data reside at each health system behind each site’s secure system, or firewall [
Kaiser Permanente Southern California (KPSC) is the lead site for the cohort and obtained its institutional review board’s (IRB) approval for human subjects protections for the research. The IRBs at the other sites reviewed the protocol and subsequently ceded review to the KPSC IRB.
Weight is routinely measured as part of obtaining vital signs during outpatient clinic visits. Height is typically assessed less often, as it is considered to be more static. If BMI was not available in the EHR, it was calculated. If more than one weight, height, or BMI was in the EHR in 2012-2013, the most recent value was used. EHR records of heights less than 4 ft or equal to or greater than 8 ft, and weights less than 50 lbs or equal to or greater than 1000 lbs, were considered implausible and were removed from the data set. Similarly, calculated BMI values less than 5 kg/m2 or equal to or greater than 90 kg/m2 were excluded. A total of 6954 (0.11%, 6954/6,225,688) individuals were excluded from the cohort because they had no biologically plausible weight, height, or BMI values.
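The BMI calculation and plausibility checks above can be illustrated with a short sketch. It assumes height is recorded in inches and weight in pounds; the function names are hypothetical, and the actual extraction code used at the sites is not described in this paper.

```python
from typing import Optional

LB_TO_KG = 0.45359237  # exact conversion factor, pounds to kilograms
IN_TO_M = 0.0254       # exact conversion factor, inches to meters


def plausible_height(height_in: float) -> bool:
    """Keep heights of at least 4 ft and below 8 ft."""
    return 48 <= height_in < 96


def plausible_weight(weight_lb: float) -> bool:
    """Keep weights of at least 50 lbs and below 1000 lbs."""
    return 50 <= weight_lb < 1000


def plausible_bmi(bmi: float) -> bool:
    """Keep BMI values of at least 5 and below 90 kg/m2."""
    return 5 <= bmi < 90


def clean_bmi(weight_lb: float, height_in: float) -> Optional[float]:
    """Return BMI in kg/m2, or None if any value is implausible."""
    if not (plausible_weight(weight_lb) and plausible_height(height_in)):
        return None
    meters = height_in * IN_TO_M
    bmi = (weight_lb * LB_TO_KG) / (meters * meters)
    return bmi if plausible_bmi(bmi) else None
```

For example, a weight of 150 lbs at a height of 65 in gives a BMI of about 25.0 kg/m2, whereas a recorded height of 47 in is rejected before any BMI is computed.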
We categorized individuals as healthy weight (BMI 23.0-24.9 kg/m2), overweight (25.0-29.9 kg/m2), obese class 1 (30.0-34.9 kg/m2), obese class 2 (35.0-39.9 kg/m2), obese class 3 (40.0-49.9 kg/m2), or obese class 4 (>50 kg/m2) [
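As an illustration, the categories above can be expressed as a lookup over cut points. This sketch makes two assumptions that the text does not state: BMI is treated as a continuous value, so each category runs up to (but not including) the next cut point, and a BMI of exactly 50.0 kg/m2 is assigned to obese class 4.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds of each category, in kg/m2.
CUT_POINTS = [23.0, 25.0, 30.0, 35.0, 40.0, 50.0]
LABELS = [
    None,  # below 23.0 kg/m2: not part of the cohort
    "healthy weight",
    "overweight",
    "obese class 1",
    "obese class 2",
    "obese class 3",
    "obese class 4",  # assumption: includes exactly 50.0 kg/m2
]


def bmi_category(bmi: float) -> Optional[str]:
    """Map a BMI value to its cohort category, or None if below 23.0."""
    return LABELS[bisect_right(CUT_POINTS, bmi)]
```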
Race and ethnicity were obtained from self-report during enrollment into the health plan, during a health care encounter, or from birth certificates (if applicable). Individuals had the option to identify themselves as Asian, Black or African American, Hispanic, Native Hawaiian or other Pacific Islander, American Indian or Alaskan Native, White, or other. If the information was not available in the VDW or individuals identified themselves as belonging to another race or ethnic group, the individual was categorized as "other/unknown."
Our health plans do not routinely collect individual-level data on educational attainment or income levels, so investigators rely on neighborhood-level information to estimate socioeconomic status. Neighborhood education and income were estimated using geospatial entity object codes (geocodes) that linked addresses to 2010 US census data at the block group level. The probability of different education levels within a block group was used to calculate individual averages. The probability of different family and household income levels within a block group was used to calculate individual averages.
Pre-diabetes was defined using American Diabetes Association (ADA) criteria and the work of Schmittdiel et al. An individual was classified as having pre-diabetes if, during the study period, the EHR had (1) at least one HbA1c between 5.7% and 6.4%, or (2) at least one fasting plasma glucose measurement between 100 and 125 mg/dL, or (3) at least one oral glucose tolerance test between 140 and 199 mg/dL, or (4) at least one outpatient International Classification of Diseases, Ninth Revision (ICD-9) code of 790.2, 790.29, 790.21, or 790.22 [
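The four criteria listed above can be sketched as a single predicate. The function and parameter names are hypothetical, and the sketch covers only the criteria stated here; any further conditions (for example, how individuals who also meet the diabetes definition are handled) are not represented.

```python
from typing import Iterable

PREDIABETES_ICD9 = {"790.2", "790.29", "790.21", "790.22"}


def meets_prediabetes_criteria(
    hba1c_pct: Iterable[float] = (),
    fasting_glucose_mgdl: Iterable[float] = (),
    ogtt_mgdl: Iterable[float] = (),
    outpatient_icd9: Iterable[str] = (),
) -> bool:
    """True if at least one of the four listed criteria is met.

    Each argument holds all values recorded during the study period.
    """
    return (
        any(5.7 <= v <= 6.4 for v in hba1c_pct)
        or any(100 <= v <= 125 for v in fasting_glucose_mgdl)
        or any(140 <= v <= 199 for v in ogtt_mgdl)
        or any(code in PREDIABETES_ICD9 for code in outpatient_icd9)
    )
```

Because a single qualifying value is sufficient, an individual with HbA1c results of 5.6% and 5.9% during the study period would meet the first criterion.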
Diabetes was defined using the methodology developed for Surveillance, Prevention, and Management of Diabetes Mellitus (SUPREME DM), a large multi-site observational diabetes study [
Hypertension was considered present if an individual had at least two outpatient or one inpatient ICD-9 codes of 401-405xxx.
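A minimal sketch of this rule follows. It assumes that each coded encounter is available as a (setting, ICD-9 code) pair and that any code whose first three characters fall in 401 through 405 counts; whether the two outpatient codes must come from separate dates is not stated here and is not enforced. The names are hypothetical.

```python
from typing import Iterable, Tuple

HYPERTENSION_PREFIXES = {"401", "402", "403", "404", "405"}


def is_hypertension_code(icd9: str) -> bool:
    """True if the ICD-9 code falls in the 401-405 range."""
    return icd9.strip()[:3] in HYPERTENSION_PREFIXES


def has_hypertension(encounters: Iterable[Tuple[str, str]]) -> bool:
    """Apply the rule: at least two outpatient or one inpatient code.

    Each item is (setting, icd9_code), with setting either
    "outpatient" or "inpatient".
    """
    outpatient = 0
    for setting, code in encounters:
        if not is_hypertension_code(code):
            continue
        if setting == "inpatient":
            return True
        if setting == "outpatient":
            outpatient += 1
    return outpatient >= 2
```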
Individuals who had undergone bariatric surgery were identified by an algorithm developed by Arterburn et al in 2009, which used the Current Procedural Terminology 4 (CPT-4) codes (43842, 43843, 43846, 43847), and ICD-9 codes (CPT-4 codes 43659, 43621, 43633) [
Presence of comorbid conditions was assessed with a modified Charlson Comorbidity Index [
A random sample of 675 overweight and obese individuals who could read or speak English or Spanish was selected from each of the seven Kaiser Permanente (KP) health plans and Denver Health to complete a brief health survey, for a total of 5400 individuals. An equal number of participants was selected from each of the categories of overweight, obese class 1, and obese class 2 (n=1080 per category). We randomly selected 2160 individuals with obese class 3 or greater, as we were concerned that extremely obese individuals might be less likely to complete the survey. The survey took about 10 minutes to complete and included items on general health and well-being, physical activity, eating patterns, sleep patterns, and perceived health care sensitivities surrounding weight status. The survey was mailed to individuals, with telephone follow-up for those who did not return it. A US $20 incentive was offered to complete the survey.
PORTAL overweight and obesity flow chart to construct the cohort.
The number of individuals in each BMI category, all PORTAL sites combined.
The cohort includes over 5 million adults with a BMI of 23.0 kg/m2 or greater. The cohort flow chart with all sites combined is displayed in
Cohort demographics are displayed in
Overall, about 85.03% (5,293,458/6,225,688) of non-pregnant individuals aged 18 years or older with valid BMI measures obtained in 2012 to 2013 are members of the cohort (
Pre-diabetes varied across sites, with an overall cohort prevalence of 29.49% (1,560,940/5,293,458) and a range from 15.30% (21,248/138,900) to 34.45% (39,171/113,699) across the health plans (
Sociodemographic and BMI categories for those who returned the PORTAL health survey (N=2809) compared with those who did not (N=2591).
Characteristic | Returned survey, n (%) | Did not return survey, n (%) |
Sex | | |
Female, n=3290 | 1737 (52.80) | 1553 (47.20) |
Male, n=2110 | 1072 (50.81) | 1038 (49.19) |
Age category | | |
<20 years, n=80 | 24 (30.00) | 56 (70.00) |
20-29 years, n=546 | 215 (39.38) | 331 (60.62) |
30-39 years, n=866 | 347 (40.07) | 519 (59.93) |
40-49 years, n=1115 | 539 (48.34) | 576 (51.66) |
50-59 years, n=1220 | 680 (55.74) | 540 (44.26) |
60-69 years, n=1019 | 638 (62.61) | 381 (37.39) |
70-79 years, n=442 | 296 (66.97) | 146 (33.03) |
>80 years, n=112 | 70 (62.50) | 42 (37.50) |
Race/ethnicity | | |
White, n=2535 | 1435 (56.61) | 1100 (43.39) |
Hispanic, n=987 | 420 (42.55) | 567 (57.45) |
Asian, n=304 | 165 (54.28) | 139 (45.72) |
Black, n=1144 | 596 (52.10) | 548 (47.90) |
Native Hawaiian/other Pacific Islander, n=301 | 168 (55.81) | 133 (44.19) |
American Indian/Alaskan Native, n=34 | 20 (58.82) | 14 (41.18) |
Other/unknown, n=95 | 5 (5.26) | 90 (94.74) |
BMI category | | |
Overweight (25.0-29.9 kg/m2), n=1080 | 577 (53.43) | 503 (46.57) |
Obese class 1 (30.0-34.9 kg/m2), n=1080 | 566 (52.41) | 514 (47.59) |
Obese class 2 (35.0-39.9 kg/m2), n=1080 | 550 (50.93) | 530 (49.07) |
Obese class 3 (40.0-49.9 kg/m2), n=1811 | 936 (51.68) | 875 (48.32) |
Obese class 4 (>50.0 kg/m2), n=349 | 179 (51.29) | 170 (48.71) |
From the sample of 5400 individuals, 2809 surveys were completed, 114 individuals were deemed ineligible (ie, no valid address, deceased), 924 persons refused, and 1553 did not respond to mail or telephone attempts, resulting in a 53.14% response rate among those eligible. Among those who were selected for the survey, women (52.80%, 1737/3290) were slightly more likely to complete the survey than men (50.81%, 1072/2110), and older individuals were more likely to return the survey; for example, 62.61% (638/1019) of those age 60 to 69 years completed the survey compared with 39.38% (215/546) of those age 20 to 29 years (
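The response rate reported above can be reproduced from the survey dispositions, with ineligible individuals removed from the denominator. The variable names below are illustrative only.

```python
# Survey dispositions reported in the text.
sampled = 5400
completed = 2809
ineligible = 114      # no valid address or deceased
refused = 924
no_response = 1553

# The four dispositions account for everyone sampled.
assert completed + ineligible + refused + no_response == sampled

# Response rate among eligible individuals.
eligible = sampled - ineligible
response_rate_pct = round(100 * completed / eligible, 2)
print(f"{completed}/{eligible} = {response_rate_pct}%")
```

The same approach gives the sex-specific completion proportions, for example 1737 of the 3290 women sampled.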
The PORTAL overweight and obesity cohort is large and extends across all regions in the United States. Racial and ethnic diversity, as well as socioeconomic diversity, is large and generally representative of the underlying populations of the health plans’ service regions [
The prevalence of individuals across BMI categories and hypertension prevalence was fairly similar across health plans. In contrast, pre-diabetes and diabetes prevalence varied to a greater extent than expected based on hypertension prevalence and BMI status variability. This variability may be due to local differences in testing for pre-diabetes and diabetes, which requires blood work while weight and blood pressure are routinely measured at each visit. The ADA recommends testing for pre-diabetes and diabetes for all adults starting at age 45 years or for those who are overweight and who have additional risk factors, including physical inactivity, hypertension, and being from minority race and ethnicities [
Follow-up of the cohort will be through the clinical information available in the EHR. The 5-year retention is expected to be about 60%, but will vary by health care system. For the 3.1 million individuals who were health plan members in both 2009 and 2013, clinical data are available with 5-year follow-up. This information includes repeated measures of height, weight, and BMI; prevalent and incident diagnoses from inpatient and outpatient encounters; procedures performed; laboratory test results; pharmaceuticals dispensed; and pathology and radiology results.
PCORnet was created to foster collaborative partnerships across networks and institutions, and PORTAL investigators adhere to this principle. The PCORnet Common Data Model (CDM), which is similar to the VDW, has a query function that allows non-PORTAL investigators to inquire about data availability. In general, the information available in the EHR is protected and confidential and remains behind each health plan’s firewall. We welcome external collaborations, particularly collaborations that include establishment of research questions, study design decisions, and analysis and interpretation of the data. Analyses currently underway include descriptions of cardiometabolic health among cohort members, incidence of outcomes across BMI categories, and survey results.
In some regions, individuals with low socioeconomic status may be underrepresented, although all health plans except one include individuals covered under state-subsidized insurance, and Denver Health’s mission is to serve those with limited ability to pay for medical services. There is also marginal underrepresentation of those with high incomes. Although the cohort is large, it does not include individuals from all 50 states and, therefore, cannot be considered fully representative of the United States. Because data are collected as part of clinical care, some data elements may not be of research quality and are likely to have errors or misclassifications embedded in them. The classifications of disease status (eg, hypertension, diabetes status) are based on data available in the VDW and have not been chart-reviewed for their validity. However, the quality of diagnosis codes is relatively high in managed care systems and has been validated for many health conditions [
The PORTAL overweight and obesity cohort is a rich resource of considerable diversity. It represents the ability of clinical data to be combined across health plans to be available for future epidemiological and comparative effectiveness research.
Sociodemographic, BMI category, chronic conditions, and health insurance status across the 10 PORTAL obesity cohort sites (N= 5,293,458).
ADA: American Diabetes Association
BMI: body mass index
CDRN: Clinical Data Research Network
CDM: Common Data Model
CPT-4: Current Procedural Terminology 4
EHR: electronic health records
HbA1c: glycated hemoglobin
ICD-9: International Classification of Diseases, Ninth Revision
IRB: institutional review board
KPSC: Kaiser Permanente Southern California
NHANES: National Health and Nutrition Examination Survey
PCORI: Patient-Centered Outcomes Research Institute
PCORnet: Patient-Centered Clinical Research Network
PORTAL: Patient Outcomes Research to Advance Learning
VDW: Virtual Data Warehouse
WHO: World Health Organization
This work was supported by a contract awarded by the Patient-Centered Outcomes Research Institute (PCORI).
None declared.