Impact of Built Environments on Body Weight (the Moving to Health Study): Protocol for a Retrospective Longitudinal Observational Study

Background: Studies assessing the impact of built environments on body weight are often limited by modest power to detect residential effects that are small for individuals but may nonetheless constitute large attributable risks.
Objective: We used data extracted from electronic health records to construct a large retrospective cohort of patients. This cohort will be used to explore both the impact of moving between environments and the long-term impact of changing neighborhood environments.
Methods: We identified members with at least 12 months of Kaiser Permanente Washington (KPWA) membership and at least one weight measurement recorded between January 2005 and April 2017 while they lived in King County, Washington. Information on member demographics, address history, diagnoses, and clinical visits (including weight) was extracted. This paper describes the characteristics of the adult (aged 18-89 years) cohort constructed from these data.
Results: We identified 229,755 adults representing nearly 1.2 million person-years of follow-up. The mean age at baseline was 45 years, and 58.0% (133,326/229,755) were female. Nearly one-fourth of cohort members (55,152/229,755) moved within King County at least once during follow-up, making 84,698 moves in total. Members tended to move to neighborhoods that resembled their origin neighborhoods in residential density and property values.
Conclusions: Data were available in the KPWA database to construct a very large cohort based in King County, Washington. Future analyses will directly examine associations between neighborhood conditions and longitudinal changes in body weight, diabetes, and other health conditions.
International Registered Report Identifier (IRRID): DERR1-10.2196/16787


Introduction
Background
Residential context (the features of the neighborhoods we live in) affects our health behaviors and well-being [1,2]. Residential environments have been cross-sectionally linked to diet quality, body weight, and the prevalence of obesity and obesity-related health conditions [3-7]. However, such cross-sectional designs have limited causal interpretability, both because it is difficult to isolate the impact of a single neighborhood exposure and because of the threat of reverse causality [8,9]. With a few notable exceptions [10,11], most studies of the health impact of changing residential neighborhoods have operated at the ecological level [12] or have leveraged specific one-time changes such as a new transit system [13-15] or supermarket [16,17]. Meanwhile, studies assessing weight change among people who moved [18-20] have been limited by modest sample sizes. Because neighborhood features often have only modest effects on behavior [21], studies with few participants frequently fail to identify robust and causally interpretable effects of residential environments [22].

Objectives
The Moving to Health Study, whose design and methods we present here, uses data from Kaiser Permanente Washington (KPWA; formerly Group Health Cooperative) to address this gap [23]. KPWA is a large integrated health insurance and care delivery system in Washington State that serves a broad range of economic strata. By attaching a geographic context to more than a decade of anonymized electronic health records (EHRs) for more than 200,000 adults in King County, Washington (the central county of the Seattle-Tacoma-Bellevue metropolitan statistical area), the study will assess, at a heretofore unparalleled scale, three influences on obesity and type 2 diabetes: the longitudinal impact of the baseline residential built environment, the effect of moving between environments, and the effect of changes in the built environment among those who did not move.
Here, we describe the Moving to Health adult obesity study cohort design, the process of building a longitudinal epidemiologic cohort from health system data, the individual and neighborhood environment characteristics of adults aged 18 to 89 years in the cohort, and the residential moves that this cohort undertook during 12 years and 4 months of follow-up.

Setting
We constructed a retrospective observational cohort of adults and children in King County, Washington, using data from KPWA merged with publicly available data on the built environment compiled by the Urban Form Lab at the University of Washington. In this paper, we describe the adult cohort; details and analyses regarding the child cohort will be published separately. All study procedures were reviewed in advance and approved by the KPWA institutional review board, which also approved a waiver of consent and of Health Insurance Portability and Accountability Act (HIPAA) authorization to identify and enroll study subjects.
KPWA has approximately 700,000 members in Washington, and 36% of these reside in King County. King County includes Seattle and is the most densely populated county in Washington State. KPWA enrollment in King County is similar to the county's population in terms of income, educational attainment, and representation of racial and ethnic minority groups.

Overview
Most member care at KPWA is documented in EHR databases, which also record most clinical outcomes. KPWA medical centers have used the Epic (Epic Systems Corporation) EHR platform since 2005, the first year of our study. The EHR data warehouse contains the key indicators of KPWA member health status. For example, biometric data such as heights, weights, and blood pressure values recorded at clinic visits are fully retrievable for analysis, making these patient profiles more detailed than the claims-only data available from Medicaid, Medicare, or most health plans that contract with independent medical groups or physician networks. By combining KPWA EHR data with other extensive databases used in the provision of insurance and care (ie, enrollment, outside claims, deaths, costs, outpatient visits, hospitalizations, emergency room care, pharmacy, radiology, and laboratory databases), we can document, for each study participant, all medical and surgical care rendered during their KPWA enrollment that was either (1) delivered in KPWA-owned and KPWA-operated medical centers or (2) delivered by KPWA's contracted network facilities and providers and paid for by the health plan. Specifically, our cohort uses the following data features:

Membership
Dates and status of enrollment, types of insurance coverage, and drug coverage plan were used to determine the periods of eligibility as detailed below.

Residential Locations
Membership files also record changes in mailing address, which is typically the home address (the mailing address is confirmed every time a patient contacts KPWA, including at clinical visits). We geocoded these home addresses to obtain latitude and longitude values for residential locations, which can be linked with spatially referenced data from other sources. Of the members for whom we attempted to geocode all recorded addresses, 95% had at least one address matched successfully. Common reasons for geocoding failure included use of a post office box as the mailing address and addresses too irregular to be cleaned well enough for the geocoder to find the relevant location. We identified residential relocations (hereafter called moves) by comparing successive address records: any change in the patient address that resulted in a different geocoded home location constituted a move. Patients with at least one identified move to another location within the county were classified as movers, so that we could compare the population whose moves we can analyze with the population as a whole.
Geocoding was performed in steps. First, we performed a crude but fast geocode using the SAS (SAS Institute) geocoder with US Census TIGER/Line files to rule out addresses clearly outside King County. Then, to obtain a more precise home location, we used a composite geocoding approach: we first looked for an exact match in the King County E-911 address points and, if no match was found there, used Esri Business Analyst (Esri), requiring a rooftop match to consider the address successfully geocoded.
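The move-identification rule described above (a new address record counts as a move only when its geocoded location differs from the previous one) can be sketched as follows. This is a hedged illustration, not the study's code: the `AddressRecord` schema, the `detect_moves` function, and the coordinates are hypothetical; exact coordinate equality stands in for whatever location comparison was actually used; and the haversine distance is included only to show how moves could be binned by length (for example, under 5 km). Restricting to moves within King County would be a separate filter.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class AddressRecord:
    """One geocoded address record for a single member (hypothetical schema)."""
    effective: date  # date the address took effect in the membership file
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def detect_moves(records: list[AddressRecord]) -> list[tuple[date, float]]:
    """Return (move date, distance in km) for each change in geocoded location."""
    ordered = sorted(records, key=lambda r: r.effective)
    moves = []
    for prev, curr in zip(ordered, ordered[1:]):
        # A new record that geocodes to the same point (e.g., a re-confirmed or
        # reformatted address) is not a move; only a changed location counts.
        if (prev.lat, prev.lon) != (curr.lat, curr.lon):
            dist = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
            moves.append((curr.effective, dist))
    return moves


# Illustrative history: downtown Seattle, the same point re-confirmed, then Bellevue.
history = [
    AddressRecord(date(2006, 3, 1), 47.6062, -122.3321),
    AddressRecord(date(2008, 7, 15), 47.6062, -122.3321),  # same point: not a move
    AddressRecord(date(2011, 9, 2), 47.6101, -122.2015),
]
moves = detect_moves(history)
short_moves = [m for m in moves if m[1] < 5.0]  # moves shorter than 5 km
```

Here the second record is ignored because it geocodes to the same point, and the third yields a single move of roughly 10 km, which falls outside the under-5 km category.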

Demographics
Date of birth, gender, race, and ethnicity are available in the administrative datasets. These data were self-reported by patients as part of routine clinical practice.

Clinical Measures
Height and weight are measured by clinical staff and recorded in the EHR during clinical visits. These heights and weights have previously been used extensively for research purposes [4,24]. We excluded weight measurements that clinical expertise indicated were biologically implausible for adults (<70 pounds or ≥700 pounds). Smoking status was self-reported through patient questionnaires deployed during clinical visits.

Utilization, Diagnoses, and Procedures
The KPWA EHR includes dates and types of health care utilization for inpatient, emergency department, and outpatient settings. Using the baseline visit and all records from the preceding 12 months, we constructed an Elixhauser comorbidity score [25,26]. Because baselines could fall as early as 2005 and subjects could be as young as 18 years at baseline, this lookback drew on EHR records from as early as 2004 and on visits when patients were as young as 17 years. We also used these records to assess the baseline prevalence of conditions of particular interest, including diabetes, hypertension, dyslipidemia, depression, and anxiety. Codes used to infer the presence of health conditions are available from the authors on request.
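The 12-month lookback can be illustrated with a minimal sketch of an unweighted Elixhauser-style count: the number of distinct comorbidity categories coded between 365 days before the baseline visit and the baseline visit itself. The code-to-category map below is a hypothetical, heavily abbreviated subset chosen for illustration; the study's actual code lists are available only from the authors, and a weighted variant (for example, the van Walraven weights) would sum category weights instead of counting categories.

```python
from datetime import date, timedelta

# Illustrative subset only: ICD-10-CM code prefixes mapped to Elixhauser-style
# categories. A real implementation would use a published Elixhauser coding
# algorithm covering all categories (and ICD-9-CM codes for earlier years).
ELIXHAUSER_PREFIXES = {
    "I50": "congestive_heart_failure",
    "I10": "hypertension_uncomplicated",
    "J44": "chronic_pulmonary_disease",
    "E11": "diabetes",
    "F32": "depression",
}


def elixhauser_count(diagnoses: list[tuple[date, str]], baseline: date,
                     lookback_days: int = 365) -> int:
    """Unweighted count of distinct categories coded in the lookback window.

    The window runs from `lookback_days` before the baseline visit through the
    baseline visit itself, so a 2005 baseline can draw on 2004 records.
    """
    window_start = baseline - timedelta(days=lookback_days)
    categories = set()
    for dx_date, code in diagnoses:
        if window_start <= dx_date <= baseline:
            normalized = code.replace(".", "")  # "E11.9" -> "E119"
            for prefix, category in ELIXHAUSER_PREFIXES.items():
                if normalized.startswith(prefix):
                    categories.add(category)
    return len(categories)
```

Repeated codes for the same condition count once, and diagnoses recorded after the baseline visit are ignored so that the score reflects only pre-baseline health status.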

Measures of Neighborhood Context
As of December 2019, we have constructed six neighborhood environment measures (Table 1) and anticipate constructing more. These measures are drawn from publicly available geographic information system (GIS) data layers and were selected to capture aspects of neighborhoods thought to influence physical activity and weight trajectory. Obtaining multiple GIS-based environmental measures for hundreds of thousands of point locations is computationally demanding. To make this feasible, for each variable of interest, we first constructed SmartMaps [27]: spatially continuous rasterized surfaces in which each raster cell contains the average value of the environmental feature of interest within a predetermined distance (Figure 1). The maps allow efficient estimation of environmental characteristics for large numbers of point locations. We used each SmartMap to assign the selected neighborhood measure to each subject's home location at baseline and at multiple follow-up times, based on historical GIS data temporally matched with the EHR. This approach avoids typical GIS workflows that compute each environmental measure separately for each geocoded location. We used radial rather than network buffers for most SmartMaps to minimize computational costs. An additional advantage of the SmartMap approach is that SmartMaps can be constructed by team members outside KPWA without access to HIPAA-protected home addresses. SmartMaps were developed using PostgreSQL, PostGIS, and R (R Foundation). We have constructed measures covering the following domains of neighborhood conditions; a key feature of our cohort design, however, is that other measures of the built environment can easily be added as the data become available:

Neighborhood Composition
The physical and social composition of a neighborhood may influence walkable access to retail and daily routine destinations, perceptions of the safety of outdoor physical activity, and other weight-relevant behavioral health norms. Our neighborhood composition measures included residential density (housing units/land area) [28][29][30] and population density (residents/land area) [6,31,32] to capture the intensity of neighborhood development and related mix of land uses, as well as residential property values as a dimension of neighborhood socioeconomic status [5]. We will develop a measure of employment density for use with this cohort.

Transportation Infrastructure
Transportation infrastructure affects residents' ability to choose active transportation, which, in turn, may help prevent obesity. Street intersection density, a measure of walking connectivity, has been found to be negatively associated with obesity, albeit inconsistently [33,34]. Access to sidewalks and trails is also thought to encourage walking and prevent obesity, although findings on walking infrastructure have likewise been inconsistent [35-37]. We have measured street intersection density from King County GIS data and will measure trail density using King County GIS data, as well as transit ridership per bus stop as reported by King County Metro, which operates the county's bus system.
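Street intersection density within a radial buffer can be computed as sketched below. The source does not define how intersections were identified, so this sketch assumes one common convention: an intersection is a street-network node where three or more segments meet, which excludes dead ends (one segment) and bends (two segments). The function name, the node/edge representation, and the projected metre coordinates are hypothetical.

```python
from collections import Counter
from math import hypot, pi


def intersection_density(nodes: dict[str, tuple[float, float]],
                         edges: list[tuple[str, str]],
                         x: float, y: float, radius_m: float) -> float:
    """Intersections per square kilometre within a radial buffer around (x, y).

    nodes: node id -> projected (x, y) coordinates in metres.
    edges: street segments as (node id, node id) pairs.
    An intersection is assumed to be a node where 3 or more segments meet.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    count = sum(
        1 for node, (nx, ny) in nodes.items()
        if degree[node] >= 3 and hypot(nx - x, ny - y) <= radius_m
    )
    area_km2 = pi * (radius_m / 1000.0) ** 2  # area of the circular buffer
    return count / area_km2
```

Computing this separately for every home location is exactly the per-point workflow that the SmartMap approach avoids; in practice, the same count would be evaluated once per raster cell.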

Food Environment
The food environment has been strongly correlated with obesity, but questions remain as to whether the relationship is causal [6,38,39]. Measures of the food environment for our cohort included densities of supermarkets and fast food restaurants as reported by King County Public Health and geoprocessed by the University of Washington Urban Form Lab [40], and we will construct a similar measure of convenience stores. As most King County residents drive to shop for food [41], the SmartMaps for food environment measures used network buffers to account for road network impacts on driving distances.
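A network buffer differs from a radial buffer in that it measures distance along the road network rather than in a straight line. A minimal sketch using Dijkstra's shortest-path algorithm with a distance cutoff is shown below. The graph format, node names, store snapping to network nodes, and the `max_dist` values are hypothetical; the source does not state which buffer distances or software were used for the network SmartMaps.

```python
import heapq


def network_distances(graph: dict[str, list[tuple[str, float]]], source: str,
                      max_dist: float) -> dict[str, float]:
    """Dijkstra shortest-path distances from `source`, pruned at `max_dist`.

    graph: node id -> list of (neighbour id, segment length in metres); for an
    undirected street network, list each segment under both endpoints.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def count_in_network_buffer(graph: dict[str, list[tuple[str, float]]], home: str,
                            store_nodes: list[str], max_dist: float) -> int:
    """Number of stores (assumed snapped to network nodes) within `max_dist`."""
    reachable = network_distances(graph, home, max_dist)
    return sum(1 for s in store_nodes if s in reachable)


# Hypothetical network: store "b" is 1,100 m away by road, store "c" is 2,000 m.
graph = {
    "home": [("a", 500.0), ("c", 2000.0)],
    "a": [("home", 500.0), ("b", 600.0)],
    "b": [("a", 600.0)],
    "c": [("home", 2000.0)],
}
```

With a 1,500 m network buffer, only store "b" is counted; widening the buffer to 2,500 m also reaches "c". Pruning at `max_dist` keeps each search local, which matters when the search is repeated for many raster cells.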

Recreational and Fitness Environments
Neighborhood parks are thought to encourage physical activity that prevents unhealthy weight gain [42,43]. We will compute the percent of land area dedicated to parks as reported by King County and local municipalities and compiled by the University of Washington Urban Form Lab [42]. Future analyses may also incorporate gyms, exercise studios, swimming pools, and other venues for recreational activity.
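The SmartMap idea described under Measures of Neighborhood Context (precompute a rasterized surface once, then look up values at many home locations) can be sketched in plain Python. This is a minimal illustration, not the team's PostgreSQL/PostGIS/R implementation: the grid, the circular-window mean, and the names `focal_mean` and `sample` are hypothetical, and coordinates are assumed to be in a projected system measured in metres.

```python
from math import floor

Grid = list[list[float]]


def focal_mean(grid: Grid, radius_cells: int) -> Grid:
    """Circular moving-window mean: each output cell averages the input cells whose
    centres lie within `radius_cells` of its own centre (edge cells use fewer inputs)."""
    n_rows, n_cols = len(grid), len(grid[0])
    r = radius_cells
    offsets = [(dr, dc) for dr in range(-r, r + 1) for dc in range(-r, r + 1)
               if dr * dr + dc * dc <= r * r]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            vals = [grid[i + dr][j + dc] for dr, dc in offsets
                    if 0 <= i + dr < n_rows and 0 <= j + dc < n_cols]
            out[i][j] = sum(vals) / len(vals)
    return out


def sample(smartmap: Grid, x: float, y: float,
           origin: tuple[float, float], cell_size: float) -> float:
    """Read the SmartMap value at projected point (x, y).

    `origin` is the (x, y) corner of cell [0][0]; row indices increase with y
    and column indices increase with x.
    """
    row = floor((y - origin[1]) / cell_size)
    col = floor((x - origin[0]) / cell_size)
    if not (0 <= row < len(smartmap) and 0 <= col < len(smartmap[0])):
        raise ValueError("point lies outside the SmartMap extent")
    return smartmap[row][col]
```

The design choice mirrors the text: the expensive neighborhood computation runs once per raster layer, after which each of hundreds of thousands of home locations needs only a constant-time lookup, and the raster itself can be built without any home addresses.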

Identifying a Cohort From Electronic Health Record Data
To construct the study cohort, we initially identified KPWA members aged 18 to 89 years between January 1, 2005, and December 31, 2017, whose home addresses were successfully geocoded to a King County location and for whom height and weight data were available. We required a successful geocode because our goal was to assess the impacts of residential location. We excluded members older than 89 years because older age could be personally identifying. We later determined that an EHR system change made address changes recorded after April 30, 2017, inconsistent, so we limited our data to records of visits before May 1, 2017. We included KPWA members who had a recorded weight measure while residing in King County, Washington, after having been a KPWA member for at least 1 year, to help ensure we had sufficient data to estimate the prevalence of comorbid health conditions before their weight measurement. Figure 2 is a flow diagram describing the identification of this cohort.
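The eligibility rules above, combined with the plausible-weight range from Clinical Measures and the baseline definition from Follow-Up and Outcomes, can be expressed as a single filter per weight visit. This is a sketch under stated assumptions: the `WeightVisit` fields are hypothetical, "at least 1 year" of prior membership is approximated as 365 days of the current enrollment, and the flow in Figure 2 may apply these criteria in a different order.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STUDY_START = date(2005, 1, 1)
STUDY_END = date(2017, 4, 30)  # later address changes were unreliable
MIN_PRIOR_MEMBERSHIP = timedelta(days=365)  # approximation of "at least 1 year"


@dataclass(frozen=True)
class WeightVisit:
    """A clinic visit with a recorded weight (hypothetical schema)."""
    visit_date: date
    age_years: int
    weight_lb: float
    in_king_county: bool  # home address geocoded to a King County location
    enrolled_since: date  # start of the member's current continuous enrollment


def is_eligible(v: WeightVisit) -> bool:
    return (
        STUDY_START <= v.visit_date <= STUDY_END
        and 18 <= v.age_years <= 89
        and 70 <= v.weight_lb < 700  # biologically plausible adult weight
        and v.in_king_county
        and v.visit_date - v.enrolled_since >= MIN_PRIOR_MEMBERSHIP
    )


def baseline_visit(visits: list[WeightVisit]) -> Optional[WeightVisit]:
    """The first eligible weight measure defines the member's baseline."""
    eligible = [v for v in visits if is_eligible(v)]
    return min(eligible, key=lambda v: v.visit_date, default=None)
```

For example, a member who joined in January 2005 and was weighed in June 2005 and again in February 2006 would have the February 2006 visit as baseline, because the earlier visit falls short of 1 year of prior membership.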

Follow-Up and Outcomes
We defined the first eligible weight measure of an individual in the cohort to be their baseline measure. We considered a member to be followed at each clinic visit after the baseline visit and censored them before the end of follow-up if they moved out of King County or had a gap in KPWA membership of at least 13 months. Once censored, individuals did not rejoin the cohort even if they became KPWA members again. We did not censor women during pregnancy. This will allow us to conduct analyses incorporating pregnancy weight change; however, we anticipate that analyses not focused on pregnancy will need to handle pregnancy episodes appropriately.
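The censoring rules can be sketched as a function that returns the last date of follow-up for one member. Several details are assumptions not stated in the source and are labeled as such: enrollment is represented as hypothetical (start, end) spells; a gap counts as disqualifying when the next spell starts 13 or more calendar months after the previous one ended; final disenrollment before the study end also ends follow-up; and a move out of King County ends follow-up on the move date.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the end of shorter months."""
    m = d.month - 1 + months
    year, month = d.year + m // 12, m % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def censor_date(spells: list[tuple[date, date]], baseline: date, study_end: date,
                moved_out: Optional[date] = None) -> date:
    """Last date of follow-up for one member.

    spells: enrollment periods as (start, end) with inclusive end dates.
    Gaps shorter than 13 months are bridged; the first gap of 13+ months (or
    final disenrollment before study end) ends follow-up for good, so later
    re-enrollment never restarts it. Leaving King County also ends follow-up.
    """
    end = study_end
    current_end = None
    for start, stop in sorted(spells):
        if stop < baseline:
            continue  # enrollment that ended before baseline is irrelevant
        if current_end is not None and start >= add_months(current_end, 13):
            end = min(study_end, current_end)
            break
        current_end = stop if current_end is None else max(current_end, stop)
    else:
        # No disqualifying gap: censor only if enrollment ended before study end.
        if current_end is not None and current_end < study_end:
            end = current_end
    if moved_out is not None:
        end = min(end, moved_out)
    return end
```

A member with a 2-month break in 2008 and a 17-month break starting in 2011 would be followed continuously through the first break and censored at the start of the second, even though they later re-enrolled.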
The primary outcome of our future analyses will be weight change over time. We intend to focus on weight change rather than BMI change to minimize artifacts arising from height measurement error in this cohort of adults, whose true height should change minimally. Figure 3 plots weight measurements over time, with trajectories of selected study subjects highlighted as examples. Weight trajectories, lengths of follow-up, and within-subject variation over time all differ substantially across subjects. Additional analyses will examine changes in glycemic control among patients with type 2 diabetes, as measured by the glycated hemoglobin (HbA1c) test; these outcomes will be described in future manuscripts.
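The sensitivity of BMI to height error can be made concrete with a first-order approximation. Because BMI divides weight by the square of height, a small relative error in measured height produces roughly twice that relative error in BMI. The numbers below are illustrative values chosen for the example, not cohort data: for an adult weighing 80 kg with a true height of 1.70 m, measuring height 1 cm too short inflates BMI by about 1.2%, the same BMI change that a weight gain of roughly 0.9 kg would produce.

```latex
\mathrm{BMI} = \frac{w}{h^{2}}, \qquad
\frac{\partial\,\mathrm{BMI}}{\partial h} = -\frac{2w}{h^{3}}
\;\Longrightarrow\;
\frac{\Delta \mathrm{BMI}}{\mathrm{BMI}} \approx -2\,\frac{\Delta h}{h}

% Illustrative values (not cohort data): w = 80 kg, h = 1.70 m, \Delta h = -0.01 m
\frac{\Delta \mathrm{BMI}}{\mathrm{BMI}} \approx -2 \times \frac{-0.01}{1.70} \approx 0.0118,
\qquad 0.0118 \times 80\ \mathrm{kg} \approx 0.94\ \mathrm{kg}
```

Weight change avoids this term entirely, which is why it is the preferred outcome when true height is essentially constant.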

Analyses
The analyses for this cohort description manuscript focused on baseline characteristics of the study cohort, comparison of movers with the full cohort, and the characteristics of residential moves undertaken by cohort members. All analyses were descriptive and were conducted in R for Windows version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria).

Exclusions
The records of 4,208,674 clinic visits that included a weight assessment among 286,232 unique adults met the initial inclusion criteria. After applying the exclusion and censoring criteria depicted in Figure 2, the final analytic cohort included 229,755 adults.

Population Characteristics
The final study population was a broad cross-section of King County adults (Table 2). Property values at the home address were missing for 9.8% of the cohort.

Follow-Up
The baseline visit for approximately 44.1% (101,543/229,755) of the final analytic cohort occurred in the first 3 years of the study period, between January 1, 2005, and December 31, 2007. Mean follow-up was slightly less than 5 years, and follow-up ranged from 1 day to 12 years and 118 days, 3 days shy of the full study period. Weight measures at least 1 year apart were available for 67.0% (154,040/229,755) of subjects, measures at least 5 years apart for 31.6% (72,726/229,755), and measures at least 9 years apart for 16.3% (37,612/229,755). In addition, 43.9% (101,053/229,755) of subjects were still enrolled at the end of study follow-up; the most common reason for censoring (87,116/229,755, 37.9%) was disenrollment from KPWA for at least 13 months.

Moves
Approximately 24.0% (55,152/229,755) of the cohort moved at least once during follow-up. Movers were somewhat younger (mean age 41.5 years among movers compared with 45.0 years overall) and tended to have longer follow-up (54% followed for 5 years or more compared with 39% overall), possibly because members who remained with KPWA longer were more likely to have their membership time overlap with a move. Most movers (37,388/55,152, 67.8%) moved only once during follow-up. Figure 4 is a histogram of residential tenure at each address tracked in the study.
In total, the 55,152 movers made 84,698 moves (Table 3). Of these moves, 45.9% (38,911/84,698) were less than 5 km in distance, and destinations resembled origins in residential density and property values more than would be expected by chance (χ² test, P<.001). For example, although only 19.8% (16,803/84,698) of moves were initiated from residential locations with densities of 18.7 units/hectare or more (roughly that of a 1920s-era streetcar suburb), 53.5% (8962/16,803) of those moves were to destinations that also had residential densities of 18.7 units/hectare or above (Figure 5, top panel).
Figure 5. Each cell gives the percentage of movers in the premove quintile whose move destination was in the associated postmove quintile. For example, the top right corner of the top panel indicates that 50% (9519/19,107) of those living in locations where residential densities were 18.7 units/hectare or more before a move moved to locations with residential densities of 18.7 units/hectare or more.
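The comparison of origins with destinations can be framed as a Pearson χ² test of independence on the 5 × 5 table of premove by postmove quintiles. A minimal standard-library sketch follows; it is not the study's R code, and the function names and example table are hypothetical. Because a 5 × 5 table has (5 − 1)(5 − 1) = 16 degrees of freedom, an even number, the upper-tail probability can be computed exactly with the closed form used in `chi2_sf_even_dof`.

```python
from math import exp


def crosstab(pairs: list[tuple[int, int]], n_levels: int = 5) -> list[list[int]]:
    """Origin-by-destination counts from (origin, destination) quintiles, 0-indexed."""
    table = [[0] * n_levels for _ in range(n_levels)]
    for origin_q, dest_q in pairs:
        table[origin_q][dest_q] += 1
    return table


def chi_square_independence(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive (no zero expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Upper-tail P(X > x) for a chi-square variable with even dof, using the exact
    closed form exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive, even dof")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return exp(-half) * total
```

A table with heavy counts on the diagonal (movers staying in the same density quintile) yields a very large statistic and a P value far below .001, while a table whose cells are all equal yields a statistic of 0 and P=1. For very large statistics the closed form underflows to 0.0, which is adequate for reporting P<.001.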

Principal Findings
In this population-based retrospective cohort constructed from KPWA medical records, we identified 229,755 adults aged 18 to 89 years who lived in King County, Washington, had been continuously enrolled in KPWA for at least 1 year, and had at least one weight measure available for analysis. These adults had an average of about 5 years of follow-up, and 55,152 of them moved within the county at least once.
To the best of our knowledge, this is the first large-scale EHR-based cohort developed to assess the impact of residential moves on the health of adults [44]. However, prior work has assessed neighborhood influences on BMI change in children using EHR data [45], and there is a substantial literature on why people change residential location and how movers select new residential locations [23,46-48]. Our finding that nearly half of the recorded moves were within 5 km of the initial residential location is consistent with prior findings that moves in Western Washington and elsewhere tend to stay within corridors or neighborhoods [49,50]. Because short-distance moves imply limited changes in the neighborhood built environment, substantial statistical power is needed to assess the impacts of moves.

Strengths and Limitations
Indeed, the sample size and considerable follow-up time available are key strengths of this cohort [10,11]. Individual health impacts of built environments are likely to be small in general, but because many people are exposed to the same characteristics, effects that are small at the individual level can still have large population impacts [1]. Another key strength is our use of EHR cohorts for population inferences [51]; our design may serve as a template for similar studies in other populations and geographic contexts. The sample size is large enough to examine health outcomes such as obesity, type 2 diabetes, and hypertension, and data on health outcomes are comprehensive in that they include all diagnoses and treatments paid for by Kaiser Permanente insurance during the study period. More generally, our work was possible only because of a foresighted health system decision to treat residential address as patient data to be recorded longitudinally rather than as contact information to be overwritten without retaining the old value.
Studies using our cohort will also be subject to several limitations. First, this is an EHR cohort, and the research team does not interact with study subjects directly, which precludes collecting some data that are readily available in more conventional cohort designs. For example, there are no available measures of the behaviors through which exposure to neighborhood environments might affect weight change, such as physical activity or diet. Second, because the data were not initially collected for research purposes, some potentially relevant covariates are missing (eg, race/ethnicity, particularly in the early years of the cohort), and we cannot verify whether those data are missing at random. Third, weight change, which captures changes in lean mass as well as fat mass, can be challenging to interpret as an indicator of health [52]. Fourth, our cohort excludes members who listed a post office box address or whose address otherwise could not be geocoded, and these members may differ from other members. Fifth, the residential address recorded in the EHR does not fully capture a subject's environment, both because the residential environment is only a subset of the environments people encounter and because the address in the dataset may only partially reflect the true home location of some members, such as students attending college. Finally, although King County is large and geographically diverse and our cohort demographics resemble those of the county as a whole, county residents are wealthy relative to the rest of Washington State, and the region has fewer African American and Hispanic residents than the country as a whole.

Conclusions
In conclusion, the Moving to Health Cohort is a very large, EHR-based cohort that offers novel potential for identifying neighborhood effects on obesity and obesity-related conditions.