This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on http://www.researchprotocols.org, as well as this copyright and license information must be included.
Advancing technology has increased functionality and permitted more complex study designs for behavioral interventions. Investigators need to keep pace with these technological advances for electronic data capture (EDC) systems to be appropriately executed and utilized at full capacity in research settings. Mobile technology allows EDC systems to collect near real-time data from study participants, deliver interventions directly to participants’ mobile devices, monitor staff activity, and facilitate near real-time decision making during study implementation.
This paper presents the infrastructure of an EDC system designed to support a multisite HIV biobehavioral intervention trial in Los Angeles and New Orleans: the Adolescent Medicine Trials Network “Comprehensive Adolescent Research & Engagement Studies” (ATN CARES). We provide an overview of how multiple EDC functions can be integrated into a single EDC system to support large-scale intervention trials.
The CARES EDC system is designed to monitor and document multiple study functions, including screening, recruitment, retention, intervention delivery, and outcome assessment. Text messaging (short message service, SMS) and nearly all data collection are supported by the EDC system. The system functions on mobile phones, tablets, and Web browsers.
ATN CARES is enrolling study participants and collecting baseline and follow-up data through the EDC system. Besides data collection, the EDC system is being used to generate multiple reports that inform recruitment planning, budgeting, intervention quality, and field staff supervision. The system is supporting both incoming and outgoing text messages (SMS) and offers high-level data security. Intervention design details are also influenced by EDC system platform capabilities and constraints. Challenges of using EDC systems are addressed through programming updates and training on how to improve data quality.
There are three key considerations in the development of an EDC system for an intervention trial. First, it needs to be decided whether the flexibility provided by the development of a study-specific, in-house EDC system is needed relative to the utilization of an existing commercial platform that requires less in-house programming expertise. Second, a single EDC system may not provide all functionality. ATN CARES is using a main EDC system for data collection, text messaging (SMS) interventions, and case management and a separate Web-based platform to support an online peer support intervention. Decisions need to be made regarding the functionality that is crucial for the EDC system to handle and what functionality can be handled by other systems. Third, data security is a priority but needs to be balanced with the need for flexible intervention delivery. For example, ATN CARES is delivering text messages (SMS) to study participants’ mobile phones. EDC data security protocols should be developed under guidance from security experts and with formative consulting with the target study population as to their perceptions and needs.
DERR1-10.2196/10777
Data collection systems are integral to the design of behavioral intervention trials and have rapidly evolved from systems that are limited by rudimentary data collection tools, such as pen-and-paper assessments, to electronic data capture (EDC) systems that incorporate mobile phones and other mobile data collection devices. Advancing technology has increased functionality and permitted more complex study designs. To date, papers on mobile EDC systems have focused on improving health care delivery, disease surveillance, and epidemiological surveys in resource-poor settings [
As a basic role, EDC systems support data collection and increase the capacity to collect and store different types of data, compared with non-EDC systems. The ability of EDC systems to link to different mobile devices offers flexibility in capturing information over different time intervals and locations and from different sources. Our prior work has used EDC systems to store biological, anthropometric, social network, and self-reported measures that were collected over several time points as part of large-scale behavioral intervention trials in South Africa [
The ability to link EDC systems to study participants’ mobile devices provides additional opportunities for intervention delivery, referred to as ecological momentary interventions [
The EDC system for ATN CARES incorporates multiple functions summarized above: data collection, mobile phone-based intervention delivery, and real-time data access for timely data quality monitoring and decision making that pertains to participant care. Most mobile EDC systems utilized in the past did not have multiple functions [
Study participants are youth living with HIV (YLH) and HIV-negative high-risk youth (HRY). By focusing on YLH and HRY, we highlight the benefits of what an EDC system can provide for a tech-savvy study population that already uses mobile phones to a high degree in its daily routines [
The objective of our paper is to present the design of an EDC system for the CARES trials and show how multiple EDC functions introduced above can be integrated into a single EDC system to support a large-scale behavioral intervention trial.
A suite of interventions, “CARES,” is being conducted through the HIV ATN. Three separate studies share an overarching aim to address the increasing HIV epidemic among youth aged 12-24 years. Toward this goal, interventions were developed for YLH and HRY. Recruitment began in May 2017. Participants are being recruited through social service agencies, homeless shelters, HIV care clinics, and clinic referral in Los Angeles (LA) and New Orleans (NO). During recruitment, interested youth are screened, which seeds a case in the EDC. Eligible youth are consented and enrolled into one of three studies based on HIV test results and responses to screening questions. Enrolled youth are then administered a baseline assessment and are repeatedly assessed at 4-month intervals over a 2-year period. YLH are administered blood tests for HIV viral loads, and HRY are tested for HIV seropositivity. Both cohorts are tested for the presence of sexually transmitted infections (STI) and administered drug screenings and assessments to self-report on HIV-transmission risk behaviors.
Assignment to one of the following three studies occurs at enrollment.
The EDC system was developed through the
The development of the EDC system began with meetings between the ATN CARES research team and
Roles, purpose for accessing electronic data capture (EDC) system, and level of access granted for research staff who use the EDC system.
Role name | Access |
Interviewer | Enter assessment, laboratory testing, and locator data via |
Coach | Enter semistructured coaching log via apps. Have access to reports and messaging utility, including sending out a group message to participants. View participant data. |
Study manager | Complete research-related actions by submitting data such as rerandomization, reassigning a life coach for participants, or making stepped care decision. View reports and submitted data. Edit submitted data. |
Data manager | Create and edit |
Schematic for roles of 4 types of research staff who access the electronic data capture (EDC) system and the study flow that pertains to each role during the study period.
Both roles enable one to view the following key information for each participant: basic demographics, participant tracking information (eg, preferred methods of contact and contact information), session calendars that contain scheduled interviewer assessment dates or coaching session dates depending on the role, study participants’ progress in terms of completed assessments, and field notes that are entered by interviewers and coaches. Coaches have access to all notes, while interviewers have access only to interviewer notes.
Study managers utilize the EDC system to act upon a participant’s study status directly, such as an investigator actively withdrawing a participant from the study, assigning a specific coach to a participant, or rerandomizing a participant if the participant’s HIV serostatus changes. Study managers, coaches, and interviewers only have access to data collected in their city of residence (LA or NO). Restricted access not only adds a layer of protection for participant data but also streamlines research staff workflows. For example, coaches can more quickly locate, access, and browse participant notes than if they had to filter information from participants across both cities. The data manager has access to data across both cities. The data manager troubleshoots day-to-day data entry issues and can update any aspect of the EDC system to reflect new study needs, including changing assessment questionnaires and updating system reports or data entry workflows. In addition, the data manager can assist study managers in editing submitted data when data entry errors occur. Furthermore, the data manager creates reports for quality and progress monitoring of study teams.
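The city-based access restriction described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the configuration used in the CARES EDC system: the role strings, the `StaffUser` and `ParticipantCase` classes, and the `visible_cases` function are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical role names mirroring the four staff roles described in the text.
CITY_RESTRICTED_ROLES = {"interviewer", "coach", "study_manager"}


@dataclass(frozen=True)
class StaffUser:
    username: str
    role: str  # "interviewer", "coach", "study_manager", or "data_manager"
    city: str  # "LA" or "NO"


@dataclass(frozen=True)
class ParticipantCase:
    participant_id: str
    city: str  # city in which the participant was recruited


def visible_cases(user, cases):
    """Return the participant cases that a staff member is allowed to see."""
    if user.role == "data_manager":
        # The data manager has access to data across both cities.
        return list(cases)
    if user.role in CITY_RESTRICTED_ROLES:
        # All other roles only see cases from their own city.
        return [case for case in cases if case.city == user.city]
    # Unknown roles see nothing (an assumption made for safety in this sketch).
    return []
```

Filtering at the point of retrieval, rather than in each report, reflects the two benefits named above: staff never load records from the other city, and their participant lists stay short.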
Once the study flow was established, the research team worked with
Forms were initially developed outside of the EDC system for easy sharing and viewing by research team members, including members who would not be accessing the EDC system but gave input on its design. For example, the baseline and follow-up assessments were developed in Microsoft Word. Once finalized, forms were then built in the EDC system using one of the following three methods: (1) built directly within the EDC system by
Prior to study initiation, study workflow was tested by conducting screening, enrollment, and baseline assessment procedures with mock study participants. This gave us a chance to ensure that the research staff understood how to use the EDC system, that questions and prompts on each form were properly specified, and that the EDC system was assigning participants to appropriate studies and randomizing when required. Mock interviews with study participants were supplemented by scenarios that were generated by the data manager to ensure that all possible scenarios were tested, even scenarios that were not anticipated to occur very often. For example, a scenario was generated for an HRY who was initially enrolled into ATN 149 but then tested HIV positive at a later date, requiring study reassignment and, if the youth was reassigned to ATN 148, rerandomization. Testing was conducted in an iterative fashion where the data manager made corrections and enhancements to the EDC system specifications after each round of testing.
Several general system requirements were considered throughout the development phase for daily tasks across all three studies. The EDC system was designed to (1) integrate data entry tasks into a single EDC system across all three studies and both cities where the study is taking place (LA and NO); (2) be intuitive for user navigation so that minimum training for study staff would be required; (3) be easy to modify so that EDC system changes can be programmed in a timely manner by a trained data manager or other research staff member; (4) support and maintain both incoming and outgoing SMS text messages; and (5) offer high-level data security during data collection in the field and for data storage. More details on each system requirement are provided below.
Data across all three studies in LA and NO are entered and accessed through 2 points of access: a
Integration required the assignment of a unique participant identifier for each study participant in the EDC system across various data sources, such as survey, study management, and lab sample collection and storage. A participant is opened as a “case” within the EDC system and assigned a unique participant ID when they express interest in study participation, complete the screener, and are determined to be eligible based on the screening data. The same participant ID is then used for tracking lab specimen data and weekly survey responses. Integration was made possible by carefully planning all the decision points and data sources at the outset so that they could be captured in a single EDC system; EDC systems typically only collect study outcome data. The following paragraphs provide examples of study tasks and decision points that are often conducted outside of EDC systems but managed by the CARES EDC system.
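The idea of one participant ID shared across data sources can be sketched as follows. This is an illustrative sketch only: the ID format, the `CaseRegistry` class, and the three record categories are assumptions made for the example and are not taken from the CARES EDC system.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Case:
    participant_id: str
    city: str
    # One bucket per data source, all keyed by the same participant ID.
    records: dict = field(
        default_factory=lambda: {"survey": [], "lab": [], "management": []}
    )


class CaseRegistry:
    def __init__(self):
        self._counter = itertools.count(1)
        self.cases = {}

    def open_case(self, city, eligible):
        """Open a case and assign a unique ID only for eligible youth."""
        if not eligible:
            return None
        # Hypothetical ID format: city code plus a zero-padded sequence number.
        participant_id = f"{city}-{next(self._counter):05d}"
        self.cases[participant_id] = Case(participant_id, city)
        return self.cases[participant_id]

    def add_record(self, participant_id, source, payload):
        """Attach a record from any data source to the same case."""
        self.cases[participant_id].records[source].append(payload)
```

Because every record is attached through `participant_id`, lab specimen data and weekly survey responses can later be joined without a separate crosswalk file.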
Eligibility, study assignment, and randomization are all conducted within the EDC system.
Study assignment schematic showing assignment to one of three studies (Adolescent Medicine Trials Network, ATN, 147-149) through the electronic data capture (EDC) system after screening; ATN 149 participants who test positive for HIV during the follow-up period are reassigned to ATN 147 or 148. ART: antiretroviral therapy.
ATN 148 and 149 are RCTs and require participants to be randomized after study enrollment is completed by entering rapid diagnostic test results. Study participants are automatically randomized to ATN 148 and 149 study arms within the EDC system. An exception occurs if study participants change studies and need to be randomized again. For example, it is possible for an HRY in ATN 149 to become HIV positive during the study (
Several randomization challenges were addressed so that randomization schemes could be programmed within the EDC system for ATN 148 and 149. RCT randomization schemes are typically stratified on one or more sociodemographic participant characteristics and site such as clinics. Randomization within the EDC system required careful consideration of how best to capture key variables and keep the scheme simple so that it could be programmed within the system. It was determined that the EDC system would feasibly accommodate 2 levels of stratification. Interviewer comprises the first stratification level and is a proxy for the site. An interviewer was used instead of site because the
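A two-level stratified randomization of this kind could be sketched as follows. This is a minimal illustration that uses permuted blocks within each stratum, which is an assumption: the text does not state the blocking method, block size, or arm labels, and the second stratification variable is not shown here, so `second_stratum`, the arm names, and the `StratifiedRandomizer` class are all hypothetical.

```python
import random
from collections import defaultdict


class StratifiedRandomizer:
    """Permuted-block randomization within strata (illustrative sketch)."""

    def __init__(self, arms, block_size, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = list(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        # Maps each stratum to the assignments remaining in its current block.
        self._blocks = defaultdict(list)

    def assign(self, interviewer_id, second_stratum):
        """Return the next arm for a participant in the given stratum."""
        stratum = (interviewer_id, second_stratum)
        if not self._blocks[stratum]:
            # Start a new balanced block and shuffle it.
            block = self.arms * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            self._blocks[stratum] = block
        return self._blocks[stratum].pop()
```

With a block size of 4 and two arms, every four consecutive participants seen by the same interviewer within the same second-level stratum are split evenly between arms.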
The interaction log records all nonscheduled interactions that any study staff have with participants, including content areas such as simple relationship building, participants updating contact information, rescheduling of participants’ appointments, and confirmation of participants’ STI treatment. For each interaction, we collect parameters that include the date of the interaction and the method by which the interaction was completed (eg, phone, in-person, SMS text message, email, or social media). We also require the study staff to report any failed interaction attempt so that we can track the staff effort required for each participant. This information is used for supervision and problem solving to maximize study retention.
In addition to the interaction log, coaches record coaching activities through a separate log. This log records semistructured coaching activities, including coaching content areas and skills utilized, date of coaching, and the method of contact. The EDC system records the length of each coaching session so that we can track the coaching effort required for each participant who is assigned a coach. Logged coaching information helps coaching supervisors to monitor and support coaching performance and better assign coaching workload. In addition, coaches can refer to prior information through the mobile app to inform current interactions with participants. These data can also be used for analyses on engagement or utilization and dose-response effects.
Despite efforts to capture data through a single EDC system, there are 3 instances where data are collected outside of the EDC system. First, participants were initially administered weekly surveys through SMS text messages. Response rates were low. Based on participant feedback, greater flexibility was provided to participants in filling out weekly surveys through SMS text message or an email prompt. Participants who receive email prompts are linked to a Web-based survey that is conducted through Research Electronic Data Capture. Second, participants who are randomized to receive peer support as an intervention component are asked to participate in a private social media discussion forum that is hosted through
The EDC system was developed to be intuitive to use to reduce user burden, minimize data entry error, and, in turn, reduce data cleaning. For data entry purposes, all EDC system users (shown in
Intuitive use is also aided by built-in logic checks and data point validations. Users cannot enter information into subsequent fields within forms that clashes with prior information that was entered into the system. For example, a warning will show up if an interviewer tries to enter an HIV viral load result for a seronegative participant. Information is also shared between forms; this means that information such as sex at birth and age that was collected during screening does not need to be asked again in the baseline assessment. Many participant characteristic variables such as HIV status and sex at birth are used to set up appropriate skip patterns, including for follow-up assessments. For example, questions regarding pregnancy at the baseline interview are only asked among female participants. HIV stigma questions are only asked among participants who had been positive for more than 4 months at the time of recruitment, that is, the baseline interview.
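The logic checks and skip patterns described above can be expressed as simple rules. The sketch below is illustrative only and does not reproduce the form logic configured in the EDC system; the function names, section labels, and field values are hypothetical.

```python
def validate_lab_entry(hiv_status, viral_load):
    """Return warnings for a lab form entry (an empty list means no issues)."""
    warnings = []
    if hiv_status == "negative" and viral_load is not None:
        # A viral load result should not exist for a seronegative participant.
        warnings.append("Viral load entered for a seronegative participant.")
    return warnings


def baseline_sections(sex_at_birth, hiv_status, months_since_diagnosis):
    """Return the baseline questionnaire sections to display.

    Sex at birth and HIV status are carried over from screening,
    so they are passed in rather than asked again.
    """
    sections = ["core"]
    if sex_at_birth == "female":
        sections.append("pregnancy")
    if (
        hiv_status == "positive"
        and months_since_diagnosis is not None
        and months_since_diagnosis > 4
    ):
        # Stigma questions apply only after more than 4 months of known positivity.
        sections.append("hiv_stigma")
    return sections
```

For instance, `baseline_sections("female", "negative", None)` yields the core and pregnancy sections, while a male participant diagnosed 2 months before baseline would see only the core section.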
The EDC system is set up so that changes to any of the study forms can be programmed in a timely manner without any special programming abilities by a trained research staff member, mainly the data manager.
Screenshots of CommCare mobile app that links to the electronic data capture system.
The EDC system was set up to manage all SMS text messaging with study participants. This occurs in 3 instances. Shortly after study enrollment, participants receive a welcome SMS text message that also contains information on intervention arm assignment, if participants are assigned to ATN 148 or 149. For example, if a participant is assigned to have a coach, both the participant and the coach receive an immediate SMS text message informing them of the initiation of their coaching relationship. Participants who are randomized to receive peer support are sent an SMS text message invitation to join a peer support group with a weblink to the registration page. All participants receive weekly SMS text message surveys and daily health promotion messages that span 5 domains as follows: general health, mental health, sexual health, substance abuse, and medication adherence. The specific messages that are sent to participants are based on one of four risk profiles that they are assigned to within the EDC system: gay, bisexual, and transgender youth (GBTY) living with HIV, non-GBTY who are living with HIV, high-risk GBTY, and high-risk non-GBTY.
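The mapping from participant characteristics to one of the four risk profiles can be sketched as follows. This is an assumption-laden illustration: the text does not describe how a message or domain is chosen on a given day, so the random selection in `pick_daily_message` and the structure of the message `library` are hypothetical.

```python
import random

# The 5 health promotion domains named in the text.
DOMAINS = (
    "general health",
    "mental health",
    "sexual health",
    "substance abuse",
    "medication adherence",
)


def risk_profile(living_with_hiv, is_gbty):
    """Map two participant characteristics to one of the four risk profiles."""
    if living_with_hiv:
        return "GBTY living with HIV" if is_gbty else "non-GBTY living with HIV"
    return "high-risk GBTY" if is_gbty else "high-risk non-GBTY"


def pick_daily_message(library, profile, rng):
    """Pick a domain and a message for the profile.

    `library` is a hypothetical nested dict: profile -> domain -> list of messages.
    Random selection is an assumption; the actual schedule is not described.
    """
    domain = rng.choice(DOMAINS)
    return domain, rng.choice(library[profile][domain])
```

Keeping the profile assignment separate from message selection means that a change in HIV status only requires recomputing the profile; the message library itself is untouched.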
The EDC system is set up to meet Web standards for compliance with the Health Insurance Portability and Accountability Act. The EDC system incorporates redundant safeguards to protect participant data. For example, mobile devices used for data entry are password-protected, saved form data are encrypted on the device and during transmission, and form submissions are transmitted and removed from devices as soon as internet connectivity is established.
Screenshot of the electronic data capture system interface that is used by the data manager to create and edit assessment questions.
By using
At the time of writing, CARES studies have been in the field for approximately 10 months from May 2017 to early July 2018; recruitment is ongoing. We present sample sizes to summarize information that has been catalogued and managed through the EDC system and, in turn, to underscore the performance of the EDC system across studies 1-3. Of 1053 youth who have been screened (576 in LA and 477 in NO), 812 have been recruited and enrolled (408 in LA and 404 in NO). Study 1 enrolled 26 YLH, study 2 enrolled 70 YLH, and study 3 enrolled 716 HRY. Four-month assessments (ie, assessments at the first follow-up) have been conducted on 608 participants. In addition, the EDC system is being used to facilitate intervention delivery as discussed above. For example, a total of 16,994 daily SMS text messages have been sent to 627 participants’ phones, and 432 participants have filled out at least one weekly SMS text message survey as part of the AMMI. Among YLH in study 2 who have been randomized to the Stepped Care arm, 3 have stepped up from the AMMI to the next level of support.
We experienced several challenges after the implementation of the EDC system. Most of these challenges were resolved by modifying the system. We discovered multiple discrepancies between real-world field worker (ie, coach and interviewer) needs and the workflow scenarios that the research team was able to envision and test before the EDC system was launched in the field. For example, the initial HIV clinical visit would be the ideal setting for the research staff to collect all baseline information for acutely infected YLH. However, the impact of learning of an HIV diagnosis during the first clinical visit is potentially overwhelming for YLH. It was decided that alternative study protocols should be set up so that research staff do not need to go through all the risk behavior questions listed in the screening form or any other required forms with newly diagnosed YLH during enrollment. As a result, we built additional features within the EDC system that allow interviewers to bypass some of the enrollment forms that typically require data entry if an HIV-positive test result is also entered into the EDC system. A participant ID is still generated for lab specimen tracking. This alternative workflow allows interviewers to focus on the consent and blood draw with participants during their first clinical visit.
Despite the EDC system training for field workers prior to study implementation, the major challenges around the EDC system lie in data quality control among field workers. In particular, accurately logging ad-hoc participant interactions proved to be especially challenging for field workers. First, we found that field workers tend to prefer to enter information into the open-ended running notes as opposed to the prespecified types of interactions that are captured by form fields. Second, open-ended note documentation styles vary greatly among field workers, making a quick review of the notes across field workers harder. To address this, we first expanded the capabilities of the interaction log to capture additional data regarding field workers’ interactions with participants that we could not anticipate. Field workers were then required to memorize what information could be collected within the interaction log. Furthermore, the data management team put together additional data entry training sessions to standardize the use of open notes after consulting with the intervention team.
Another issue came up regarding daily SMS text messages and weekly surveys that the EDC system sends to participants’ phones. Participants can discontinue the receipt of SMS text messages and surveys during the study. Our original plan was that participants would have to contact the study staff to discontinue the receipt of daily messages and weekly surveys. We found that the SMS text message server is legally required by Federal Trade Commission regulations to allow a participant to opt out of all SMS text message services by replying to the daily message or weekly survey with a “STOP” command. Those who opt out of the SMS text message intervention component will not receive the full intervention, essentially creating an additional ad-hoc intervention arm. Fortunately, the EDC system allows study managers to track the list of participants who opt out, and subsequently, field workers can reach out to study participants to attempt to re-engage them. At the least, the study team can track participants who self-select the ad-hoc intervention arm.
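Opt-out tracking of this kind can be sketched in a few lines. This is a minimal illustration, not the messaging platform’s actual behavior: the `SmsRegistry` class and its method names are hypothetical, and only an exact “STOP” reply (ignoring case and surrounding whitespace) is treated as an opt-out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SmsRegistry:
    # Maps participant ID to the time the opt-out was first received.
    opted_out: dict = field(default_factory=dict)

    def handle_inbound(self, participant_id, body):
        """Record an opt-out if the reply is a STOP command; return True if so."""
        if body.strip().upper() == "STOP":
            self.opted_out.setdefault(participant_id, datetime.now(timezone.utc))
            return True
        return False

    def can_send(self, participant_id):
        """Daily messages and weekly surveys are sent only to opted-in participants."""
        return participant_id not in self.opted_out

    def reengagement_list(self):
        """Participants whom field workers may contact to attempt re-engagement."""
        return sorted(self.opted_out)
```

Storing the opt-out timestamp, rather than a simple flag, would also let analysts identify how much of the messaging component each participant received before leaving it.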
This paper describes an EDC system that is currently being used to support the implementation of a large-scale HIV intervention trial across multiple sites. Several key considerations underpinned the EDC system infrastructure development, and they should also be considered in the design of other EDC systems, regardless of study-specific requirements.
First, research teams need to decide whether EDC systems will be developed in-house or through a third-party vendor. The ATN CARES team chose the latter option to move away from the “pilotitis” paradigm, where a lot of money is spent on developing mHealth apps from the ground up that are not sustainable [
In the end, we decided to use the
Second, the needs of the target population and the protection of electronic data collected on the target population were the top priorities in the development of the EDC system. A number of electronic safeguards were put in place to ensure a high degree of security as described in the Methods section. In addition to electronic safeguards, there is no substitute for the formative work that precedes the implementation of any behavioral intervention trial. In an iterative fashion, the CARES research team discussed all study procedures and presented study protocols to community advisory boards that comprised the target population, prior to the development of the EDC system. The SMS text message content, for example, was highly vetted and pilot-tested by the CARES research staff prior to the implementation of the study. A priority was placed on developing the SMS text message content that is culturally sensitive and would not contain any specific references to HIV; medication reminders were generic to encourage general medication adherence.
A limitation of the EDC system is that it does not support all data entry tasks and technology functions that the intervention requires. For example, the EDC system does not support online peer support groups; a separate system was set up through
The EDC system that was developed for the ATN CARES trial serves as an example of how a commercially available, integrated data capture system can meet all system requirements and support almost all needs of a large-scale research trial. While most EDC challenges can be resolved through programming updates, there are important considerations prior to database implementation, such as identifying an appropriate EDC vendor with research experience and consulting with the target study population in developing participant-friendly study content.
AMMI: Automated Messaging and Monitoring Intervention
ART: antiretroviral therapies
ATN: Adolescent Medicine Trials Network
CARES: Comprehensive Adolescent Research & Engagement Studies
EDC: electronic data capture
EMA: ecological momentary assessment
GBTY: gay, bisexual, and transgender youth
HRY: high-risk youth
LA: Los Angeles
NO: New Orleans
RCT: randomized controlled trial
SMS: short message service
STI: sexually transmitted infections
YLH: youth living with HIV
This study was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (U19HD089886). WSC’s time was also supported by the National Institute of Mental Health through the Center for HIV Identification, Prevention, and Treatment Services (P30MH058107). We would like to thank the study participants for their time commitment in participating in the study and helping advance the field of HIV prevention and treatment.
The following individuals contributed to the study: Mary J Rotheram-Borus, Sue E Abdalian, Maria Isabel Fernandez, Jeffrey D Klausner, Sung-Jae Lee, Maryann Koussa, Leslie Kozina, Manuel Ocasio, Robert E Weiss, Ronald Brookmeyer, Karin Nielsen, Yvonne Bryson, Tara Kerin, Chelsea Shannon, Ruth Cortado, Kate Mitchell, Elizabeth M Arnold, Norweeta Milburn, Cathy Reback, Marguerita Lightfoot, Danielle Harris, and Jasmine Fournier.
JW and AC work for Dimagi Inc, the company that developed the open source mobile data collection platform used in this study. Other authors have nothing to declare.