Background: Smartphone apps that capture surveys and sensors are increasingly being leveraged to collect data on clinical conditions. In mental health, this data could be used to personalize psychiatric support offered by apps so that they are more effective and engaging. Yet today, few mental health apps offer this type of support, often because of challenges associated with accurately predicting users’ actual future mental health.
Objective: In this protocol, we present a study design to explore engagement with mental health apps in college students, using the Technology Acceptance Model as a theoretical framework, and assess the accuracy of predicting mental health changes using digital phenotyping data.
Methods: There are two main goals of this study. First, we present a logistic regression model fit on data from a prior study on college students and prospectively test this model on a new student cohort to assess its accuracy. Second, we will provide users with data-driven activity suggestions every 4 days to determine whether this type of personalization will increase engagement or attitudes toward the app compared to those receiving no personalized recommendations.
Results: The study was completed in the spring of 2022, and the manuscript is currently in review at JMIR Publications.
Conclusions: This is one of the first digital phenotyping algorithms to be prospectively validated. Overall, our results will inform the potential of digital phenotyping data to serve as tailoring data in adaptive interventions and to increase rates of engagement.
International Registered Report Identifier (IRRID): PRR1-10.2196/37954
While COVID-19 restrictions begin to end, the crisis in college mental health continues to expand. Recent large-scale studies suggest that the mental health impact of depression and anxiety for college students continues even in mid-2022. Digital mental health technologies, especially smartphone apps, are a leading tool to help provide more services to students. Numerous college mental health centers already recommend mental health apps, and many programs are aimed specifically at college students. Despite the clear potential of apps to provide easy-to-access and interactive mental health resources, their impact to date has been limited. One leading barrier has been a lack of engagement; many people abandon apps after only a few days. In this paper, we propose a scalable and data-driven approach to customize daily and weekly app content based on predictive models that enable both personal and automated care.
Smartphone apps are well suited to personalize care as they can gather information related to real-time mental health. Through an approach often known as digital phenotyping or smartphone sensing, it is possible, for example, to use signals from a smartphone’s accelerometer to infer sleep behaviors and geolocation to infer mobility patterns. Reviews and research on digital phenotyping in college students suggest that, while digital biomarkers do exist, their effect size is likely small. In our prior research, we have combined these digital biomarkers with brief smartphone surveys to build predictive models of stress, anxiety, and depression. While we have validated these models retrospectively on different data sets of college students, to date there have been no studies exploring their prospective validity or whether customizing an app to offer tailored preventive resources may reduce mental health symptoms. Overall, this work aims to prospectively evaluate a model for participant improvement across the study and to compare groups that receive personalized interventions via a digital navigator, an automated worker, or neither, using the Technology Acceptance Model (TAM) to explore engagement in college students.
First, we will provide general details about the study, and then, we will address how we plan to achieve these two goals.
Participants, Technology, and App Use
This study will use the open-source mindLAMP app developed by the Digital Psychiatry lab at Beth Israel Deaconess Medical Center to collect survey and sensor data from college student participants. mindLAMP facilitates surveys, digital phenotyping (see below), and app-based interventions in one platform that runs on Apple and Android smartphones. In this study, GPS, accelerometer, and screen state data will be collected. In addition, the app will be used to administer surveys and provide cognitive games, mindfulness, and other activities. As in earlier iterations of this study, college students will be recruited via social media to complete a screening survey on REDCap. Given that in-person recruitment remains challenging because of COVID-19, online recruitment via social media is practical. To participate, students must be 18 years or older, score 14 or higher on the Perceived Stress Scale (PSS), be enrolled as an undergraduate for the duration of the study, own a smartphone able to run mindLAMP, be able to sign informed consent, and pass the run-in period outlined below. We will not exclude students based on any comorbidities. We aim to recruit at least 100 students to start the study, in line with our prior pilot studies and the sample sizes used to generate the model we are testing. Given that the effect size of any personalization effort remains largely undefined, a formal power analysis is challenging; however, we note that this study is larger than prior digital phenotyping studies for college mental health, which have a mean sample size of 81.
Participants will be sent log-in information for the app and will enter a 3-day run-in period. During these 3 days, participants will be asked to complete a survey each day. This run-in period will serve to screen out participants whose devices are not able to capture digital phenotyping data or who do not engage with the app at all, and will give the study coordinators time to verify that informed consent is signed and dated correctly. The run-in period is designed to help improve overall digital data coverage, which is important for validation of the predictive model. After these 3 days, participants who have completed the required surveys and have sufficient GPS data will be moved to the enrollment period of the study. Participants who have not completed the required surveys will be emailed by the automated study worker and given 24 hours to complete these tasks before being automatically discontinued.
Participants will be asked to complete a longer survey each week on the app that includes the Patient Health Questionnaire-9 (PHQ-9), Generalized Anxiety Disorder-7 (GAD-7), PSS, UCLA Loneliness Survey, Pittsburgh Sleep Quality Index, Digital Working Alliance Inventory (DWAI), and TAM-related questions.
On the first day of the study, participants will also be asked to complete the Prodromal Questionnaire-16  ( A). Participants will have a daily survey each morning on the app that asks about sleep duration and sleep quality, and has questions from the PHQ-9, GAD-7, and PSS ( A). Participants will be compensated for completing the weekly surveys: US $15 for completing one survey between the first and eighth days, US $15 for completing at least one survey between the 8th and 21st days, and finally US $20 for completing at least one more survey between the 21st and 28th days. Students will be paid via Amazon gift card codes.
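The compensation schedule above can be sketched as a small function. This is a minimal illustration, not the study's payment code: the function name is hypothetical, the windows are treated as inclusive at both ends, and each completed survey is assumed to count toward only one window (the protocol does not say how surveys on the shared boundary days 8 and 21 are handled).

```python
# Payment windows as (first_day, last_day, amount_usd), taken from the prose above.
# Treating both endpoints as inclusive is an assumption.
PAYMENT_WINDOWS = [(1, 8, 15), (8, 21, 15), (21, 28, 20)]


def compensation_usd(weekly_survey_days):
    """Total compensation given the study days on which weekly surveys were completed."""
    days = sorted(set(weekly_survey_days))
    used = set()
    total = 0
    for first, last, amount in PAYMENT_WINDOWS:
        # Assumption: a single survey can satisfy only one window, which matters
        # for surveys completed on the boundary days 8 and 21.
        match = next((d for d in days if first <= d <= last and d not in used), None)
        if match is not None:
            used.add(match)
            total += amount
    return total
```

Under these assumptions, a participant who completes weekly surveys on days 7, 14, and 28 would receive the full US $50.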
Throughout the study, engagement will be monitored to ensure that a minimum amount of data is being collected. To promote engagement, the study worker will reach out to participants via email if they have not completed any activities in the past 3 days and encourage them to complete the scheduled activities. If participants have not completed any activities in 5 days, they will be discontinued.
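As a rough sketch, the engagement rule described above reduces to two thresholds on the number of days since a participant's last completed activity. The function name and return labels are hypothetical, and whether the thresholds are inclusive is an assumption.

```python
REMINDER_AFTER_DAYS = 3      # email reminder threshold from the protocol
DISCONTINUE_AFTER_DAYS = 5   # discontinuation threshold from the protocol


def engagement_action(days_since_last_activity):
    """Map days of inactivity to the action the study worker would take."""
    # Assumption: thresholds are inclusive (exactly 3 days triggers a reminder).
    if days_since_last_activity >= DISCONTINUE_AFTER_DAYS:
        return "discontinue"
    if days_since_last_activity >= REMINDER_AFTER_DAYS:
        return "send_reminder_email"
    return "none"
```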
|Component of TAM (a) and questions
|The app supports me to overcome challenges.
|The app allows me to easily manage my mental health.
|The app makes me better informed of my mental health.
|The app provides me with valuable information or skills.
|The app is easy to use and operate.
|I trust the app to guide me toward my personal goals.
|I believe the app tasks will help me to address my problems.
|The app encourages me to accomplish tasks and make progress.
|I agree that the tasks within the app are important for my goals.
|I want to use the app daily.
|I would want to use it after the study ends.
(a) TAM: Technology Acceptance Model.
(b) DWAI: Digital Working Alliance Inventory.
All participants will be scheduled for different therapeutic modules each week. The activities are listed in the app under the participant’s daily task feed. The components of the study are shown in.
These modules include content created specifically for college students. For the first week, all participants will be scheduled for gratitude journaling. In the second and fourth weeks, participants will learn about different types of thought patterns and practice recoding and rationalizing their thoughts (B). Screenshots of the app modules are shown in .
We have evaluated improvement (change in GAD-7 scores) in a prior study , which is shown in . Each participant’s change in GAD-7 is shown by a line going from their start-of-week to end-of-week score. Overall, it is difficult to determine in this small data set if one module is better than the other. However, it seems that participants with higher GAD-7 scores may not improve as much with mindfulness as compared to cognitive distraction games. Thus, in the third week, participants will be scheduled for either mindfulness or cognitive distraction games based on whether they had low (≤10) or high (>10) GAD-7 scores on the initial weekly survey. Participants who do not complete this initial survey within the first week will be discontinued.
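The week 3 scheduling rule can be written as a single threshold check. This sketch assumes that participants with low scores (≤10) receive mindfulness and those with high scores (>10) receive cognitive distraction games; that direction is inferred from the observation above and is not stated explicitly in the protocol. The function name is hypothetical.

```python
GAD7_THRESHOLD = 10  # low is <= 10, high is > 10, as stated in the protocol


def week3_module(initial_gad7):
    """Choose the week 3 module from the initial weekly GAD-7 total (0 to 21)."""
    if not 0 <= initial_gad7 <= 21:
        raise ValueError("GAD-7 total scores range from 0 to 21")
    # Assumption: high scorers get cognitive distraction games because they
    # appeared to improve less with mindfulness in the prior data.
    if initial_gad7 > GAD7_THRESHOLD:
        return "cognitive distraction games"
    return "mindfulness"
```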
Engagement Theory and Study Design
To address our second aim exploring engagement, we adapted the TAM as a theoretical framework . The TAM is the most widely used model to study engagement around digital health technologies. In the TAM, both perceived ease of use and perceived usefulness influence attitude toward technology, which in turn impacts behavioral intention to use (B) and actual system use. In this study, attitude toward technology will be measured by the DWAI [ ]. The predictive models offering tailored resources should increase perceived usefulness and thus attitude toward technology and actual engagement compared to a control group receiving a scheduled set of resources.
However, increasing perceived usefulness may not be enough, as recent studies suggest the need for a social, or at least human, interaction to drive engagement. It is currently unclear whether this interaction would have the largest effect on perceived usefulness, attitude toward technology, or behavioral intention to use; thus, we will perform an exploratory analysis around this question. The study will be split into three groups. For those in the first group, digital navigators will provide human support and reach out every fourth day to suggest a different module based on whether the algorithm (described below) predicts future symptom worsening or improvement. In this study, navigators are research assistants who have been trained in our 10-hour curriculum on how to provide technical and engagement support for people using health apps. They will use email to communicate with participants, although we automated much of the role for this study as outlined below. For those assigned to the automation arm, modules will be suggested every fourth day by the automated study worker bot via email. Our automation platform will generate the emails: those assigned to the bot group will receive the generated email directly, while those assigned to the digital navigator group will receive an email that has been reviewed and signed by a digital navigator. Finally, for those assigned to the third arm, or the null group, there will be no suggested modules and no automation or digital navigator interaction. Study staff will be available to answer any study questions from all participants. Activities are suggested every 4 days to give participants time to practice the suggested skills and resources and to allow a window for these to impact symptoms. Upon enrollment, participants will be sequentially assigned to one of three groups: the automated group, the digital navigator group, or the null group.
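Sequential assignment to the three groups can be illustrated as a round-robin over enrollment order. This is a sketch only: the group labels, their order, and the function name are assumptions, since the protocol does not specify which group receives the first participant.

```python
from itertools import cycle

# Assumption: assignment rotates through the groups in this order.
GROUPS = ("automated", "digital_navigator", "null")


def assign_groups(participant_ids):
    """Assign participants to groups in rotation, following enrollment order."""
    return {pid: group for pid, group in zip(participant_ids, cycle(GROUPS))}
```

With four enrolled participants, the fourth would wrap around to the first group in the rotation.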
Machine Learning Model for Engagement Intervention
We present a logistic regression model trained on the passive data features of a prior study of college students to predict whether daily survey scores would increase by one or more (any decrease in mental health). The model will be used to predict every fourth day if there will be an increase in reported symptoms. The model is used to demonstrate the feasibility of applying a data-driven approach to activity suggestions. On these days, students in the digital navigator group and the automated group will receive a suggestion via email for an additional activity to complete from either a digital navigator or the automation worker bot, respectively. On days with an expected increase, a cognitive behavioral therapy–based exercise will be assigned, and on days without an expected increase, a mindfulness exercise will be assigned. These activities will be pulled sequentially from a predefined list and will be different from the weekly activities (C). In addition, participants will be asked to complete a 3-question survey about their attitude and behavioral intention toward the app after completing the survey ( A) as a measure of engagement.
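The suggestion logic described above can be sketched as follows. The activity names are placeholders rather than the study's actual lists, the suggestion days are assumed to be study days 4, 8, 12, and so on, and wrapping around when a list is exhausted is also an assumption.

```python
# Hypothetical activity lists; the study's predefined lists are not given here.
CBT_ACTIVITIES = ["cbt_exercise_1", "cbt_exercise_2", "cbt_exercise_3"]
MINDFULNESS_ACTIVITIES = ["mindfulness_1", "mindfulness_2", "mindfulness_3"]


def is_suggestion_day(study_day, interval=4):
    """Assumption: suggestions fall on study days 4, 8, 12, ..."""
    return study_day > 0 and study_day % interval == 0


def plan_suggestions(predicted_increase_by_day, group):
    """Map each suggestion day to an activity, pulled sequentially per list."""
    if group == "null":
        return {}  # the null group receives no suggestions
    lists = {True: CBT_ACTIVITIES, False: MINDFULNESS_ACTIVITIES}
    position = {True: 0, False: 0}
    plan = {}
    for day in sorted(predicted_increase_by_day):
        if not is_suggestion_day(day):
            continue
        flag = bool(predicted_increase_by_day[day])
        activities = lists[flag]
        # Wrapping around at the end of a list is an assumption.
        plan[day] = activities[position[flag] % len(activities)]
        position[flag] += 1
    return plan
```

A predicted symptom increase selects the next cognitive behavioral therapy exercise, and any other prediction selects the next mindfulness exercise.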
The model was fit using data from the second iteration of the college mental health study with leave-one-patient-out cross-validation. The inputs were the differences in each passive data feature between 2 days prior and the previous day, and the target was a score increase of one or more from the previous day to the current day. The implementation of the passive data features used in the model can be found on GitHub. The Scikit-Learn LogisticRegression model was used with an L1 ratio of 0.5. Class weights were balanced, and all input features were standardized. The final model coefficients are an average of the coefficients of each cross-validation model. The area under the curve (AUC) over all the combined cross-validated folds was 0.648.
Table: final model coefficients (row: GPS data coverage)
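To make the data preparation concrete, the sketch below builds the day-over-day feature differences and binary labels for one participant and generates leave-one-patient-out splits. It uses only the standard library and does not fit a model; the function names and data layout are hypothetical, and the actual feature code is in the GitHub repository cited in the text. In practice, a grouped splitter such as scikit-learn's LeaveOneGroupOut could play the role of the second function.

```python
def build_examples(features_by_day, scores_by_day):
    """Build (day, feature_diff, label) tuples for one participant.

    feature_diff = features[day - 1] - features[day - 2]
    label = 1 if score[day] - score[day - 1] >= 1 else 0
    """
    examples = []
    for day in sorted(scores_by_day):
        have_inputs = (
            day - 1 in features_by_day
            and day - 2 in features_by_day
            and day - 1 in scores_by_day
        )
        if not have_inputs:
            continue  # skip days without the two prior feature days or prior score
        prev, prev2 = features_by_day[day - 1], features_by_day[day - 2]
        diff = [a - b for a, b in zip(prev, prev2)]
        label = int(scores_by_day[day] - scores_by_day[day - 1] >= 1)
        examples.append((day, diff, label))
    return examples


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, test_indices) for each patient."""
    for held_out in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train, test
```

Holding out all rows from one participant at a time keeps that participant's data out of the training set for the fold in which they are evaluated.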
Symptom Improvement Model
To achieve our first aim, we present an additional logistic regression model to predict if participants will improve by at least 25% by the end of the study on the weekly surveys from the average of all features over the course of the study. The model was trained on data from the first iteration of the college study  and tested on the second iteration of the college study to test model generalization. The AUC scores are shown in . The features used in the model and a table of nonzero model coefficients can be found in D.
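One way to define the binary improvement label is as a relative decrease of at least 25% between the first and last weekly survey scores. The protocol does not spell out the exact formula, so this definition, the handling of a baseline score of zero, and the function name are all assumptions.

```python
def improved(first_score, last_score, threshold=0.25):
    """Return True if the score dropped by at least `threshold` relative to baseline.

    Assumes lower scores indicate better mental health, as on the weekly scales.
    """
    if first_score <= 0:
        # Assumption: with a baseline of zero there is no room to improve.
        return False
    return (first_score - last_score) / first_score >= threshold
```

For example, a PHQ-9 total falling from 12 to 9 is a 25% decrease and would count as improvement under this definition.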
Both previous versions of the study recruited college students to participate in a 28-day study taking daily and weekly surveys. Differences included the time the study was performed (version 1 collected data from December 2020 to May 2021, and version 2 collected data from November to December 2021) and the module activities (version 1 had no assigned activities, and version 2 had four set modules: thought patterns, journaling, mindfulness, and cognitive distraction games).
Table: area under the receiver operating characteristic curve by weekly survey (Patient Health Questionnaire-9, Generalized Anxiety Disorder-7, Perceived Stress Scale, UCLA Loneliness Scale, Pittsburgh Sleep Quality Index)
End of the Study
The activity schedule will finish after 28 days in the enrollment period. However, if participants have not completed their final weekly survey, they will be given up to 4 additional days to complete this survey and receive compensation. At 32 days, all remaining participants will be marked as completed, and their sensor data collection will be turned off.
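The end-of-study rule can be sketched as a status function over the number of days in the enrollment period. The status labels and function name are hypothetical, and marking a participant as completed at day 28 once the final weekly survey is done is an assumption.

```python
STUDY_LENGTH_DAYS = 28   # activity schedule length from the protocol
GRACE_PERIOD_DAYS = 4    # extra days allowed for the final weekly survey


def end_of_study_status(days_enrolled, final_weekly_survey_done):
    """Return 'active', 'grace_period', or 'completed' for a participant."""
    if days_enrolled >= STUDY_LENGTH_DAYS + GRACE_PERIOD_DAYS:
        return "completed"  # at 32 days, everyone remaining is marked completed
    if days_enrolled >= STUDY_LENGTH_DAYS:
        # Assumption: participants finish as soon as the final survey is done.
        return "completed" if final_weekly_survey_done else "grace_period"
    return "active"
```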
Study Automation and Data Coverage
To enable scalable research, we will build upon the digital study infrastructure used in our prior studies . All parts of the study will be automated via workers implemented in Python. We have added new features to the codebase, including a worker that will update a Google Sheet with study information such as the status of different participants in the study, payment form completion, and which activities have been assigned. In addition, automated Slack notifications will be sent to the team to help manage the study ( ). These improvements will provide an easy way for the study team to track study progress.
Passive and active data coverage will additionally be monitored throughout the study via Slack notifications sent to the study team and graphs on the data portal (). Graphs will include participant GPS, accelerometer, and screen state coverage over the past week, days since the last activity, previous week’s module completion, and previous week’s daily/weekly survey counts. These graphs will allow researchers to monitor for any study-wide data collection issues and track overall participant engagement at a high level.
In addition to these researcher-facing metrics, participants will receive a weekly progress email telling them their streak, number of weekly and daily surveys completed, and module completion to promote engagement. The code for the study workers can be found on GitHub .
For any participants who indicate thoughts related to self-harm or suicide as noted by a score of 3 on question number nine of the PHQ-9, an alert will be sent to study staff by the automated study worker, and the principal investigator or covering licensed clinicians will reach out to the student within the same business day to conduct a safety assessment. If the student cannot be reached via phone or email after 24 hours, we will notify the local student mental health services. At the same time a participant records an elevated thought of self-harm or suicide, the app also displays a reminder that it is not a replacement for emergency care and that study staff cannot respond in real time, and provides links and phone numbers to resources.
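The alert trigger described above is a check on a single survey item. The sketch below assumes the PHQ-9 responses are available as a list of nine item scores (0 to 3) in questionnaire order; the function name and data layout are hypothetical, and the actual notification to staff is outside the scope of the example.

```python
def needs_safety_alert(phq9_item_scores):
    """Return True when PHQ-9 question nine is answered with the maximum score of 3."""
    if len(phq9_item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    # Question nine is the last item (index 8 in a zero-based list).
    return phq9_item_scores[8] == 3
```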
This study was approved by the Beth Israel Deaconess Medical Center institutional review board (protocol 2020P000310). Data is not available to share, but the smartphone app and feature processing code are.
The first key goal of this work is to prospectively evaluate a model predicting improvement across the study. Second, we aim to analyze the effectiveness of suggesting personalized modules to participants. We will compare the improvement of the automated and digital navigator groups to see whether there is a significant effect of having a person versus artificial intelligence deliver the information. We will also compare the automated and digital navigator groups with the null group to see whether suggested modules and interaction during the study increase engagement or improvement. As a secondary outcome, we will perform an ANOVA to compare responses to the TAM questions across the three study groups, acknowledging that this type of analysis is novel and that the results of individual questions will be challenging to compare to prior literature. The study was completed in the spring of 2022, and the results will be published with JMIR Publications.
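For reference, the F statistic of a one-way ANOVA across the three groups can be computed as the ratio of between-group to within-group mean squares. This is a generic, standard-library sketch of the textbook formula, not the study's analysis code; the function name is hypothetical, and a statistics package would normally be used to obtain the P value as well.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares around each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

The function assumes at least two groups and some within-group variation; otherwise the ratio is undefined.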
The results of this study will inform both data science and clinical engagement questions around digital college mental health. First, by prospectively testing our algorithms on a unique sample, we can determine both their reliability and validity. Second, by assessing engagement outcomes with digital navigators versus automations versus a control group, we can learn how to best increase the use of apps and build mechanistic understanding using the TAM.
While many smartphone digital phenotyping biomarkers and algorithms have been proposed across the mental health field and even specifically for college mental health, to our knowledge, none have been prospectively validated. In our prior research, we retrospectively validated the findings used to inform this study’s methods on older data sets.
Beyond their predictive ability, the results around the validity of the digital phenotyping biomarkers hold potential for advancing adaptive interventions. A key component of adaptive interventions is the tailoring variable that is used to customize treatment at each decision time point. While most tailoring variables are static (eg, mood score above a predetermined threshold at a certain time), digital phenotyping biomarkers could serve as more dynamic tailoring variables that would enable more personalized treatments. Given that smartphones themselves can serve as platforms to offer these adaptive interventions, the results around optimal tailoring variables are highly relevant.
The digital navigator and control groups offer useful comparisons. Digital navigators are increasingly used to increase engagement, although at the cost of reduced scalability. Still, most apps today are not supported by either digital navigators or algorithms, so comparing outcomes to a control group can help assess any potential benefit. Additionally, it remains difficult to determine which activities are best for participants or which interventions should be assigned in real time in response to passive data changes. This challenge makes it difficult to truly personalize app recommendations. However, it may be the case that providing expert or data-driven suggestions to the participant introduces a placebo effect that improves engagement and attitude toward the app regardless of the actual usefulness of the activity. Although difficult to explore in this study, comparing different app activities is an interesting area of future work.
Further secondary outcomes related to the TAM can also help inform mechanistic-based understanding of engagement. While many prior studies, including our own, have examined outcomes like usability, fewer have explored why apps are engaging. Even if our results are negative around engagement, learning how TAM scores change over time and correlate to rates of app use will inform how future versions of mindLAMP can be improved.
There are limitations to this protocol. For secondary outcomes regarding automated interventions, given that our model here has a low AUC, the results will have to be interpreted with caution. While our study is designed to prospectively validate the symptom algorithm, it is not powered around the secondary engagement outcomes. This is in part due to the effect size for different engagement strategies like digital navigators and personalization remaining poorly defined. Thus, our results can help inform future study design.
Like our prior studies, our research is fully reproducible. We offer details of our recruitment, screening, and data coverage procedures in this paper. The mindLAMP app remains open-source software currently deployed at over 50 clinical sites worldwide, and our algorithms are also publicly accessible via GitHub. This enables others to validate and expand upon our work transparently. While not a study outcome, the decentralized clinical trial mechanism used in this study offers a practical example of how digital phenotyping research can be done in a remote yet scalable manner.
Data from this study is not available given the personal identifiable nature of the information. However, the mindLAMP app and processing code are freely available.
Conflicts of Interest
Additional figures and supplementary material (DOCX file, 33 KB).
- Kessler RC, Ruhm CJ, Puac-Polanco V, Hwang IH, Lee S, Petukhova MV, et al. Estimated prevalence of and factors associated with clinically significant anxiety and depression among US adults during the first year of the COVID-19 pandemic. JAMA Netw Open 2022 Jun 01;5(6):e2217223 [FREE Full text] [CrossRef] [Medline]
- Melcher J, Camacho E, Lagan S, Torous J. College student engagement with mental health apps: analysis of barriers to sustained use. J Am Coll Health 2022;70(6):1819-1825. [CrossRef] [Medline]
- Melcher J, Torous J. Smartphone apps for college mental health: a concern for privacy and quality of current offerings. Psychiatr Serv 2020 Nov 01;71(11):1114-1119. [CrossRef] [Medline]
- Melcher J, Lavoie J, Hays R, D'Mello R, Rauseo-Ricupero N, Camacho E, et al. Digital phenotyping of student mental health during COVID-19: an observational study of 100 college students. J Am Coll Health 2021 Mar 26:1-13. [CrossRef] [Medline]
- Baumel A, Muench F, Edan S, Kane JM. Objective user engagement with mental health apps: systematic search and panel-based usage analysis. J Med Internet Res 2019 Sep 25;21(9):e14567 [FREE Full text] [CrossRef] [Medline]
- Currey D, Torous J. Digital phenotyping correlations in larger mental health samples: analysis and replication. BJPsych Open 2022 Jun 03;8(4):e106 [FREE Full text] [CrossRef] [Medline]
- Vaidyam A, Halamka J, Torous J. Enabling research and clinical use of patient-generated health data (the mindLAMP platform): digital phenotyping study. JMIR Mhealth Uhealth 2022 Jan 07;10(1):e30557 [FREE Full text] [CrossRef] [Medline]
- Patel SK, Torous J. Exploring the neuropsychiatric sequalae of perceived COVID-19 exposure in college students: a pilot digital phenotyping study. Front Psychiatry 2021;12:788926. [CrossRef] [Medline]
- Low DM, Rumker L, Talkar T, Torous J, Cecchi G, Ghosh SS. Natural language processing reveals vulnerable mental health support groups and heightened health anxiety on reddit during covid-19: observational study. J Med Internet Res 2020 Oct 12;22(10):e22635 [FREE Full text] [CrossRef] [Medline]
- Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. J Health Soc Behav 1983 Dec;24(4):385-396. [Medline]
- Melcher J, Hays R, Torous J. Digital phenotyping for mental health of college students: a clinical review. Evid Based Ment Health 2020 Nov;23(4):161-166. [CrossRef] [Medline]
- Currey D, Torous J. Increasing the value of digital phenotyping through reducing missingness: a retrospective analysis. medRxiv Preprint posted online on May 17, 2022. [CrossRef]
- Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med 2001 Sep;16(9):606-613 [FREE Full text] [CrossRef] [Medline]
- Spitzer RL, Kroenke K, Williams JBW, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med 2006 May 22;166(10):1092-1097. [CrossRef] [Medline]
- Russell DW. UCLA Loneliness Scale (Version 3): reliability, validity, and factor structure. J Pers Assess 1996 Feb;66(1):20-40. [CrossRef] [Medline]
- Buysse DJ, Reynolds CF, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Res 1989 May;28(2):193-213. [CrossRef] [Medline]
- Henson P, Wisniewski H, Hollis C, Keshavan M, Torous J. Digital mental health apps and the therapeutic alliance: initial review. BJPsych Open 2019 Jan;5(1):e15 [FREE Full text] [CrossRef] [Medline]
- Davis FD. A technology acceptance model for empirically testing new end-user information systems: theory and results. Massachusetts Institute of Technology 1986:1-2 [FREE Full text]
- Loewy RL, Bearden CE, Johnson JK, Raine A, Cannon TD. The prodromal questionnaire (PQ): preliminary validation of a self-report screening measure for prodromal and psychotic syndromes. Schizophr Res 2005 Sep 15;77(2-3):141-149. [CrossRef] [Medline]
- Wisniewski H, Torous J. Digital navigators to implement smartphone and digital tools in care. Acta Psychiatr Scand 2020 Apr;141(4):350-355 [FREE Full text] [CrossRef] [Medline]
- Wisniewski H, Gorrindo T, Rauseo-Ricupero N, Hilty D, Torous J. The role of digital navigators in promoting clinical care and technology integration into practice. Digit Biomark 2020;4(Suppl 1):119-135 [FREE Full text] [CrossRef] [Medline]
- Currey D, Hays R, D'Mello R, Scheuer L, Vaidyam A, Lavoie J, et al. LAMP-cortex. GitHub. 2022. URL: https://github.com/BIDMCDigitalPsychiatry/LAMP-cortex [accessed 2022-11-14]
- Abraham A, Pedregosa F, Eickenberg M, Gervais P, Mueller A, Kossaifi J, et al. Machine learning for neuroimaging with scikit-learn. Front Neuroinform 2014;8:14. [CrossRef] [Medline]
- Currey D, Hays R, Vaidyam A. mindLAMP College Study V3. GitHub. 2022. URL: https://github.com/BIDMCDigitalPsychiatry/LAMP-college-study/tree/college_v3 [accessed 2022-04-29]
- Teepe GW, Da Fonseca A, Kleim B, Jacobson NC, Salamanca Sanabria A, Tudor Car L, et al. Just-in-time adaptive mechanisms of popular mobile apps for individuals with depression: systematic app search and literature review. J Med Internet Res 2021 Sep 28;23(9):e29412 [FREE Full text] [CrossRef] [Medline]
- Perski O, Hébert ET, Naughton F, Hekler EB, Brown J, Businelle MS. Technology-mediated just-in-time adaptive interventions (JITAIs) to reduce harmful substance use: a systematic review. Addiction 2022 May;117(5):1220-1241. [CrossRef] [Medline]
- Bilden R, Torous J. Global collaboration around digital mental health: The LAMP Consortium. J Technol Behav Sci 2022;7(2):227-233 [FREE Full text] [CrossRef] [Medline]
|AUC: area under the curve
|DWAI: Digital Working Alliance Inventory
|GAD-7: Generalized Anxiety Disorder-7
|PHQ-9: Patient Health Questionnaire-9
|PSS: Perceived Stress Scale
|TAM: Technology Acceptance Model
Edited by T Leung; submitted 13.03.22; peer-reviewed by J Lipschitz, B Nievas Soriano, K Denecke; comments to author 29.06.22; revised version received 18.07.22; accepted 27.10.22; published 29.11.22
Copyright
©Danielle Currey, John Torous. Originally published in JMIR Research Protocols (https://www.researchprotocols.org), 29.11.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.