This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.
Current standards of psychiatric assessment and diagnostic evaluation rely primarily on the clinician’s subjective interpretation of a patient’s outward manifestations of their internal state. While psychometric tools can help to evaluate these behaviors more systematically, the tools still rely on the clinician’s interpretation of what are frequently nuanced speech and behavior patterns. With advances in computing power, increased availability of clinical data, and improving resolution of recording and sensor hardware (including acoustic, video, accelerometer, infrared, and other modalities), researchers have begun to demonstrate the feasibility of cutting-edge technologies in aiding the assessment of psychiatric disorders.
We present a research protocol that utilizes facial expression, eye gaze, voice and speech, locomotor, heart rate, and electroencephalography monitoring to assess schizophrenia symptoms and to distinguish patients with schizophrenia from those with other psychiatric disorders and control subjects.
We plan to recruit three outpatient groups: (1) 50 patients with schizophrenia, (2) 50 patients with unipolar major depressive disorder, and (3) 50 individuals with no psychiatric history. Using an internally developed semistructured interview, psychometrically validated clinical outcome measures, and a multimodal sensing system utilizing video, acoustic, actigraphic, heart rate, and electroencephalographic sensors, we aim to evaluate the system’s capacity in classifying subjects (schizophrenia, depression, or control), to evaluate the system’s sensitivity to within-group symptom severity, and to determine if such a system can further classify variations in disorder subtypes.
Data collection began in July 2020 and is expected to continue through December 2022.
If successful, this study will help advance current progress in developing state-of-the-art technology to aid clinical psychiatric assessment and treatment. If our findings suggest that these technologies are capable of resolving diagnoses and symptoms to the level of current psychometric testing and clinician judgment, we would be among the first to develop a system that can eventually be used by clinicians to more objectively diagnose and assess schizophrenia and depression with the possibility of less risk of bias. Such a tool has the potential to improve accessibility to care; to aid clinicians in objectively evaluating diagnoses, severity of symptoms, and treatment efficacy through time; and to reduce treatment-related morbidity.
International Registered Report Identifier (IRRID): DERR1-10.2196/36417
Mental disorders represent the second most common cause of years of life lived with disability worldwide [
Current evidence suggests that early identification and optimized treatment of depression and schizophrenia improve outcomes and reduce illness progression [
At present, depression and schizophrenia are diagnosed through the subjective clinical evaluation of signs and symptoms established by the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [
The development of easy-to-use, objective clinical tools to aid clinicians in the diagnosis and evaluation of mental illness has the potential to limit the impact of these illnesses on patients and on society. The cost of powerful computing hardware has fallen, and improvements in the field of computer science and health care suggest that various types of computer sensors and recording hardware could be used to aid in the assessment and diagnostic prediction of mental illness [
Few studies, however, have been conducted to assess variations in patients with depression or schizophrenia and control subjects with video technology, although some studies have observed statistically significant differences between schizophrenia and control groups in certain combinations of facial action clustering [
Past research groups have also found that electroencephalographic (EEG) recordings can accurately classify schizophrenia. For instance, one group found that EEG recordings could classify schizophrenia with 91.5% to 93.9% accuracy [
Combining biomarkers may be a promising approach to improving diagnostic precision and treatment options [
We present a research protocol to assess the efficacy of a multimodal sensor system combining video, audio, actigraphy, noninvasive EEG, and heart rate monitoring to assess differences in individuals with schizophrenia and unipolar major depressive disorder and controls (patients with no history of mental illness in the preceding year). The data collected from the subjects will be utilized to develop a machine learning model to evaluate the presence, severity, and possible subtypes of schizophrenia and depression. We seek to assess the performance of the developed model and its ability to differentiate schizophrenia from depression and control groups based on high-value input features from different modalities. We hypothesize that the predictive model will discriminate between schizophrenia, depression, and control groups. For the depression group only, we will evaluate within-group preprocessed outputs of the sensor data to discriminate depression severity scores using the Patient Health Questionnaire-9 (PHQ-9) [
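As an illustration of the severity target mentioned above, the following is a minimal sketch of how a PHQ-9 total score is commonly mapped to a severity band. It assumes the widely cited cutoffs (5, 10, 15, and 20) from the original PHQ-9 validation work; the protocol does not state which cutoffs will be used, and the function name `phq9_severity` is hypothetical.

```python
def phq9_severity(item_scores):
    """Return (total, band) for nine PHQ-9 item scores, each 0 to 3.

    The band cutoffs are the commonly cited ones and are an assumption
    here, not something specified by the protocol.
    """
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly 9 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item is scored 0, 1, 2, or 3")

    total = sum(item_scores)  # possible range: 0 to 27
    if total <= 4:
        band = "minimal"
    elif total <= 9:
        band = "mild"
    elif total <= 14:
        band = "moderate"
    elif total <= 19:
        band = "moderately severe"
    else:
        band = "severe"
    return total, band
```

A regressor would typically be trained against the total score, whereas a classifier could be trained against the band; either choice is a modeling decision left open here.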
The research assessments will be conducted at the Grady Outpatient Behavioral Health Clinic, which is part of the broader Grady Health System, a metropolitan safety-net hospital in Atlanta, Georgia. The study seeks to recruit 50 individuals with schizophrenia, 50 individuals with unipolar major depressive disorder, and 50 controls without a prior history of mental illness.
The 3 groups of participants (aged 18 years or older) will include outpatients with a DSM-5 diagnosis of schizophrenia or a DSM-5 diagnosis of major depressive disorder and individuals with no mental health diagnosis (as controls). All diagnoses will be confirmed by the Mini International Neuropsychiatric Interview (MINI) [
All participant groups will be recruited (1) from a database of interested research participants from prior research studies, from clinician referrals, and from respondents to a general research interest form provided in the outpatient waiting rooms of the Grady Outpatient Behavioral Health Clinic; (2) through a regional digital recruitment strategy, part of ResearchMatch (of which Emory University is an institutional participant), that will target the metro Atlanta area for in-person interviews and the entire United States for remote or telehealth interviews; and (3) through a digital recruitment strategy based on Amazon Mechanical Turk (Amazon Inc) in which individuals who respond to a short questionnaire will be able to reach out to the study team via email if they are interested in participating in the study.
The schizophrenia and control groups that participate in person will be interviewed at 2 time points (the initial encounter and a second encounter, 3 to 6 months after the initial encounter) for all measures, as indicated below.
Initial interview data collection process. MQOL Part A: McGill Quality of Life Questionnaire-Revised Part A; CRDPSS: Clinician-Rated Dimensions of Psychosis Symptom Severity in Patients with Schizophrenia; PHQ-9: Patient Health Questionnaire-9; GAD-7: Generalized Anxiety Disorder-7; GSQ: General Symptom Questionnaire; MUQ: Medication Utilization Questionnaire; EEG: electroencephalogram.
All assessments will take place over Zoom (Zoom Inc), a secure, encrypted, telehealth platform that is compliant with the Health Insurance Portability and Accountability Act of 1996 (HIPAA). In-person assessments will consist of an interviewer conducting an interview via Zoom on a computer, with the participant on a different computer located in an adjacent room in the research suite. For remote assessments, the interviewer will be physically located at the research suite and the participant will be on their own computer at the location of their choice (usually their home). Due to the COVID-19 pandemic, assessments will be conducted in person when local case counts are low and the research team is safely able to complete interviews with social distancing measures in place and appropriate personal protective equipment (PPE).
In-person participants will be interviewed twice, 3 to 6 months apart, and offered a US $30 honorarium at the completion of study visit 1 and a US $30 honorarium at the completion of study visit 2. Between visits the participants may complete a battery of assessments offered every 2 weeks on their devices, for which they will be compensated US $5 for each battery. Remote participants will be interviewed once and offered a US $30 honorarium at the completion of the study visit. Individuals who participate in the study but are unable to complete it will be offered a US $10 honorarium. Those subjects screened and determined to not meet the eligibility criteria will not be offered compensation.
The study was approved by the Emory University Institutional Review Board in November 2018 (IRB00105142) and the Grady Research Oversight Committee in January 2019 (00-105142).
The schedule of assessments is found in the two tables that follow, one for in-person visits and one for remote visits.
Schedule of assessments for in-person visits.
Assessments | Visit 1 | Biweekly (a) | Visit 2 (3-6 months) (a) |
Informed consent | ✓ | | |
Semistructured interview | ✓ | | ✓ |
Demographic assessment | ✓ | | |
Sociodemographic assessment | ✓ | | |
Mini International Neuropsychiatric Interview | ✓ | | |
Positive and Negative Syndrome Scale (b) | ✓ | | ✓ |
McGill Quality of Life Questionnaire-Revised Part A | ✓ | | ✓ |
Clinician-Rated Dimensions of Psychosis Symptom Severity in Patients with Schizophrenia (b) | ✓ | | ✓ |
Clinical Global Impression-Severity | ✓ | | ✓ |
Clinical Global Impression-Improvement | | | ✓ |
Cambridge Gambling Task (a,c) | ✓ | | ✓ |
Patient Health Questionnaire-9 (d) | ✓ | ✓ | ✓ |
Generalized Anxiety Disorder-7 (d) | ✓ | ✓ | ✓ |
General Symptom Questionnaire (a,d) | ✓ | ✓ | ✓ |
Medication Utilization Questionnaire (a,d) | ✓ | ✓ | ✓ |
Facial expressivity and eye gaze | ✓ | | ✓ |
Voice and speech data collection | ✓ | | ✓ |
Electroencephalography (a,c) | ✓ | | ✓ |
Actigraphy and heart rate (continuous) (a,c) | ✓ | ✓ | ✓ |
(a) Depression group excluded.
(b) Only administered for schizophrenia group.
(c) Only for in-person visits (ie, excluding subjects who were recruited digitally).
(d) Completed on participant devices for the schizophrenia and control groups only.
Schedule of assessments for remote visits.
Assessments | Visit 1 |
Informed consent | ✓ |
Semistructured interview | ✓ |
Demographic assessment | ✓ |
Sociodemographic assessment | ✓ |
Mini International Neuropsychiatric Interview | ✓ |
Positive and Negative Syndrome Scale (a) | ✓ |
McGill Quality of Life Questionnaire-Revised Part A | ✓ |
Clinician-Rated Dimensions of Psychosis Symptom Severity in Patients with Schizophrenia (a) | ✓ |
Clinical Global Impression-Severity | ✓ |
Patient Health Questionnaire-9 | ✓ |
Generalized Anxiety Disorder-7 | ✓ |
Facial expressivity and eye gaze | ✓ |
Voice and speech data collection | ✓ |
(a) Only administered for schizophrenia group.
The initial assessment will include demographic and clinical information that will be collected via self-reporting. Clinical information will include information about self-reported psychiatric and medical comorbidities and current medications. The evaluations will include a clinical record review; a battery of psychometric tests, including a semistructured, group-developed interview that will include the Thematic Apperception Test (TAT) [
Depression subjects will only be evaluated once. The evaluations will include a clinical record review; a battery of psychometric tests, including a semistructured, group-developed interview; and demographic and sociodemographic questionnaires, including the MINI, MQOL Part A, CGI-S, PHQ-9, GAD-7, and CGT (in-person only). The evaluations will also include audiovisual recordings, and in-person evaluations will include pulse oximetry and electroencephalographic recordings.
For control subjects, monitoring will occur in one of two ways: the subjects will be evaluated in person at the initial encounter and 3 months after the initial encounter, or they will be evaluated remotely once via Zoom. Evaluation will include clinical record review; a battery of psychometric tests, including a semistructured, group-developed interview; and demographic and sociodemographic questionnaires, including the MINI, MQOL Part A, CGI-S, PHQ-9, GAD-7, and CGT (in-person only). Audiovisual, pulse oximetry (in-person only), and electroencephalographic (in-person only) recordings will be made for the entirety of the interview, and actigraphy and heart rate will be recorded for the 3-month period between interviews for applicable participants. All evaluations will be conducted in person during times when visits can be completed safely with social distancing measures and appropriate PPE use; otherwise, they will be conducted remotely via Zoom.
The data will undergo a descriptive statistical analysis, followed by an assessment of how well classifiers and regressors predict diagnoses and psychometric scores, respectively, from features extracted from the collected data. Sample descriptions, including demographics, clinical diagnosis, and psychometric scores, will be analyzed. Cross-group variations in psychometric scores will be reported and analyzed via inferential statistical methods (2-way
Recordings of the interviews will be captured using Zoom at various resolutions, depending on the interviewee’s camera and network conditions. The participants will be asked to sit close enough to the camera that the interviewer can clearly see their face, with the face filling at least 10% of the viewable screen. Three simultaneous audio files will be generated, corresponding to the bidirectional conversation, the interviewee only, and the interviewer only. This separation will enable the research team to isolate each speaker and also to evaluate how the system might work in a single-microphone environment.
Video analysis of the full face involves both the dynamics of facial action units [
The audio will be analyzed in several ways. First, we will compare the outputs of two HIPAA-compliant commercial services for transcription of the audio recordings: the Otter.ai (Otter.ai Inc) service through Zoom’s transcription service and Amazon Transcribe (Amazon Inc). At the time of writing, no information on the performance of Otter.ai is available publicly. Amazon Transcribe has been shown to have word error rates of 10% to 20%, with small biases for gender and medical condition [
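For readers unfamiliar with the word error rate mentioned above, the following is a minimal sketch of how it is usually computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a service's output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are illustrative assumptions, not the procedure the study team will necessarily follow.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Comparing two services under this sketch would mean scoring each service's transcript against the same human-verified reference; punctuation handling and text normalization would need to be agreed on first, since they change the result.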
EEG recordings will be made using noninvasive scalp leads attached during the CGT [
As described above and shown in
In addition to the feature extraction and analyses within each modality, we aim to investigate the interaction between features from different modalities and fuse them together to build a multimodal machine learning model to estimate the presence and the severity of depression and schizophrenia. For interaction, we will check the Pearson correlations between features and between the predictions made with features from each single modality. Additionally, we will use dynamic time warping and general dynamic time warping [
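The two interaction measures named above can be sketched as follows. This is a minimal, dependency-free illustration of the Pearson correlation between two feature vectors and of classic dynamic time warping between two one-dimensional time series; it does not cover the generalized variant, and the function names `pearson_r` and `dtw_distance` and the absolute-difference local cost are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Unlike the Pearson correlation, dynamic time warping tolerates sequences of different lengths and local timing shifts, which is why it suits signals from different modalities that are sampled at different rates or are slightly out of step.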
Data collection began in July 2020 and is expected to continue through December 2022. We present some preliminary data here, highlighting a comparison of the fluctuation of emotions using computer vision.
Measures of facial expression when comparing a patient with schizophrenia (left) to a control subject (right).
This study aims to advance current progress in the use of state-of-the-art technology for assisting clinical psychiatric assessments by using a novel multimodal sensing system.
The in-person recruitment site, Grady Health System, allows access to a racially and ethnically diverse group of potential participants. Furthermore, the study methods will allow for data collection to continue irrespective of local case fluctuations in COVID-19 infection rates. The study team will be able to recruit and evaluate participants in person or pivot to virtual recruitment and data collection using a national recruitment database.
This study has a number of limitations. Larger data sets will be needed and the model will need to be prospectively validated. Additionally, remote participation is only available to those who have access to the internet, a video camera, and a microphone. The quality of these recordings will vary with the devices accessible to the participants and the areas where they are located, which may affect the data collected and limit recruitment efforts. The protocol has provided an option for subjects to participate in person, which will allow the study team to better standardize the technology used. However, there may be differences between subjects who participate in their homes and those who travel to the research site to complete the interviews. The research site will provide a private room to complete the Zoom interview and will mimic, as much as possible, the environment of those participating remotely from their homes. Furthermore, the interviewers may be at risk of bias in their ratings, and their physical presence or absence may affect patient responses. All assessments will be recorded, which will allow for the verification of all ratings by the interviewers and the study psychiatrist. The interviewers will also follow a script to maintain as much between-subject consistency in the interviews as possible.
If our findings suggest that these technologies are capable of resolving diagnoses and revealing symptoms at the same level as current psychometric testing and clinician judgment, we will be among the first in the world to have developed a clinical decision support system that can be used by expert and nonexpert clinicians for objectively diagnosing and tracking schizophrenia and depression over time. Such a tool would improve accessibility to care; aid clinicians in objectively evaluating diagnoses, severity of symptoms, and treatment efficacy; reduce treatment-related morbidity; and potentially empower patients to gain a deeper insight into their day-to-day symptoms and stressors to guide self-management.
Supplemental Materials.
CGI-I: Clinical Global Impression-Improvement
CGI-S: Clinical Global Impression-Severity
CGT: Cambridge Gambling Task
CRDPSS: Clinician-Rated Dimensions of Psychosis Symptom Severity in Patients with Schizophrenia
DSM-5: Diagnostic and Statistical Manual of Mental Disorders
EEG: electroencephalogram
GAD-7: Generalized Anxiety Disorder-7
GSQ: General Symptom Questionnaire
HIPAA: Health Insurance Portability and Accountability Act of 1996
MINI: Mini International Neuropsychiatric Interview
MQOL Part A: McGill Quality of Life Questionnaire-Revised Part A
MUQ: Medication Utilization Questionnaire
PANSS: Positive and Negative Syndrome Scale
PHQ-9: Patient Health Questionnaire-9
PPE: personal protective equipment
TAT: Thematic Apperception Test
Research reported in this publication was supported in part by Imagine, Innovate and Impact Funds from the Emory School of Medicine and through a Georgia Clinical & Translational Science Alliance National Institutes of Health award (UL1-TR002378).
The data sets generated during or analyzed during the current study will not be publicly available because they contain protected health information, but deidentified subsets of the data will be made available from the corresponding author on reasonable request, if the resources to perform and validate the deidentification process are available.
ROC received institutional research funding from Alkermes, Roche, and Otsuka and is a consultant to Saladax Biomedical and the American Psychiatric Association. The remaining authors declare no conflicts of interest.