This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on http://www.researchprotocols.org, as well as this copyright and license information must be included.
Conversational agents (also known as chatbots) have evolved in recent decades to become multimodal, multifunctional platforms with potential to automate a diverse range of health-related activities supporting the general public, patients, and physicians. Multiple studies have reported the development of these agents, and recent systematic reviews have described the scope of use of conversational agents in health care. However, research on the effectiveness of these systems is scarce; thus, their viability and applicability remain unclear.
The objective of this systematic review is to assess the effectiveness of conversational agents in health care and to identify limitations, adverse events, and areas for future investigation of these agents.
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols will be used to structure this protocol. The focus of the systematic review is guided by a population, intervention, comparator, and outcome framework. A systematic search of the PubMed (Medline), EMBASE, CINAHL, and Web of Science databases will be conducted. Two authors will independently screen the titles and abstracts of the identified references and select studies according to the eligibility criteria. Any discrepancies will then be discussed and resolved. Two reviewers will independently extract and validate data from the included studies into a standardized form and conduct quality appraisal.
As of January 2020, we have begun a preliminary literature search and piloting of the study selection process.
This systematic review aims to clarify the effectiveness, limitations, and future applications of conversational agents in health care. Our findings may be useful to inform the future development of conversational agents and promote the personalization of patient care.
PRR1-10.2196/16934
Digital technologies are driving transformation in the health sector and show promise in contributing to the resolution of major challenges facing health care systems worldwide, including the provision of personalized medicine, prevention of chronic conditions, care of an increasingly elderly population, and provision of health care to hard-to-reach populations. Intelligent digital platforms with a conversational user interface (ie, conversational agents) constitute a representative technology that has been investigated in these contexts [
The first technology of this kind emerged in 1966, constituting a text-based platform that mimicked a psychotherapist, “ELIZA”, using prerecorded answers selected based upon user input [
Despite a wealth of literature on conversational agents and their application to health care, the majority of reviews on the topic focus on a specific therapy area or function, whereas few reviews have comprehensively examined the overall scope and progress in the field [
There is a clear need to understand how effectively current conversational agents achieve their intended outcomes and facilitate the user experience. This information can then be used to determine the direction that these technologies are most likely to follow in health care and to identify the functions or populations that will derive the most benefit from them. Furthermore, conversational agents have the potential to alleviate current health care resource burdens by automating functions that previously required face-to-face interaction; thus, it is important to establish whether this outcome has actually been observed.
Thus, the aim of this systematic review is to evaluate the effectiveness and implications of conversational agents in health care. This review will focus on three main questions. First, are the intended health-related outcomes of current conversational agents being fulfilled, and does the effectiveness vary depending on the population or function of the agent? Second, what are the capabilities of health-focused conversational agents, and how might the availability of these agents impact the use of health care resources? Finally, what are the current limitations and gaps in the utility of conversational agents in the health care field that could inform future research?
We will use the Population, Intervention, Comparator, Outcomes (PICO) template and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) guidelines [
The following PICO framework is based on our three main research questions stated above.
Population: The population will include the general population, patients, students, and health care professionals of any age who have interacted with a conversational agent for any health-related purpose.
Intervention: Interaction with a conversational agent that utilizes natural language processing via any interactive device.
Comparator: No comparator is required for the studies to be included in this systematic review.
Outcomes: The main health outcomes assessed will be those related to improvements in clinical, behavioral, and psychosocial parameters, along with health literacy, shared decision making, practical improvement in health care provision, or user-based evaluation outcomes, including acceptability, usability, engagement, and satisfaction.
We will search the following databases: PubMed (Medline), Embase, CINAHL, ACM Digital, and Web of Science. Key terms relating to conversational agents were extracted from an initial review of the literature, and specific search terms and strings were chosen in consultation with a medical librarian. Search terms will include MeSH terms and keywords related to conversational agents, natural language processing, health care, and evaluation. A draft of the search terms that will be used in this review is grouped into four themes in the table below.
Search terms.
Category | MeSH^a | Keywords (title, abstract) |
Conversational agent | Speech recognition software | “Conversational agent*” OR “embodied conversational agent” OR chatbot* OR avatar OR dialog* system OR “virtual assistan*” OR “virtual nurs*” OR virtual patient OR virtual coach* OR intelligent assistan* OR “relation* agent” OR “assistance technol*” OR “voice-based interfac*” OR “virtual coach” OR speech recognition software OR voice recognition software |
Health | Healthcare facilities OR health services OR health communication OR health services accessibility OR delivery of healthcare OR health behavior OR simulation training OR health education OR health literacy OR patient acceptance of healthcare OR health knowledge, attitudes, practice OR asthma OR sex education OR exp aged OR exp counseling OR smoking cessation OR exp diet OR exp education, medical OR exp substance-related disorder OR social skills OR autism spectrum disorder OR patient education as topic OR exercise OR diabetes mellitus OR cardiovascular disease OR pulmonary disease, chronic obstructive | Health OR healthcare OR “health behavio?r” OR hospital OR exercis* OR diet OR healthcare delivery OR healthcare access OR simulation training OR education OR elderly care OR sex* education OR health literacy OR counsel?ing OR well-being OR smoking cessation OR cognitive dysfunction OR mental health OR social skills OR autism spectrum disorder OR diabetes OR heart health OR chronic obstructive pulmonary disease OR COPD OR sun protection OR physical activity |
Evaluation | Outcome assessment | Feasibil* OR usabil* OR evaluat* OR outcome* OR acceptability OR acceptance OR treatment adherence OR effectiv* OR adoption OR assess* OR user experience* OR efficacy OR utility OR utili?ation OR patient* acceptance OR patient* acceptability OR user* acceptance OR user* acceptability OR user* perce* OR patient* perce* OR user perspective* OR patient* perspective* OR user* view* OR patient* view* OR cost* |
^a MeSH: Medical Subject Headings.
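As an illustration of how the term groups in the table could be assembled into a single Boolean query, the following Python sketch joins the synonyms within each theme with OR and then combines the themes with AND. Combining themes with AND is an assumption based on common search practice and is not stated in this protocol. The term lists are abbreviated, database-specific field tags and MeSH syntax are omitted, and the names THEMES, or_group, and build_query are hypothetical.

```python
# Abbreviated, illustrative term groups; the full lists are in the search-terms table.
THEMES = {
    "conversational_agent": ['"conversational agent*"', "chatbot*", '"virtual assistan*"'],
    "health": ["health", "healthcare", '"health behavio?r"'],
    "evaluation": ["feasibil*", "usabil*", "effectiv*"],
}


def or_group(terms):
    """Join the synonyms for one theme with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(themes):
    """AND the per-theme OR groups so that a record must match every theme.

    Assumption: themes are combined with AND; the protocol does not specify this.
    """
    return " AND ".join(or_group(terms) for terms in themes.values())


query = build_query(THEMES)
print(query)
```

Keeping each theme as a separate list makes it straightforward to add or remove synonyms after consultation with a librarian without rewriting the whole string.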
The main criteria for inclusion will be interventional studies, including randomized controlled trials and non-randomized studies (eg, non-randomized controlled trials, before-and-after studies, and interrupted time-series studies), and observational studies, including cross-sectional surveys, cohort studies, and qualitative studies. Only studies published in English will be included.
No restriction will be placed on the year of publication, so that the review can provide a comprehensive overview of the evolution of conversational agents in health care and of the obstacles and successes these agents have met, and thereby inform future research. Studies that evaluated at least one conversational agent will be included. Studies of any population group, in any geographical location, and of agents with any function intended to influence physical or mental health or to provide health-related education or training will be included to enable an assessment of the breadth of applications of conversational agents. Studies of conversational agents that acquire information via any input modality will be included; however, the agent must interact with a human user and adapt its response according to user input.
For an initial search, all study designs will be included; however, the studies included in the final review may be refined based on the initial results. An evaluation of the number of studies that are retrieved from an initial search may result in the exclusion of quasi-experimental trials or other study types.
We will exclude studies that are not published in English and studies of conversational agent interventions that have no health-related function. Studies of conversational agents that utilize the Wizard of Oz technique, whereby a human operator is involved in response generation, or those not utilizing natural language processing will be excluded, as these do not constitute autonomous conversational agents. Conversational agents solely producing proactive communication will also be excluded (eg, reminder texts or electronic messages that cannot be responded to). Studies that report no evaluation of the conversational agent, such as papers discussing solely the design, development, or intention of the agent, will also be excluded.
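The exclusion criteria above can be read as a checklist applied to each candidate study. The Python sketch below is one possible encoding, offered only as an illustration: the Candidate fields and the exclusion_reasons function are hypothetical names, and in practice each flag would be a reviewer's judgment rather than a value read automatically from a record.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Reviewer judgments for one study (hypothetical field names)."""
    language: str
    health_related: bool
    uses_nlp: bool
    wizard_of_oz: bool           # a human operator is involved in response generation
    accepts_user_replies: bool   # False for one-way reminders that cannot be answered
    reports_evaluation: bool


def exclusion_reasons(study):
    """Return every exclusion criterion the study meets; an empty list means eligible."""
    reasons = []
    if study.language.lower() != "english":
        reasons.append("not published in English")
    if not study.health_related:
        reasons.append("no health-related function")
    if study.wizard_of_oz:
        reasons.append("Wizard of Oz technique")
    if not study.uses_nlp:
        reasons.append("no natural language processing")
    if not study.accepts_user_replies:
        reasons.append("proactive communication only")
    if not study.reports_evaluation:
        reasons.append("no evaluation reported")
    return reasons


example = Candidate("English", True, True, True, True, True)
print(exclusion_reasons(example))
```

Returning all reasons rather than stopping at the first one keeps enough detail to report why studies were excluded at the full-text stage.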
All articles identified from the database searches will be stored in the citation management software Mendeley (London, UK), which will be used to eliminate any duplicates. Two independent reviewers will screen the titles and abstracts of all studies. Studies that fail to meet the eligibility criteria will be excluded, with any disagreements being discussed until consensus is reached. The full text of the remaining articles will then be examined to determine final eligibility.
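In this protocol, duplicates are removed in Mendeley and disagreements are settled by discussion. Purely to make the logic of those two steps concrete, the following Python sketch removes duplicates by DOI (falling back to a normalized title) and lists the records on which two reviewers' decisions differ. It does not reproduce Mendeley's matching rules; the record layout and the function names are assumptions made for illustration.

```python
import re


def normalize_title(title):
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record per DOI, or per normalized title when no DOI is present."""
    seen, unique = set(), []
    for rec in records:
        key = (rec.get("doi") or "").lower() or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def disagreements(reviewer_a, reviewer_b):
    """Return ids of records whose include/exclude decisions differ between reviewers."""
    return sorted(rid for rid in reviewer_a if reviewer_a[rid] != reviewer_b.get(rid))


# Placeholder records, not real search results.
records = [
    {"id": "r1", "doi": "10.1000/example.1", "title": "A chatbot for diet coaching"},
    {"id": "r2", "doi": "10.1000/EXAMPLE.1", "title": "A Chatbot for Diet Coaching."},
    {"id": "r3", "doi": None, "title": "Virtual nurse pilot study"},
    {"id": "r4", "doi": None, "title": "Virtual nurse: pilot study"},
]
unique = deduplicate(records)
to_discuss = disagreements(
    {"r1": "include", "r3": "exclude"},
    {"r1": "include", "r3": "include"},
)
print([r["id"] for r in unique], to_discuss)
```

A record that carries a DOI in one database export and none in another would not be matched by this simple key, so a manual check of the remaining titles would still be needed.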
A PRISMA flow diagram will be used to record the details of the screening and selection process so that the study can be reproduced.
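The counts reported in a flow diagram of this kind follow from simple subtraction at each stage. The Python sketch below shows that bookkeeping under the assumption of a single title and abstract stage followed by a single full-text stage; the class name, field names, and all numbers are placeholders for illustration and are not results of this review.

```python
from dataclasses import dataclass


@dataclass
class PrismaFlow:
    """Stage counts for a screening flow diagram (hypothetical structure)."""
    identified: int               # records retrieved from all databases
    duplicates_removed: int
    excluded_title_abstract: int
    excluded_full_text: int

    @property
    def screened(self):
        """Records left for title and abstract screening after deduplication."""
        return self.identified - self.duplicates_removed

    @property
    def full_text_assessed(self):
        """Records that pass title and abstract screening."""
        return self.screened - self.excluded_title_abstract

    @property
    def included(self):
        """Studies remaining after full-text assessment."""
        return self.full_text_assessed - self.excluded_full_text


# Placeholder numbers only.
flow = PrismaFlow(identified=1000, duplicates_removed=200,
                  excluded_title_abstract=700, excluded_full_text=80)
print(flow.screened, flow.full_text_assessed, flow.included)
```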
To extract data from the included studies, we will use a standardized Excel form that includes general information (title, author[s], year, country of study), study characteristics (study design, aim, study population, duration of study), risk of bias or quality assessment (depending on study design), details of the conversational agent (developer, architecture, intended application, design features), outcomes (including but not limited to health outcomes, user perception, usability, feasibility, and resource implications), limitations (including functional and user-reported limitations or potential improvements), and adverse events (such as data breaches, misinformation, or improper use). We will pilot the data extraction form on a small number of studies to develop the final data extraction form. One reviewer will review the full text of all the papers included in the final selection and extract data that will be validated by a second reviewer. Disagreements will be resolved by discussion, and if consensus cannot be reached, a third reviewer will be consulted.
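The fields listed above map naturally onto one row per study. As a rough sketch of what such a form might look like in code, the following Python example defines a record with those fields and writes it to CSV, which Excel can open. The protocol specifies an Excel form, so the CSV route, the ExtractionRecord class, the column names, and the example values are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ExtractionRecord:
    """One row of a hypothetical extraction form, following the fields in the text."""
    # General information
    title: str
    authors: str
    year: int
    country: str
    # Study characteristics
    study_design: str
    aim: str
    study_population: str
    duration: str
    # Appraisal, agent details, and findings
    quality_assessment: str
    agent_details: str
    outcomes: str
    limitations: str
    adverse_events: str
    # Workflow: one reviewer extracts, a second validates
    extracted_by: str
    validated_by: str = ""


def to_csv(records):
    """Serialize records to CSV text with one column per form field."""
    columns = [f.name for f in fields(ExtractionRecord)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buffer.getvalue()


def awaiting_validation(records):
    """Titles of rows that a second reviewer has not yet checked."""
    return [rec.title for rec in records if not rec.validated_by]


# Placeholder row, not a real study.
row = ExtractionRecord(
    title="Example study", authors="Author A", year=2019, country="Example country",
    study_design="before-and-after", aim="placeholder", study_population="placeholder",
    duration="placeholder", quality_assessment="not yet assessed",
    agent_details="placeholder", outcomes="placeholder", limitations="placeholder",
    adverse_events="none reported", extracted_by="Reviewer 1",
)
csv_text = to_csv([row])
print(csv_text.splitlines()[0])
```

Tracking who extracted and who validated each row makes it easy to see which studies still need the second reviewer's check before analysis.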
After the final selection of the studies, two independent reviewers will assess the risk of bias of the included studies. If there is disagreement in judgment, the reviewers will discuss before consulting a third reviewer. The Cochrane Collaboration Risk of Bias tool will be used to assess any randomized controlled trials included in the review [
It is unlikely that a meta-analysis will be feasible owing to the anticipated variety of study aims, methods, and reported outcomes. Therefore, we will conduct a descriptive analysis to summarize the extracted data. If possible, we will provide a narrative overview of results by subgroups. The discussion will synthesize the data to describe the effectiveness of current conversational agents as well as comment on the scope of the field; draw conclusions about their feasibility, usability, and acceptability; identify limitations and adverse events; and establish directions for future research and development.
As of January 2020, we have begun a preliminary literature search and piloting of the study selection process.
We will perform a systematic review and do not anticipate any issues with the implementation of the proposed protocol. This systematic review of the literature reporting the evaluation of conversational agents will offer new insight into the viability and progress of conversational agents in health care, and uncover challenges and limitations that have been encountered in order to inform the future development and evolution of these agents. This research will also add to the growing body of evidence and understanding of how health care can be further personalized. Our findings may also identify potential obstacles to the widespread implementation of these technologies, and aid in the future integration of conversational agents in clinical practice.
MeSH: Medical Subject Headings
PICO: Population, Intervention, Comparator, Outcomes
PRISMA-P: Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols
ROBINS-I: Risk Of Bias in Non-randomized Studies of Interventions
We would like to thank the outreach librarian Liz Callow for her assistance in developing search terms and in reviewing the search strategy. CdC, MMI, CL, MV, and EM are supported by EIT Health (Grant 18654).
CdC and EM conceived the study topic and research questions, and designed the review protocol. CdC prepared the first draft of the protocol with revisions from MI, CL, MV, and EM. AA contributed to the development of the first draft of this protocol.
None declared.