A Bayesian Network Decision Support Tool for Low Back Pain Using a RAND Appropriateness Procedure: Proposal and Internal Pilot Study

doi:10.2196/21804

Proposal

¹Sport and Exercise Medicine, Queen Mary University of London, London, United Kingdom

²Electronics, Engineering and Computer Science, Queen Mary University of London, London, United Kingdom

³Barts Health NHS Trust, London, United Kingdom

⁴Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

⁵Department of Industrial Engineering, Hacettepe University, Ankara, Turkey

Corresponding Author:

Adele Hill, MSc

Sport and Exercise Medicine

Queen Mary University of London

Mile End Road

London, E1 4NS

United Kingdom

Phone: 44 7971495806

Email: a.hill@smd18.qmul.ac.uk

Background: Low back pain (LBP) is an increasingly burdensome condition for patients and health professionals alike, with consistent demonstration of increasing persistent pain and disability. Previous decision support tools for LBP management have focused on a subset of factors owing to time constraints and ease of use for the clinician. With the explosion of interest in machine learning tools and the commitment from Western governments to introduce this technology, there are opportunities to develop intelligent decision support tools. We will do this for LBP using a Bayesian network, which will entail constructing a clinical reasoning model elicited from experts.

Objective: This paper proposes a method for conducting a modified RAND appropriateness procedure to elicit the knowledge required to construct a Bayesian network from a group of domain experts in LBP, and reports the lessons learned from the internal pilot of the procedure.

Methods: We propose to recruit expert clinicians with a special interest in LBP from across a range of medical specialties, such as orthopedics, rheumatology, and sports medicine. The procedure will consist of four stages. Stage 1 is an online elicitation of variables to be considered by the model, followed by a face-to-face workshop. Stage 2 is an online elicitation of the structure of the model, followed by a face-to-face workshop. Stage 3 consists of an online phase to elicit probabilities to populate the Bayesian network. Stage 4 is a rudimentary validation of the Bayesian network.

Results: Ethical approval has been obtained from the Research Ethics Committee at Queen Mary University of London. An internal pilot of the procedure has been run with clinical colleagues from the research team. This showed that an alternating process of three remote activities and two in-person meetings was required to complete the elicitation without overburdening participants. Lessons learned have included the need for a bespoke online elicitation tool to run between face-to-face meetings and for careful operational definition of descriptive terms, even if widely clinically used. Further, tools are required to remotely deliver training about self-identification of various forms of cognitive bias and explain the underlying principles of a Bayesian network. The use of the internal pilot was recognized as being a methodological necessity.

Conclusions: We have proposed a method to construct Bayesian networks that are representative of expert clinical reasoning for a musculoskeletal condition in this case. We have tested the method with an internal pilot to refine the process prior to deployment, which indicates the process can be successful. The internal pilot has also revealed the software support requirements for the elicitation process to model clinical reasoning for a range of conditions.

International Registered Report Identifier (IRRID): DERR1-10.2196/21804

JMIR Res Protoc 2021;10(1):e21804

doi:10.2196/21804

Keywords

back pain; decision making; Bayesian methods; consensus

Back pain is a prime example of the general increase in long-term musculoskeletal conditions. It has been deemed a leading cause of years lived with disability worldwide, and health care costs for treating back pain are escalating [1]. Some low back pain (LBP) cases are associated with injuries that will self-resolve, but there are a considerable number of people who live with disabling LBP. It is difficult to predict who will have a favorable outcome and who will not. Meta-analyses have consistently shown that treatment outcomes for musculoskeletal conditions are partial [2]. Studies have shown that clinicians may not recognize complexity when assessing these patients [3], and efforts made to improve diagnosis via pattern recognition or analytical approaches [4] have limited success. An artificial intelligence (AI) decision support tool that overcomes one or more of these issues would greatly improve the referral of LBP patients to the correct treatment care pathway, and the approach would be generalizable to other musculoskeletal conditions.

The potential of AI systems to address some of these issues has been recognized by The Digital Framework for Allied Health Professionals in the United Kingdom [5], which highlights the priority to develop digitally mature systems, improve outcomes, and limit variation in service delivery. Similarly, in 2019, Her Majesty’s Government announced a £250 million investment in AI development in the UK National Health Service [6] for such purposes. However, many of these techniques are data driven and thus require large error-free data sets that must, crucially, also contain outcome results. Unfortunately, while there is an abundance of data within the musculoskeletal health care system, access is severely restricted and most of the data are unrefined, often being free text or hand written without the necessary outcomes. Furthermore, studies have found that clinicians are skeptical about the ability of AI to perform at the human level [7], and patient and public involvement forums have highlighted a mistrust of black box systems. Successful adoption of AI therefore requires combining clinical evidence, patient data, and expert opinion to mirror the clinical reasoning process and provide transparency regarding how predictions have been reached.

An LBP decision support tool to guide the treatment of patients has been developed with an expert-opinion Delphi-consensus approach in the Netherlands [8], with some success, although there was a suggestion that the results could be further improved with AI and machine learning techniques [9]. The previous authors particularly highlight the limitations of traditional multivariate regression models and suggest that a Bayesian network (BN) would take into account more clinical aspects of assessment. A BN is a probabilistic model that offers an ideal approach for modeling clinical reasoning as it is capable of combining expert opinion with available evidence to provide probabilistic outcomes for a given scenario [10]. These outcomes can be continuously updated to ensure the most recent clinical knowledge is being used [11] and provide explanations of the predictions arrived at [12].

In order to support clinicians in their decision making, we envisage a BN that can predict a patient’s response to a given treatment or course of action when they first present with LBP (ie, in a primary care setting). In order to do this, we must first characterize the types of LBP possible, and the associated risk factors, signs, and symptoms. We propose eliciting a BN that predicts the probability of a patient having a certain LBP-related presentation and, subsequently, how likely they are to improve in response to a course of action. To be clear, this focus on characterization rather than diagnosis was deliberately taken owing to the lack of accurate diagnostic labels for the majority of people with LBP [13], but the BN was designed to identify patients with serious diagnoses, such as nerve root pain and cauda equina syndrome, where these can be identified. For reasons mentioned above, the clinical knowledge base of our BN must be elicited from experts [14,15]. We also believe that it is important not to constrain the type of LBP considered but use expert opinion to guide our focus. In this paper, we outline a protocol for a BN elicitation process that aims to balance the construction of a complex expert-driven model for the treatment of LBP while minimizing the elicitation burden placed on participants. We also describe a pilot study of this process, which highlights the subsequent results that informed the main protocol, discuss the overall methodology, and consider future implementation of the AI tool.

Design

A BN is comprised of the following three components: (1) variables, quantities of interest, such as age, BMI, and the presence/absence of a condition; (2) structure, the dependence of variables on each other, for instance, the condition may be more prevalent in the elderly but the BMI makes no difference; (3) probabilities, quantification of the structure, for example, the probability of having a condition given the person is of a certain age.

We therefore divide the elicitation process into three distinct stages related to those components plus a final stage intended as a rudimentary validation of the output from the process (Figure 1).

The overall elicitation method will be a combined Delphi and RAND appropriateness procedure [10]. Participants will first answer questions individually online before contentious results are discussed together in a face-to-face workshop. We have chosen this strategy because related methods have been used with success in previous musculoskeletal studies [8,16-18], and it combines individual tasks and group discussions to explore biases [19]. We acknowledge that further rounds of discussion may help with consensus, but time constraints render this infeasible.

Ethical approval was sought from the Queen Mary University of London, and approval was granted by the Queen Mary Ethics of Research Committee (reference: QMREC2018/48027).

Participants and Recruitment

Clinicians will be recruited via clinical networks, word of mouth, and social media. There is no definitive number of participants recommended for this style of study. However, for face-to-face workshops, an optimum number of attendees has been suggested to lie between seven and 15 [19]. This number of participants is thought to represent a range of views and promote discussion, without becoming too onerous to manage and facilitate in a workshop setting. We will include extended scope/advanced practitioner physiotherapists (covering orthopedics, rheumatology, neurosurgery, and general musculoskeletal triage services), general practitioners with a special interest in musculoskeletal conditions, and sports medicine doctors. We will exclude clinicians who do not regularly manage patients with LBP or do not hold current professional registration. Participants will be provided with information about the research and be asked to provide informed written consent prior to taking part.

Data Management

Participants will not be identified in any reports or publications of the results. Any information held by the project team in relation to participants will be kept confidential and managed in accordance with the Data Protection Act, the General Data Protection Regulation (GDPR), the UK Policy Framework for Health and Social Care Research, and the conditions of the Research Ethics Committee.

Elicitation Process

Stage 1: Variable Elicitation

Ad hoc building of the BN causal structure is prone to biases and modeling error, which can lead to overly complex models that are not repeatable in later studies [20]. To mitigate this potential issue and to form concepts familiar to the clinical participants, we have constructed a conceptual model where potentially relevant factors in the assessment and treatment of LBP will be grouped into either “risk factors,” “signs and symptoms,” or “judgement factors.” A list of potential treatments will also be elicited. The categories have been connected in the fashion as shown in Figure 2 to form a “skeleton causal diagram,” which approximates the clinical reasoning process. This is necessary for the model functionality and makes the elicitation process recognizable for participants.

Conventional appropriateness scoring usually relies on a Likert scale [19]; however, it is anticipated that participants will have considerable time constraints and the number of potential variables will render the task unfeasible. We will therefore use a placement and ranking procedure. At the start, participants will see many example variables chosen by the research team (Figure 3) (reasons behind this decision are presented in the Results section). They can also create new variables. They will then be asked to place them into one of the three categories of their choosing and order the three associated lists so the variables they deem most important for the assessment of patients with LBP appear at the top. Unfortunately, this setup does create the risk of anchoring biases, but we hope this will be mitigated when the variables are voted on in the workshop.

Figure 2. Skeleton causal diagram involving the categories of variables to be considered.

Figure 3. Mockup of the online interface for stage 1 of the process.

To aid their understanding, participants will be given explanatory information about what constitutes risk factors, signs and symptoms, and judgement factors (Table 1), as well as many example variables from the prepopulated lists. Additionally, the meaning of each variable will be clarified by giving a question that could be answered to find out the value of the variable (such as “What is the patient’s anxiety level?”) and indication of the mode and units of measurement of the variable to ensure consistent responses.

Table 1. Categories of variables in the Bayesian network.

Category name^a	Definition
Risk factor	These are pre-existing factors that might raise (or lower) the chances of the patient having a certain judgement factor or characterization of low back pain. They may also have an effect on the efficacy of the treatment but, crucially, are factors (like age and demographics) that cannot be changed by the treatment.
Signs and symptoms	These are factors that arise as a consequence of the patient having certain conditions or judgement factors. In contrast to risk factors, signs and symptoms are factors that may change or respond to treatment.
Judgement factors	These are clinical reasoning factors or the characterization of the patient’s low back pain. They are the factors that are likely to be unmeasurable with a Patient Reported Outcome Measure or test, but which help a clinician reason about prognosis or likelihood of recovery.

^aExplanatory information about each type of variable will be made available to participants in order to inform the categorization of factors from the prepopulated lists and those created in the elicitation.

Each variable will be included in a particular category if more than 80% of participants allocate it. From those participants, we will take the median normalized rank (0 if placed at the bottom of the list and 1 at the top) to give an indication of preference for inclusion, which will be further adjusted by an interpercentile range (IPR) measure to measure consensus (more consensus will result in less adjustment of the median value). The top-placed variables will be selected for the face-to-face stage. We anticipate around 50 variables can be taken forward without overburdening participants, but we reserve some clinical judgement to decide on the exact number. Suggested variables from the online elicitation will be consolidated by the working team, and those judged by the clinical team to be duplicates will be removed.

After the online session, a face-to-face workshop will be held, in which participants will use the same tool. Misconceptions or differences in understanding will be clarified by the research team. A member of the research team will facilitate discussion of the variables that do not have consensus via use of open-ended questions. The face-to-face workshops will be video recorded to capture the discussions and enable postworkshop checking and clarification. The participants will then be given the opportunity to categorize and rank the variables for a second time, on the basis of the discussion. We anticipate that the mixture of not allowing new variables, the refined interface, and having numerous variables fixed in categories will reduce the elicitation burden while still maintaining an opportunity for group discussions, according to the conventional RAND process.

Stage 2: Structure Elicitation

The second stage seeks to connect individual variables linking risk factors to judgement factors, and judgement factors to signs and symptoms. This will be achieved by grids (Figure 4). In the online elicitation, participants will be presented with a blank grid and asked to give each relationship a strength score between 0 and 3 in the appropriate cell. The definitions of the scores are as follows: 0, no relationship; 1, X sometimes has a small effect on Y; 2, X sometimes has a large effect on Y or X always has a small effect on Y; and 3, X always has a large effect on Y, and they reflect the nuances of the BN probabilities.

We will again use the median and an 80% IPR as descriptive statistics. To make the workshop manageable (but still allow for discussion), we will fix the median score of those variables with an IPR of 1 or less (ie, the cells in the grids cannot be edited), and all others will be edited again by the participants following discussions. A subsequent overall score will be taken from the workshop for each relationship in a similar manner to the overall score in stage 1. We will then only keep the strongest relationships to form the structure of the BN, and any variables not connected to others will be discarded.

Figure 4. An example grid for stage 2. Relationship strengths are placed in the cells. The definitions of the scores are as follows: 0, no relationship; 1, X sometimes has a small effect on Y; 2, X sometimes has a large effect on Y or X always has a small effect on Y; and 3, X always has a large effect on Y.

Stage 3: Probability Elicitation

The third stage seeks to endow the BN structure with quantitative probabilities for making predictions. Crucially, although cognitive biases appear in all stages, the probability elicitation is particularly susceptible because of the quantitative nature and known issues with probability estimation [21,22]. Cognitive bias training will therefore be given prior to starting the third stage. This will include questions from unrelated fields that are susceptible to biases likely to arise within the elicitation, such as recall bias and confusion of inverse probabilities [21,22]. A brief explanation of how the bias may lead to systematic deviations in the estimates will be presented for each question (Figure 5). Moreover, we will word questions so that participants are aware of the target population and are thinking about frequency estimates, rather than individual patients (Figure 6).

Figure 5. Examples of questions to be given in the cognitive bias training.

The variables in the BN can be binary (eg, true and false), labeled, that is, nominal with states without any order (eg, muscle, tissue, and tendon), or ranked, that is, ordinal with states with increasing or decreasing order (eg, high, medium, and low). Additionally, some variables (eg, age) will be independent of others (known as prior variables), whereas others will be dependent (eg, the prevalence of a condition may be influenced by the age of a patient).

For binary and labeled variables, sliders will be used to allow participants to assign probabilities individually to each state. For example (Figure 6), the question “What proportion of people work in the described type of job?” has a series of nominal answers, which can be represented with individual percentages adding up to 100. In contrast, ranked variables have a smooth distribution, so participants will be asked to estimate the most likely value and the extent of the variation around that value [23].

Figure 6. Example of direct probability elicitation for the question “What proportion of people work in the described type of job?”.

For independent (prior) variables, a simple question similar to that in Figure 6 will suffice, whereas for conditional variables, the states of related variables will be needed, for example, “What is the probability of a patient having a fracture given that they have an age of 80+ and experienced trauma?”

For binary/labeled variables, the answers from the probability elicitation will be combined using logistic regression and generalized noisy OR [23] techniques to generate the probability tables for each variables. For ranked variables, we will use the answers to estimate a truncated normal distribution [24]. This reduces the number of parameters compared with other methods [25,26] and so has the advantage of avoiding overfitting, but certain nuances of the relationships are missed. Outlying answers will be identified using an IPR-based metric and removed so as not to bias the regression approach.

Stage 4: Validation

A subset of clinical experts will be asked to give their judgement regarding the decision-making process for a number of given scenarios, which will then be compared against the model output, otherwise known as “face validity” [27]. Further validation will likely be required to establish the efficacy of the BN in practice; however, this is outside the scope of the study.

Internal Pilot

A much reduced version of the above protocol has already been conducted in an internal pilot study with three clinical colleagues (AH, CKJ, and DM) from Queen Mary University of London, who regularly treat patients with LBP. This was to test whether a suitable model could feasibly be constructed within a framework where the elicitation burden was not too high.

They completed all three stages of the elicitation process. During and after each stage of the process, a discussion took place regarding overall impressions of the stage in order to learn lessons and refine the process. These discussions captured clinical perspectives on the design of the questions and elicitation, feedback on what may or may not be acceptable to clinical participants, and appropriate redesign to ensure that the requirements for the modeling process were still being met. The differences in methods are discussed below, with lessons learned reserved for the subsequent Results section.