This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.
Scientific hypothesis generation is a critical step in scientific research that determines the direction and impact of any investigation. Despite its vital role, we have limited knowledge of the process itself, thus hindering our ability to address some critical questions.
This study aims to answer the following questions: To what extent can secondary data analytics tools facilitate the generation of scientific hypotheses during clinical research? Are the processes similar in developing clinical diagnoses during clinical practice and developing scientific hypotheses for clinical research projects? Furthermore, this study explores the process of scientific hypothesis generation in the context of clinical research. It was designed to compare the role of VIADS, a visual interactive analysis tool for filtering and summarizing large data sets coded with hierarchical terminologies, and the experience levels of study participants during the scientific hypothesis generation process.
This manuscript introduces a study design. Experienced and inexperienced clinical researchers have been recruited since July 2021 to take part in this 2×2 factorial study, in which all participants use the same data sets during scientific hypothesis–generation sessions and follow predetermined scripts. The clinical researchers are separated into experienced or inexperienced groups based on predetermined criteria and are then randomly assigned into groups that use and do not use VIADS via block randomization. The study sessions, screen activities, and audio recordings of participants are captured. Participants use the think-aloud protocol during the study sessions. After each study session, every participant is given a follow-up survey, with participants using VIADS completing an additional modified System Usability Scale survey. A panel of clinical research experts will assess the scientific hypotheses generated by participants based on predeveloped metrics. All data will be anonymized, transcribed, aggregated, and analyzed.
Data collection for this study began in July 2021. Recruitment uses a brief online survey. The preliminary results showed that study participants can generate a few to over a dozen scientific hypotheses during a 2-hour study session, regardless of whether they used VIADS or other analytics tools. A metric to more accurately, comprehensively, and consistently assess scientific hypotheses within a clinical research context has been developed.
The scientific hypothesis–generation process is an advanced cognitive activity and a complex process. Our results so far show that clinical researchers can quickly generate initial scientific hypotheses based on data sets and prior experience. However, refining these scientific hypotheses is a much more time-consuming activity. To uncover the fundamental mechanisms underlying the generation of scientific hypotheses, we need breakthroughs that can capture thinking processes more precisely.
DERR1-10.2196/39414
A hypothesis is an educated guess or statement about the relationship between 2 or more variables [
Scientific hypothesis generation and scientific hypothesis testing are distinct processes [
Electronic health record systems and related technologies have been widely adopted in both office-based physician practices (86% in 2019) [
Arocha et al and Patel et al [
This makes it essential to understand the scientific hypothesis–generation process in clinical research and to compare this process with generating clinical diagnoses, including the role of experience during the scientific hypothesis–generation process.
This study explores the scientific hypothesis–generation process in clinical research. It investigates whether a secondary data analytics tool and clinician experience influence the scientific hypothesis–generation process. We propose to use direct observations, think-aloud methods with video capture, follow-up inquiries and interview questions, and surveys to capture the participants’ perceptions of the scientific hypothesis–generation process and associated factors. The qualitative data generated will be transcribed, analyzed, and quantified.
We aim to test the following study hypotheses:
Experienced and inexperienced clinical researchers will differ in generating scientific hypotheses guided by secondary data analysis.
Clinical researchers will generate different scientific hypotheses with and without using a secondary data analytics tool.
Researchers’ levels of experience and use of secondary data analytics tools will interact in their scientific hypothesis–generation process.
In this paper, we use the term “research hypothesis” for a statement generated by our study participants, “study hypothesis” for a hypothesis tested by our own research study, and “scientific hypothesis” as the general term for a hypothesis in research contexts.
This manuscript introduces a study that uses a mixed methods design. The study combines direct observation with utility and usability assessments, and it also uses surveys, interviews, semipredefined tasks, screen-activity capture, and a modified Delphi method.
The study has been approved by the Institutional Review Board (IRB) of Clemson University, South Carolina (IRB2020-056).
Experienced and inexperienced clinical researchers, and a panel of clinical research experts, have been recruited for this study. The primary criterion used to distinguish subjects in the 3 groups is the level of their experience in clinical research.
Summary of the criteria for study participants and clinical research expert panel members.
Variable | Inexperienced clinical researchersa | Experienced clinical researchersa | A panel of clinical research experts |
Participation in research hypothesis generation and study design | ≤2 years | Leading role ≥5 and <10 years | Leading role ≥10 years |
Participation in data analysis of study results | ≤2 years | Leading role ≥5 and <10 years | Leading role ≥10 years |
Publications in clinical research | Not required | ≥5 as a leading author, including first, correspondence, or senior author for original studies | ≥10 as a leading author, including at least one article in a high-impact journal in the past 5 years |
Review experience in clinical research for conferences, journals, or grants | Not required | Not required | ≥10 years |
Internet connection | Required | Required | Required |
Microphone | Required | Required | Not required |
Software package for data analysis (eg, Microsoft Excel and R) | Required | Required | Not required |
Tools to facilitate research hypothesis generation, if available | Required | Required | Not required |
aIf a participant has clinical research experience between 2 and 5 years, the decisive factor for the experienced group will be 5 publications for original studies as the leading author.
To recruit participants, invitational emails and flyers have been sent to collaborators, along with other communication means, including mailing lists, such as those of working groups of the American Medical Informatics Association, South Carolina Clinical & Translational Research Institute newsletters, and PRISMA health research newsletters, and Slack channels, such as National COVID Cohort Collaborative communities. All study sessions are conducted remotely via video conference software (Webex, Cisco) and recorded via a commercial software (BB FlashBack, Blueberry Software).
In this study, we have used VIADS as an example of a secondary data analytics tool. VIADS is a visual interactive analytical tool for filtering and summarizing large health data sets coded with hierarchical terminologies [
Meanwhile, we recognize that VIADS can only accept coded clinical data and their associated usage frequencies. Currently, VIADS accepts data sets coded using ICD9-CM, ICD10-CM, and MeSH, which may limit the types of scientific hypotheses that can be generated with VIADS.
Selected screenshots of VIADS. (A) Homepage; (B) validation module; (C) dashboard; (D) a graph coded using International Classification of Diseases, 9th Revision-Clinical Modification (ICD9-CM) codes and generated by VIADS.
We have prepared and used the same data sets for this study across different groups in order to reduce the potential biases introduced by different data sets. However, all data sets (ie, input files) used in VIADS are generally prepared by users within specific institutions.
Acceptable formats and data set sizes in VIADS.
Data seta and graph node ID (code) | Usage frequency |
ICD9-CMb | |
300.00 | 2223 |
278.00 | 5567 |
… …c | … |
ICD10-CMd | |
O10.01 | 5590 |
E11.9 | 50,000 |
… …c | … |
MeSHe | |
A0087342 | 16,460 |
A0021563 | 4459 |
… …c | … |
aAcceptable data set sizes for Web VIADS are as follows: patient counts ≥100 and event counts ≥1000.
bICD9-CM: International Classification of Diseases, 9th Revision-Clinical Modification.
cThere are many more codes in addition to the 2 examples provided.
dICD10-CM: International Classification of Diseases, 10th Revision-Clinical Modification.
eMeSH: Medical Subject Headings.
The usage frequencies of the data sets used in VIADS can be either of the following: patient counts (the number of patients associated with specific ICD codes in the selected database) or event counts (the number of events [ICD codes or MeSH terms] in the selected database).
An ancestor-descendant table, which contains 1 row for each node and each of its distinct descendants, makes class counts easier to compute and more accurate because the same node is never counted multiple times. These implementation details have been discussed in greater detail in prior publications [
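The idea behind the ancestor-descendant table can be sketched as follows. This is an illustrative toy example, not the actual VIADS implementation; the hierarchy and counts below are made up. Because the table stores each *distinct* descendant once per ancestor, a node reachable through several paths in a DAG-shaped terminology contributes to a class count only once.

```python
# Sketch: class counts via an ancestor-descendant (transitive closure) table.
# Hypothetical hierarchy and usage frequencies, for illustration only.
from collections import defaultdict

# parent -> children edges of a small code hierarchy (a DAG: D has 2 parents)
children = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

# usage frequency observed for each individual code
code_counts = {"A": 1, "B": 2, "C": 3, "D": 4}

def build_closure(children):
    """One row per (node, distinct descendant) pair; each node includes itself."""
    closure = defaultdict(set)

    def descend(root, node):
        closure[root].add(node)
        for child in children[node]:
            descend(root, child)

    for node in children:
        descend(node, node)
    return closure

closure = build_closure(children)

# Because descendants are stored as a set, D (reachable from A via both
# B and C) is counted exactly once in A's class count.
class_counts = {node: sum(code_counts[d] for d in desc)
                for node, desc in closure.items()}

print(class_counts["A"])  # 1 + 2 + 3 + 4 = 10, with D counted once
```

A naive path-based traversal would count D twice under A (via B and via C), yielding 14 instead of 10; the distinct-descendant table avoids exactly that error.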
A publicly accessible data source [
Metrics have been developed to assess research hypotheses generated during the study sessions. The development process goes through iterative stages (
Development process for metrics to evaluate research hypotheses in clinical research.
The performance of scientific hypothesis–generation tasks will be measured using metrics that include the following qualitative and quantitative measures: validity, significance, clinical relevance, feasibility, clarity, testability, ethicality, the total number of scientific hypotheses, and the average time used to generate 1 scientific hypothesis. In an online survey, the panel of clinical research experts will assess the generated research hypotheses based on the metrics we have developed.
A survey (
What are the needed but currently unavailable attributes that would help clinical researchers generate research hypotheses? The question is similar to a wish list of features facilitating research hypothesis generation.
Follow-up questions clarify potential confusion during the think-aloud processes or enable meaningful inquiries when unexpected or novel questions emerge during observations [
Can the list of items in
A survey will be developed based on the comparative results of the first 4 groups and administered at the end of Study 2. The identified differences from Study 1 will be the focus of the survey to determine whether these differences are helpful in Study 2.
Summary of the study procedures. Blue boxes indicate data collected in Study 1.
Study 1 tests all 3 study hypotheses. If the study hypotheses are supported in Study 1, we will conduct a follow-up study (Study 2) to examine whether and how the efficiency or quality of research hypothesis generation may be improved or refined.
We will use the think-aloud method to conduct the following tasks: (1) observe the research hypothesis–generation process; (2) transcribe data, and analyze and assess if VIADS [
Experienced and inexperienced clinical researchers conduct research hypothesis–generation tasks using the same secondary data sets. The tasks are captured to describe participants’ research hypothesis–generation processes when guided by analysis of the same data sets. VIADS is also used as an example of a secondary data analytics tool to assess participants’ research hypothesis–generation processes. Accordingly, 2 control groups do not use VIADS, while 2 intervention groups do use it.
A pilot study (Study 0) was conducted in July and August 2021 for each group to assess the feasibility of the task flow, data sets, screen capture, audio and video recordings, and study scripts. Study 1 comprises 4 groups.
Design of Study 1 for assessment of the hypothesis–generation process in clinical research.
Variable | Number of experienced clinical researchers | Number of inexperienced clinical researchers |
Not using VIADS | 8 (Group 1) | 8 (Group 2) |
Using VIADS | 8 (Group 3) | 8 (Group 4) |
We conduct 1 study session individually with each participant. The participants are given the same data sets from which to generate research hypotheses with or without VIADS during each study session. The same researcher observes the entire process and captures the process via think-aloud video recordings. Follow-up or inquiry questions from the observing researcher are used complementarily during each study session. The study sessions are conducted remotely via Webex. All screen activities are captured and recorded via BB FlashBack [
If the results of Study 1 indicate group differences as expected (in particular, differences between experienced and inexperienced clinical researchers and between groups using and not using VIADS), Study 2 will be conducted to examine whether the efficiency of the research hypothesis–generation process can be improved. Specifically, we will analyze the group differences, identify any that relate to VIADS, and incorporate the findings into VIADS. We will then test whether the revised, and presumably improved, VIADS increases the efficiency or quality of the research hypothesis–generation process. The group with the lowest performance in Study 1 will be invited to use the revised VIADS to conduct the research hypothesis–generation tasks again with the same data sets, after a wash-out period of at least 8 months. This group’s performance will be compared with its performance in Study 1.
If no significant difference can be detected between the groups using VIADS and not using VIADS in Study 1, we will use the usability and utility survey results to revise VIADS accordingly without conducting Study 2. If no significant difference can be detected between experienced and inexperienced clinical researchers, Study 2 will recruit both experienced and inexperienced clinical researchers as study participants. In this case, Study 2 will focus only on whether a revised version of VIADS impacts the research hypothesis–generation process and outcomes.
The qualitative data collected via the think-aloud method while participants conduct the given tasks will be transcribed, coded, and analyzed according to grounded theory [
The outcome variable is based on the participants’ performance in research hypothesis–generation tasks. Performance is measured by the quality (eg, significance and validity) and quantity of the research hypotheses generated and by the average time to generate 1 research hypothesis. At least 3 clinical research experts will assess each hypothesis using the scientific hypothesis assessment metrics developed for this study. The metrics include multiple items, each on a 5-point Likert scale. The details of the metrics are described in the Instrument Development section.
The data will be analyzed with a 2-tailed factorial analysis. We calculated the required sample size in G*Power 3.1.9.7 for a 2-way ANOVA. The sample size was 32 based on a confidence level of 95% (α=.05), effect size f=0.5, and power level of 0.8 (β=.20).
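The study's sample size was computed in G*Power; the same logic can be reproduced with a noncentral F distribution. The sketch below is an approximation, not the study's actual calculation, and the smallest N it finds may differ from the reported 32 by a participant or two depending on how the error degrees of freedom are handled. The function names are our own.

```python
# Sketch: power for one main effect / interaction (numerator df = 1) in a
# 2x2 factorial ANOVA, with alpha = .05, Cohen's f = 0.5, target power = 0.8.
# Approximates what G*Power computes; not the study's actual calculation.
from scipy.stats import f as f_dist, ncf

def anova_power(n_total, effect_f=0.5, alpha=0.05, df_num=1, n_cells=4):
    """Power of the F test for one effect in a 2x2 between-subjects design."""
    df_den = n_total - n_cells           # error degrees of freedom
    ncp = effect_f ** 2 * n_total        # noncentrality parameter
    crit = f_dist.ppf(1 - alpha, df_num, df_den)
    return 1 - ncf.cdf(crit, df_num, df_den, ncp)

# find the smallest total N that reaches 80% power
n = 8
while anova_power(n) < 0.8:
    n += 1
print(n, round(anova_power(n), 3))
```

With a large effect size (f=0.5), the required total N lands in the low 30s, consistent with the 32 participants planned across the 4 groups.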
In Study 1, we will use descriptive statistics to report how many hypotheses were generated, average time spent per hypothesis, and how many hypotheses were evaluated for each participant. A 2-way ANOVA will be used to examine the main effects of VIADS and experience, as well as the interaction effect of the 2 factors. In the ANOVA, the outcome variable is the expert evaluation score. The follow-up survey data will be analyzed using correlations to examine the relationship between participants’ self-rated creativity, the average time per hypothesis generation, the number of hypotheses generated, and the expert evaluation score. The SUS will be used to assess the usability of VIADS. Qualitative data from the SUS surveys will be used to guide revisions of VIADS after Study 1. Descriptive statistics will also be used to report the answers to other follow-up questions.
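The planned 2-way ANOVA can be sketched in a few lines. The data below are simulated and the column names (`viads`, `experience`, `score`) are hypothetical placeholders for the study's actual variables; the point is the model structure: two factors, their interaction, and the expert evaluation score as the outcome.

```python
# Sketch: 2-way ANOVA (VIADS use x experience level) on synthetic data.
# Column names and simulated scores are hypothetical, for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "viads":      np.repeat(["yes", "no"], 16),                 # 2 x 16 = 32
    "experience": np.tile(np.repeat(["exp", "inexp"], 8), 2),   # 8 per cell
})
# simulated expert evaluation score (Likert-style average)
df["score"] = (3
               + 0.4 * (df["viads"] == "yes")
               + 0.3 * (df["experience"] == "exp")
               + rng.normal(0, 0.5, len(df)))

# main effects of each factor plus their interaction
model = smf.ols("score ~ C(viads) * C(experience)", data=df).fit()
print(anova_lm(model, typ=2))
```

The ANOVA table reports F statistics and p values for the VIADS main effect, the experience main effect, and the interaction, with 32 − 4 = 28 residual degrees of freedom, matching the design's 4 cells of 8 participants.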
A
The study is a National Institutes of Health–funded R15 (Research Enhancement Award) project supported by the National Library of Medicine. We began collecting data in July 2021 via pilot studies, and here provide some preliminary results and summarize our early observations. The full results and analysis of the study will be shared in future publications when we complete the study.
Based on a literature review, metrics were developed to assess research hypotheses [
Multiple items were used to measure quality along each dimension mentioned above. Each item was measured on a 5-point Likert scale (1=strongly disagree, 5=strongly agree). After reaching internal consensus, we sought external consensus and feedback from the external expert panel via an online survey [
We have developed the initial study scripts for Study 1 and have revised them after the pilot study sessions (Study 0). We have developed the screening survey for the recruitment process. The follow-up survey is administered after each study session, regardless of the group. The standard SUS survey [
Currently, we are recruiting all levels of participants, including inexperienced clinical researchers, experienced clinical researchers, and a panel of clinical research experts. Recruitment began in July 2021 with pilot study participants. To participate, anyone involved in clinical research can share their contact email address by filling out the screening survey [
For this study, we are using data from the National Ambulatory Medical Care Survey (NAMCS) conducted by the Centers for Disease Control and Prevention [
The experience level of the clinical researchers was determined by predetermined criteria. To determine which group a participant joins (inexperienced [groups 2 and 4] or experienced [groups 1 and 3] clinical researchers), we used the R statistical software package (blockrand [
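Block randomization keeps the arms balanced as participants trickle in: assignments are generated in small shuffled blocks so that no long run of one arm can occur. The sketch below mirrors the idea implemented by the blockrand R package used in the study; it is illustrative, not the study's actual code, and the arm labels are our own.

```python
# Sketch: block randomization within one experience stratum, mirroring the
# idea behind the blockrand R package (illustrative, not the study's code).
import random

def block_randomize(n_participants, arms=("VIADS", "no VIADS"),
                    block_size=4, seed=42):
    """Assign arms in shuffled blocks so group sizes stay balanced."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        # each block contains every arm an equal number of times
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]

# one stratum of 8 inexperienced clinical researchers
schedule = block_randomize(8)
print(schedule)  # 4 "VIADS" and 4 "no VIADS", in randomized block order
```

Running the same procedure separately for the experienced and inexperienced strata yields the 4 balanced groups of 8 in the Study 1 design.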
We have noticed that both forward [
Many participants did not use any advanced analysis during the study sessions. However, they did use their prior experience and knowledge to generate research hypotheses based on the frequency rank of the provided data sets and by comparing the 2 years of data (ie, 2005 vs 2015).
Noticeably, VIADS can answer more complicated questions more systematically and more rapidly. However, the training session required to use VIADS increased participants’ cognitive load, that is, the amount of working memory required during thinking and reasoning. Without a comprehensive analysis, we cannot yet draw further conclusions about the potential effects of this cognitive load.
A critical first step in the life cycle of any scientific research study is formulating a valid and significant research question, which can usually be divided into several scientific hypotheses. This process is often challenging and time-consuming [
Participants have been observed to use analogical reasoning [
Analyzing the research hypothesis–generation process may include several initial cognitive components. These components can consist of searching for, obtaining, compiling, and processing evidence; seeking help to analyze data; developing inferences using obtained evidence and prior knowledge; searching for external evidence, including literature or prior notes; seeking connections between evidence and problems; considering feasibility, testability, ethicality, and clarity; drawing conclusions; formulating draft research hypotheses; and polishing draft research hypotheses [
We recognize that research hypothesis generation and the long refining and improving process matter most during the study sessions. Without technologies to capture what occurs cognitively during the research hypothesis–generation process, we may not be able to answer fundamental questions regarding the mechanisms of scientific hypothesis generation.
Establishing the evaluation metrics to assess research hypotheses is the first step and the critical foundation of the overall study. The evaluation metrics used will determine the quality measurements of the research hypotheses generated by study participants during the study sessions.
Research hypothesis evaluation is subjective, but metrics can help standardize the process to some extent. Although the metrics may not guarantee a precise or perfectly objective evaluation of each research hypothesis, such metrics provide a consistent instrument for this highly sophisticated cognitive process. We anticipate that a consistent instrument will help to standardize the expert panel’s evaluations. Additionally, objective measures, such as the number of research hypotheses generated by the study participants and the average time each participant spends generating each research hypothesis, will be used in the study. The expert panel is therefore expected to provide more consistent research hypothesis evaluations with the combined metrics and objective measures.
Although developing metrics appears linear, as presented in
Many challenges have been encountered while conducting the research hypothesis–generation study sessions. These include the following:
What can be considered a research hypothesis? What will not be considered a research hypothesis? The response will determine which research hypotheses will be evaluated by the panel of clinical research experts.
How should the research hypothesis be measured accurately? Although we developed workable metrics, the metrics are not yet perfect.
How can we accurately capture thinking, reasoning, and networking processes during the research hypothesis–generation process? Currently, we use the think-aloud method. Although think-aloud protocols can capture valuable information about the thinking process, we recognize that not all processes can be articulated during the experiments, and not everyone can articulate their processes accurately or effectively.
What happens when a clinical researcher examines and analyzes a data set and generates a research hypothesis subconsciously?
How can we capture the roles of the external environment, internal cognitive capacity, existing knowledge base of the participant, and interactions between the individual and the external world in these dynamic processes?
When faced with challenges, we see opportunities for researchers to further explore and identify a clearer picture of research hypothesis generation in clinical research. We believe that the most pressing target is developing new technologies in order to capture what clinical researchers do and think when generating research hypotheses from a data set. Such technologies can promote breakthroughs in cognition, psychology, computer science, artificial intelligence, neurology, and clinical research in general. In clinical research, such technologies can help empower clinical researchers to conduct their tasks more efficiently and effectively.
We learned some important lessons while designing and conducting this study. The first involved balancing the sophistication of the study design (straightforward vs complicated) against the feasibility of conducting it. During the design stage, we were concerned that the 2×2 design was too simple, even though its simplicity does not diminish the value of the research: it considers only experience level and VIADS use in what is a very complicated cognitive process. However, even for such a straightforward design, only 1 experienced clinical researcher has volunteered so far; thus, we will first focus on inexperienced clinical researchers. Even for these sessions, considerable time is needed to determine strategies for coding and analyzing the raw data. To design a more complicated experiment that answers more complex questions, we must balance practical workload, recruitment reality, the expected timeline, and the desire to pursue a complex research question.
Recruitment is always challenging. Many of our panel invitations to clinical research experts either received no response or were rejected, which significantly delayed the study timeline, in addition to the effects of the COVID-19 pandemic. Furthermore, the IRB approval process was time-consuming, delaying our study when we needed to revise study documents. Therefore, the study timeline includes the IRB initial review and rereview cycles.
The first step of a future direction for this project is to explore the feasibility of formulating research questions based on research hypotheses. In this project, we are looking for ways to improve the efficiency of generating research hypotheses. The next step will be to explore whether we can enhance the efficiency of formulating research questions.
A possible direction for future work is to develop tools to facilitate scientific hypothesis generation guided by secondary data analysis. We may explore automating the process or incorporating all positive attributes in order to guide the process better and improve efficiency and quality.
At the end of our experiments, we asked clinical researchers what facilitates their scientific hypothesis–generation process the most. Common responses included repeatedly reading the academic literature and discussing ideas with colleagues. We believe intelligent tools can support both activities, namely, summarizing new publications in the chosen topic areas and providing conversational support to clinical researchers. This would be a natural extension of our studies.
An additional possible direction is to expand the terminologies supported by VIADS; for example, RxNorm, LOINC, and SNOMED CT could be added in the future.
Follow-up survey at the end of the hypothesis-generation tasks.
VIADS System Usability Scale survey and utility questionnaire.
List of items to consider during hypothesis generation.
ICD: International Classification of Diseases
ICD9-CM: International Classification of Diseases, 9th Revision-Clinical Modification
ICD10-CM: International Classification of Diseases, 10th Revision-Clinical Modification
IRB: Institutional Review Board
LOINC: Logical Observation Identifiers Names and Codes
MeSH: Medical Subject Headings
NAMCS: National Ambulatory Medical Care Survey
SNOMED CT: Systematized Nomenclature of Medicine-Clinical Terms
SUS: System Usability Scale
The project is supported by a grant from the National Library of Medicine of the United States National Institutes of Health (R15LM012941), and is partially supported by the National Institute of General Medical Sciences of the National Institutes of Health (P20 GM121342). The content is solely the authors’ responsibility and does not necessarily represent the official views of the National Institutes of Health.
A request for aggregated and anonymized transcription data can be made to the corresponding author, and the final decision on data release will be made on a case-by-case basis, as appropriate. The complete analysis and the results will be published in future manuscripts.
None declared.