This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on http://www.researchprotocols.org, as well as this copyright and license information must be included.
Data stewardship is an essential driver of research and clinical practice. Data collection, storage, access, sharing, and analytics are dependent on the proper and consistent use of data management principles among the investigators. Since 2016, the FAIR (findable, accessible, interoperable, and reusable) guiding principles for research data management have been resonating in scientific communities. Enabling data to be findable, accessible, interoperable, and reusable is currently believed to strengthen data sharing, reduce duplicated efforts, and move toward harmonization of data from heterogeneous unconnected data silos. FAIR initiatives and implementation trends are rising in different facets of scientific domains. It is important to understand the concepts and implementation practices of the FAIR data principles as applied to human health data by studying the flourishing initiatives and implementation lessons relevant to improved health research, particularly for data sharing during the coronavirus pandemic.
This paper aims to conduct a scoping review to identify concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in the health data domain.
The Arksey and O’Malley stage-based methodological framework for scoping reviews will be used for this review. PubMed, Web of Science, and Google Scholar will be searched to access relevant primary and grey publications. Articles written in English and published from 2014 onwards that address FAIR principle concepts or practices in the health domain will be included. Duplicates across the 3 data sources will be removed using reference management software. The articles will then be exported to systematic review management software. At least two independent authors will review the eligibility of each article based on defined inclusion and exclusion criteria. A pretested charting tool will be used to extract relevant information from the full-text papers. Qualitative thematic synthesis will be performed by coding and developing themes, which will be derived from the research questions and the contents of the included papers.
The results will be reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews) reporting guidelines. We anticipate finalizing the manuscript for this work in 2021.
We believe comprehensive information about the FAIR data principles, initiatives, implementation practices, and lessons learned in the FAIRification process in the health domain is paramount to supporting both evidence-based clinical practice and research transparency in the era of big data and open research publishing.
PRR1-10.2196/22505
Advancement in information communication technology is impacting the health ecosystem’s technological and analytical capabilities to store, curate, share, and analyze data from standard and nonstandard sources [
Digitalization brings opportunities and concerns in health care data processing. Despite many potential benefits, it also poses potential threats, such as breaches of privacy, disinformation and misinformation, and cyberattacks [
The European General Data Protection Regulation (GDPR) is the most recent data regulatory framework as of September 2020 and has implications for the ethical sharing of research data [
Boeckhout et al [
Beyan et al [
The need for good data stewardship among different stakeholders in scientific research is the basis on which the FAIR data principles (findability, accessibility, interoperability, and reusability) were coined in 2014 by the FORCE11 (The Future of Research Communication and e-Scholarship) community [
In 2020, Vesteghem et al [
The aims of this work are to (1) provide an overview of applications of the FAIR data principles in health data research and (2) map the existing evidence accordingly.
This scoping review will adopt the framework outlined by Arksey and O’Malley [
We have already conducted a pilot overview of the existing literature as an informal desk review and literature exploration. This overview included published works in PubMed, Google Scholar, and Web of Science. The medical and public health research librarian used the FAIR data principles’ keywords to match medical subject headings (MeSH) used to tag PubMed peer-reviewed literature, along with combinations of terms used in clinical research, public health, health care, pharmacology, and patient data.
As part of the ongoing evidence synthesis from medical and human health research journal articles that used FAIR data markup, the bibliographies of key papers were scrutinized for other complementary publications, and those articles were added to the PubMed collections shared with the authors. Further, as the key FAIR data and health articles inspired new citations, often authored by similar consortia of writers or networks of researchers, the newer citing articles were added to the stage 1 collection to demonstrate possible progress in the field of shared or open medical data. Recurrent alerts were set up to capture newly published literature on PubMed, Google Scholar, and Web of Science (
Our informal desk review has shown that many approaches used in the implementation of the FAIR data principles are applied to the life sciences domain [
As we intend to conduct this exploratory review in an iterative manner, further refinement of the research questions may become necessary. Close examination of key references in bibliographies and citing articles to gauge the impact of shared data on ensuing research and health practice will be followed as part of the secondary analysis. All proposed refinements of the research questions and search methods will be scrutinized by the authors prior to approval. We will also provide comprehensive provenance information on changes in the protocol to be fully transparent.
The general objective of this protocol is to conduct a scoping review to identify concepts, approaches, implementation experience, and lessons learned from the FAIR data principle initiatives in the health domain. The following research questions (RQs) have been formulated to meet the objective of the scoping review:
RQ 1: What approaches are being used or piloted in the implementation of the FAIR data principles in the health data domain since the conception of these principles in 2014?
RQ 2: What are the challenges and risks regarding the approaches used in the practical implementation of the FAIR data principles in the health data domain?
RQ 3: What are the suggested concepts and approaches to mitigating the concerns of the implementation of the FAIR data principles in the health data domain?
RQ 4: Which are the active public and private research and service networks involved in the implementation of the FAIR data principles in the health data domain?
RQ 5: What are the reported outcomes for data sharing, data reuse, and research publication after the implementation of the FAIR data principles in the health data domain?
With the aid of an experienced research librarian, at least two researchers will identify relevant studies from 3 primary electronic databases: PubMed, Web of Science, and Google Scholar. In addition, relevant grey literature from existing networks, relevant organizations, and conferences, as well as the reference lists of potential papers, will be searched. The keywords for the search strategies have been tentatively categorized into terms related to the FAIR data principles, data sharing, and health. Although refinement of the selected MeSH terms is possible, open terms have been proposed for the construction of the search strategy of this protocol. The Boolean operators “AND” and “OR” will be used to guide the search strategy. The following descriptors and keywords and their combinations were used to construct the strategies: “open science,” “data collection,” “data provenance,” “open access publishing,” “data*,” “repositor*,” “registr*,” “pharma*,” “health*,” “research,” “biomedical research,” “data management,” “FAIR data principles,” “FAIR principles,” “FAIR guiding principles,” “Data steward*,” “Data management systems,” “findable,” “findability,” “access,” “accessibility,” “interoperable,” “interoperability,” “reusable,” “reusability” (
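The combination of keywords with “AND” and “OR” described above can be sketched in a few lines of Python. This is only an illustrative sketch: the grouping of terms into three concept blocks, the subset of keywords shown, and the names `CONCEPTS`, `quote_term`, and `build_query` are assumptions made for this example, not the authors' actual search strings. Each database has its own query syntax, so the output would still need adapting by hand.

```python
from typing import Dict, List

# Hypothetical grouping of a subset of the protocol's keywords into three
# concept blocks (FAIR, data sharing/management, health). Assumption only.
CONCEPTS: Dict[str, List[str]] = {
    "fair": ["FAIR data principles", "FAIR principles", "findable",
             "interoperable", "reusable"],
    "data": ["data management", "open science", "repositor*", "registr*"],
    "health": ["health*", "biomedical research", "pharma*"],
}


def quote_term(term: str) -> str:
    """Wrap multi-word phrases in double quotes; leave single words bare."""
    return f'"{term}"' if " " in term else term


def build_query(concepts: Dict[str, List[str]]) -> str:
    """Join terms within a concept with OR and join the concepts with AND."""
    blocks = [
        "(" + " OR ".join(quote_term(t) for t in terms) + ")"
        for terms in concepts.values()
    ]
    return " AND ".join(blocks)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Keeping the terms in a dictionary rather than a hand-written string makes later refinements of the strategy easy to track, which suits the iterative approach the protocol describes.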
The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews) reporting guidelines will be used for reporting the findings [
As an inclusion criterion, we will consider literature published between January 1, 2014, and December 31, 2020. The 2014 start date was chosen because FAIR concept initiatives and official publications first became available in that year. To be included, literature must also be published in English and address applications of the FAIR principles in the health domain (as defined by the operational definition). Literature published before 2014, in a language other than English, or in domains outside the operational definition of health will be excluded. All search results from online databases and grey literature sources will be exported to reference management software to eliminate duplicates. Unique search results will then be exported to a screening tool to facilitate independent screening of the potential papers.
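The eligibility rules and the deduplication step can be expressed as a small filter, shown here as a minimal sketch. The `Record` structure, its field names, and the title-based matching are assumptions for illustration; in the protocol itself, deduplication is delegated to reference management software, and the health-domain judgment is made by human reviewers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Date window taken from the inclusion criterion in the text.
START, END = date(2014, 1, 1), date(2020, 12, 31)


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal bibliographic record (field names are assumptions)."""
    title: str
    published: date
    language: str
    health_domain: bool  # reviewer's judgment against the operational definition


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record seen for each normalized title."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record.title)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def is_eligible(record: Record) -> bool:
    """Apply the date, language, and domain criteria together."""
    return (
        START <= record.published <= END
        and record.language.lower() == "english"
        and record.health_domain
    )
```

A typical use would be `[r for r in deduplicate(records) if is_eligible(r)]`, which yields the unique records that pass all three criteria before manual screening.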
Rayyan software (Qatar Computing Research Institute) has been chosen as the primary screening and data extraction tool to expedite the initial screening of abstracts and titles using a semiautomated process while incorporating a high level of usability. This software supports research teams in the easier exploration of literature searches within a shorter time as well as in sharing and comparing individual researchers’ decisions to include or exclude studies [
A data-charting form will be used by the reviewers to determine which variables to extract. The form is flexible for continuous updating in an iterative manner during the data-charting process, but any changes will be tracked. The descriptive analytical approach, as described by Arksey and O’Malley [
Data-charting form.
| Section | Description |
|  | Summary of the basic information of the publication |
| Publication type | Peer reviewed or grey literature |
| Country | Name of the country or countries where the study took place or focused on |
| Objective | Aim or objective of the publication |
| Methodology | The specific procedures or techniques used to identify, select, process, and analyze information |
| Study design and data management | Includes whether the researchers used quantitative, qualitative, or mixed-method approaches |
| Setting of the study | The site in which the researcher conducted the study |
| Summarized results | A short summary of the findings |
|  | Includes the research questions and the date that the literature was published |
| Suggested health care domain–specific FAIRification^a concepts and approaches | A description of FAIRification concepts and approaches in the health care domains |
| FAIR implementation challenges, risks, and lessons learned | Encountered challenges or anticipated changes and lessons learned at different stages of FAIR data principle concept introduction, infrastructure implementation, and FAIRness evaluation |
| Active networks involved in the implementation of the FAIR data principles in the health domain | Dedicated networks of scientific communities, research institutions, repositories or data archives, consortia, funding agencies, and citizens who are actively engaged in advocating FAIR principle data stewardship in the health care domains |
| FAIRification reported outcomes | FAIR implementation outcomes in terms of data sharing, data reuse, and research publication after imposing FAIR data principles in the health domain |
^a FAIR: findable, accessible, interoperable, reusable.
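For readers who want to chart data in a spreadsheet, the form above can be mirrored as a typed record, as in the following sketch. The class name `ChartingEntry`, the field names, and the CSV export are assumptions introduced for this example; the protocol does not prescribe a file format, and the form itself may change during charting.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class ChartingEntry:
    """One charted publication; fields follow the rows of the charting form."""
    publication_type: str          # peer reviewed or grey literature
    country: str
    objective: str
    methodology: str
    study_design: str              # quantitative, qualitative, or mixed methods
    setting: str
    summarized_results: str
    fairification_approaches: str
    challenges_risks_lessons: str
    active_networks: str
    reported_outcomes: str


def to_csv(entries: List[ChartingEntry]) -> str:
    """Serialize charted entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(ChartingEntry)]
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()
```

Because the column names are taken from the dataclass fields, adding or renaming a field updates the exported header automatically, which helps when the form is revised iteratively.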
This scoping review focuses on the range of data curated and the health data research content identified. Quantitative assessment is limited to a count of the number of sources reporting a particular FAIR thematic issue or recommendation. After charting the relevant data from the studies in spreadsheets, the results will be collated and described using summary statistics, charts, figures, and common tools for analytical reinterpretation of the literature [
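The quantitative step described here, a count of the number of sources reporting each theme, can be sketched as follows. The function name, the input layout (a mapping from a source identifier to the themes coded for it), and the theme labels in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_sources_per_theme(charted: Dict[str, List[str]]) -> Counter:
    """Count how many sources report each theme.

    `charted` maps a source identifier to the list of themes coded for it.
    Each source contributes at most once per theme, even if the theme was
    coded several times within that source.
    """
    counts: Counter = Counter()
    for themes in charted.values():
        counts.update(set(themes))
    return counts


if __name__ == "__main__":
    # Hypothetical coded data, for illustration only.
    example = {
        "source_1": ["interoperability", "privacy", "privacy"],
        "source_2": ["interoperability"],
    }
    for theme, n in count_sources_per_theme(example).most_common():
        print(theme, n)
```

Converting each source's themes to a set before counting is the design choice that keeps the result a count of sources rather than a count of coded passages.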
Our preliminary PubMed search yielded 360 results (
This scoping review will provide insight on the initiatives, concepts, and implementation practices of FAIR data principles in health data stewardship. More specifically, it will allow for the exploration of (1) approaches being used or piloted for the implementation of the FAIR data principles in the health domain since the conception of these principles in 2014; (2) challenges, risks, lessons learned, and the suggested concepts and approaches to mitigate the concerns of implementation of the FAIR data principles in the health domain; (3) active research and service networks involved in the implementation of the FAIR data principles in the health domain; and (4) the reported outcomes for data sharing, data reuse, and research publication after the implementation of the FAIR data principles in the health domain. We anticipate increases in data repositories demanding FAIR data markup suitable for artificial intelligence extraction of statistics. We also anticipate a greater demand for the implementation of the FAIR principles in light of the ongoing COVID-19 pandemic as well as more open research activities by public and private research and service networks involved in the implementation of the FAIR data principles in the health domain. An example of such an initiative is the Research Data Alliance [
The results will be used to generate recommendations on how to integrate the FAIR principles in health research, and we will generate different knowledge dissemination materials to share project results with various stakeholders, partners, associations, and networks who may benefit from this work.
The findings of this proposed work may be used to help identify the types of available evidence that support the incorporation of FAIR data principles in health. The results will also help to clarify key concepts in the scientific literature and serve as an introduction to how research on FAIR practices is conducted. This methodological framework will help us identify the overall state of research activities that explore initiatives, concepts, and implementation practices of FAIR data principles in health data stewardship. The outcome of this review can be used to further determine areas of research based on current gaps in the literature. Conducting this scoping review will also help determine the practicality and relevance of a full systematic review on the same issues by assessing the availability of literature. Similarly, gaps that still exist in the uptake and implementation of the FAIR principles in health research will also be identified as areas of further research. This work will be of interest to various stakeholders, including health and academic institutions, publishers, researchers, and funding agencies. In the wake of the COVID-19 pandemic, it is extremely critical that health data stewardship is practiced in a FAIR manner to facilitate the globally coordinated response [
Once complete, this work will be published in a peer-reviewed journal, and the results will also be presented at appropriate forums or conferences. Ethical approval is not required, as only secondary data from published sources will be included in the scoping review and the public is not invited to participate in this work.
Supplementary Material.
FAIR: findable, accessible, interoperable, and reusable
FORCE11: The Future of Research Communication and e-Scholarship
GDPR: General Data Protection Regulation
HRB: Health Research Board
MeSH: medical subject heading
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews
RQ: research question
We acknowledge support for the article processing charge from the DFG (German Research Foundation; 393148499) and the Open Access Publication Fund of the University of Greifswald.
None declared.