FAIR data are data that meet the principles of findability, accessibility, interoperability, and reusability (FAIR). The acronym and principles were defined in 2016 by a consortium of scientists and organizations to improve and expand scientific study through better data management.1
Since then, FAIR principles for scientific data have received strong support from global organizations such as the G7; national governments; science funding agencies including the European Commission and the National Institutes of Health; and pharmaceutical leaders including Novartis, Pfizer, and GSK. In fact, legislation requiring data accessibility has passed in both the United States (Open, Public, Electronic and Necessary Government Data Act)2 and Europe (EU Data Governance Act).3
But how pervasive has FAIR become across the life sciences, and how is it being leveraged today?
Academia has long led the way in advancing research through collaboration, data sharing, and iterative innovation. Its critical role in establishing and evangelizing FAIR data sharing principles is thus no surprise. Additionally, the importance of FAIR has become even more significant to academics as government agencies increasingly require data openness and accessibility for funding eligibility.
Many researchers were early supporters of FAIR data management. For example, the Pistoia Alliance (a not-for-profit collaboration of life sciences companies, pharmaceutical leaders, vendors, publishers, and academic groups) publicized its support in a 2019 Drug Discovery Today feature article.4 The authors wrote that once the life sciences adopt FAIR for R&D, “the plethora of new and powerful analytical tools such as artificial intelligence and machine learning will be able, automatically and at scale, to access the data from which they learn, and on which they thrive. FAIR is a fundamental enabler for digital transformation.”
The bioinformatics and crystallography data used in biology research are shared widely in open repositories. Researchers often encounter FAIR data when using public databases such as the Protein Data Bank (PDB), Universal Protein Resource (UniProt), or GenBank. But beyond these standardized and open data types, many life science organizations are also outfitting their labs with research tools that support FAIR data principles from the earliest days of data collection through analysis and reporting, a practice that grant funding requirements increasingly expect.
Researchers may have tools such as an electronic lab notebook (ELN) that can help ensure proper collection and management of lab data, even when the data types and research workflows evolve. Effective ELNs should let researchers push raw data collected in the lab directly into analytics software, without wasting time or risking errors by manually preparing and transferring data.
For example, a researcher might want to pass assay data, and all associated metadata, to a curve fit calculation, and then tie the results back to the ELN record file. The results should become part of a federated master data source so they are easily searchable and reusable in the future by colleagues with appropriate access permissions.
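The assay workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AssayRecord` class, its field names, the record identifier, and the metadata values are all hypothetical, the readings are synthetic, and the fit uses a simple linearized Hill equation rather than the nonlinear four-parameter fit a production tool would more likely apply. The point is that the fitted values are written back onto the same record that carries the metadata.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AssayRecord:
    """Hypothetical ELN record: raw readings plus the metadata that travels with them."""
    record_id: str
    metadata: dict
    concentrations_um: list   # compound concentrations in micromolar
    fraction_inhibited: list  # normalized responses, strictly between 0 and 1
    results: dict = field(default_factory=dict)


def fit_hill(record):
    """Estimate IC50 and Hill slope by least squares on the linearized Hill equation.

    For y = 1 / (1 + (IC50 / x)**n):
        log10(y / (1 - y)) = n * log10(x) - n * log10(IC50)
    """
    xs = [math.log10(c) for c in record.concentrations_um]
    ys = [math.log10(y / (1.0 - y)) for y in record.fraction_inhibited]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ic50 = 10 ** (-intercept / slope)
    # Store the result on the same record so it stays linked to its metadata.
    record.results = {
        "ic50_um": ic50,
        "hill_slope": slope,
        "method": "linearized Hill fit",
    }
    return record


doses = [0.1, 1.0, 10.0, 100.0]
record = AssayRecord(
    record_id="ELN-0001",  # illustrative identifier
    metadata={"assay": "enzyme inhibition", "operator": "jdoe"},  # illustrative values
    concentrations_um=doses,
    # Synthetic responses generated from IC50 = 2 uM and Hill slope = 1.
    fraction_inhibited=[1.0 / (1.0 + 2.0 / c) for c in doses],
)
fit_hill(record)
print(record.record_id, record.metadata["assay"], record.results)
```

Because the results live on the record rather than in a separate spreadsheet, a later search by assay type or operator can return the fitted values together with their provenance.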
Although chemistry research has not inherently reflected a FAIR culture, efforts to evolve have been ongoing. In 2019, the Chemistry Implementation Network (ChIN) published a manifesto calling for the industry to “Go FAIR.”5 Other leading chemistry organizations, including the Research Data Alliance's Chemistry Research Data Interest Group (CRDIG) and the International Union of Pure and Applied Chemistry (IUPAC), have joined the cause, calling for the establishment of chemistry standards (e.g., naming conventions, structural representations, and characterization and reaction data), as well as the widespread adoption of R&D tools and infrastructure that aid in FAIR data collection, sharing, and analysis.6
Industrywide support is growing. For example, there has been a call to make it easier for researchers to share chemical structure information in journal submissions.7 Awards have been established to recognize the best chemistry FAIR datasets published each year.8 And companies are creating solutions that make it easier for chemists to annotate, track, and manage data throughout their chemistry workflows.
While change will be gradual, most experts agree that the chemistry community needs to create a FAIR culture that is supported by standards and infrastructure development promoting machine readability of chemical data and other digital resources.
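As a rough illustration of what machine readability can mean for chemical data, the sketch below stores one compound with standard structural identifiers (SMILES, InChI, and InChIKey) and serializes it to JSON. The field names, the placeholder dataset identifier, the license value, and the `missing_fields` check are assumptions made for this example and do not follow any particular published schema.

```python
import json

# Illustrative record; field names are assumptions, not a published schema.
record = {
    "identifier": "example-dataset-001",  # placeholder; a real record would carry a persistent ID
    "name": "ethanol",
    "molecular_formula": "C2H6O",
    "smiles": "CCO",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "license": "CC-BY-4.0",  # example reuse license
}

# Hypothetical minimum set of fields for a record to be findable and reusable.
REQUIRED = ("identifier", "inchi", "license")


def missing_fields(rec):
    """Return the required fields that are absent or empty in a record."""
    return [key for key in REQUIRED if not rec.get(key)]


serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(serialized)
print("missing:", missing_fields(restored))
```

A record in this form can be indexed by identifier, matched across databases by InChIKey, and validated automatically before it is shared.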
Calls to “Go FAIR” have also been increasing in the chemical and materials industry, which has traditionally focused on experimental exploration and computational modeling, rather than any data-driven approach. In fact, a data-driven approach to chemicals and materials R&D has often been deemed too difficult to achieve because the complex workflows and data types used are thought to make process documentation and data exchange uniquely challenging.
That mindset is changing as companies work to create a united platform for chemicals and materials R&D. In an April 2022 Nature perspective,9 leading materials experts argued that a fundamental paradigm shift toward data-driven materials R&D is necessary for the industry to thrive. They proposed that such change is essential to reaping value from a gold mine of available research data that have largely remained unleveraged, despite the potential they hold for use in advanced analytics and AI.
These experts support the adoption of FAIR data principles for materials R&D and explain that supportive data infrastructures and research tools, such as ELNs or laboratory information management systems (LIMS), are needed to facilitate the shift toward data-driven materials R&D. A FAIR data infrastructure will not replace scientists in the coming years, but scientists who use such an infrastructure will very likely replace those who don’t.
About the Author
Christian Olsen is the Associate VP, Industry Principal of Biologics at Dotmatics, a leader in R&D scientific software connecting science, data and decision-making.