FAIR data

FAIR data are data which meet principles of findability, accessibility, interoperability, and reusability (FAIR). [1] [2] The acronym and principles were defined in a March 2016 paper in the journal Scientific Data by a consortium of scientists and organizations. [1]

The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) because humans increasingly rely on computational support to deal with data as its volume, complexity, and speed of creation increase. [3]

The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the FAIR principles and also carries an explicit data‑capable open license.

FAIR principles, as published by GO FAIR

Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.

F1. (Meta)data are assigned a globally unique and persistent identifier

F2. Data are described with rich metadata (defined by R1 below)

F3. Metadata clearly and explicitly include the identifier of the data they describe

F4. (Meta)data are registered or indexed in a searchable resource
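
As a rough illustration of F1 to F4, the sketch below models a metadata record and a toy in-memory index in Python. The class names `MetadataRecord` and `SearchableIndex`, the field names, and the `https://example.org/...` identifiers are assumptions made for this example. They are not part of the FAIR principles or of any GO FAIR tooling, and a real deployment would rely on a persistent identifier scheme and an external registry rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str        # F1: identifier of this metadata record (hypothetical URL)
    data_identifier: str   # F3: identifier of the data the record describes
    title: str             # F2: descriptive metadata
    keywords: tuple = ()   # F2: further descriptive metadata


class SearchableIndex:
    """Toy in-memory stand-in for the searchable resource named in F4."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Uniqueness is enforced within this index only; global uniqueness (F1)
        # is the job of the identifier scheme, not of this sketch.
        if record.identifier in self._records:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._records[record.identifier] = record

    def search(self, term):
        # Case-insensitive match against the title and the keywords.
        needle = term.lower()
        return [
            r for r in self._records.values()
            if needle in r.title.lower() or needle in [k.lower() for k in r.keywords]
        ]


index = SearchableIndex()
index.register(MetadataRecord(
    identifier="https://example.org/id/meta/42",
    data_identifier="https://example.org/id/data/42",
    title="Sea surface temperature, 2010-2020",
    keywords=("oceanography", "temperature"),
))
hits = index.search("Oceanography")
print([r.data_identifier for r in hits])
```

Keeping `identifier` and `data_identifier` as separate fields mirrors the distinction the principles draw between metadata and the data they describe.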

Accessible

Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

A1.1 The protocol is open, free, and universally implementable

A1.2 The protocol allows for an authentication and authorisation procedure, where necessary

A2. Metadata are accessible, even when the data are no longer available
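
One way to picture A1 and A1.2 is retrieval over HTTPS, an open and widely implemented protocol. The sketch below only prepares a request and never sends it. It uses the DOI of the 2016 FAIR paper from the reference list. The helper name `build_metadata_request` is hypothetical, and the example assumes that the DOI resolver honours content negotiation for the CSL JSON media type and that a bearer token is the authentication scheme in use; neither is prescribed by the principles.

```python
import urllib.parse
import urllib.request

DOI_RESOLVER = "https://doi.org/"
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi, token=None):
    """Prepare (but do not send) an HTTP GET request for a DOI's metadata."""
    # A1: the identifier is turned into a URL served over a standard protocol.
    url = DOI_RESOLVER + urllib.parse.quote(doi, safe="/")
    # Ask for machine-readable metadata instead of the landing page.
    headers = {"Accept": CSL_JSON}
    # A1.2: attach credentials only where the resource requires them.
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual retrieval; that step is left out so the sketch runs without network access.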

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data
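
A minimal sketch of what I1 to I3 can look like in practice, assuming JSON-LD as the knowledge representation language and schema.org as a stand-in for a shared vocabulary. Both are choices made for this example, not requirements of the principles, and the `https://example.org/...` identifiers are hypothetical. The `isBasedOn` property gives the reference to the other dataset an explicit meaning, which is the sense of "qualified" in I3.

```python
import json

record = {
    "@context": "https://schema.org/",                # I1/I2: shared vocabulary
    "@type": "Dataset",
    "@id": "https://example.org/id/data/42",          # hypothetical identifier
    "name": "Sea surface temperature, 2010-2020",
    # I3: a typed link stating how this dataset relates to another one.
    "isBasedOn": {"@id": "https://example.org/id/data/7"},
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialize for exchange, then parse again as a consuming application would.
serialized = json.dumps(record, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(restored["isBasedOn"]["@id"])
```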

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (Meta)data are released with a clear and accessible data usage license

R1.2. (Meta)data are associated with detailed provenance

R1.3. (Meta)data meet domain-relevant community standards
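
The R1 sub-principles can be read as a checklist. The sketch below tests only whether a field is present and non-empty; it says nothing about whether the license is clear, the provenance detailed, or the community standard appropriate. The field names `license`, `provenance`, and `conforms_to`, the helper `missing_reuse_fields`, and the sample values are all assumptions made for this example.

```python
# Hypothetical mapping from metadata field names to the R1 sub-principles.
REQUIRED_FOR_REUSE = {
    "license": "R1.1",
    "provenance": "R1.2",
    "conforms_to": "R1.3",
}


def missing_reuse_fields(metadata):
    """Return the R1 sub-principles whose field is absent or empty."""
    return sorted(
        principle
        for key, principle in REQUIRED_FOR_REUSE.items()
        if not metadata.get(key)
    )


complete = {
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": "Derived from buoy readings; cleaned by the data provider.",
    "conforms_to": "hypothetical-community-standard-v1",
}
partial = {"license": "https://creativecommons.org/licenses/by/4.0/"}

print(missing_reuse_fields(complete))
print(missing_reuse_fields(partial))
```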

The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).

GO FAIR Foundation, FAIR Principles, https://www.gofair.foundation/

Acceptance and implementation of FAIR data principles

Before FAIR, a 2007 paper was the earliest to discuss similar ideas related to data accessibility. [4]

At the 2016 G20 Hangzhou summit, the G20 leaders issued a statement endorsing the application of FAIR principles to research. [5] [6]

In 2016 a group of Australian organisations developed a Statement on FAIR Access to Australia's Research Outputs, which aimed to extend the principles to research outputs more generally. [7]

In 2017, Germany, the Netherlands and France agreed to establish [8] an international office to support the FAIR initiative, the GO FAIR International Support and Coordination Office. [9]

Other international organisations active in the research data ecosystem, such as CODATA and the Research Data Alliance (RDA), also support FAIR implementations by their communities. The RDA's FAIR Data Maturity Model Working Group is exploring how to assess the implementation of the FAIR principles. [10] CODATA's strategic Decadal Programme "Data for Planet: Making data work for cross-domain challenges" [11] mentions the FAIR data principles as a fundamental enabler of data-driven science.

"Implementing FAIR Data Principles – The Role of Libraries", a guide

The Association of European Research Libraries recommends the use of FAIR principles. [12]

A 2017 paper by advocates of FAIR data reported that awareness of the FAIR concept was increasing among researchers and institutes, but that understanding of the concept was becoming confused as different people applied their own perspectives to it. [13]

Guides on implementing FAIR data practices state that the cost of a data management plan in compliance with FAIR data practices should be 5% of the total research budget. [14]

In 2019 the Global Indigenous Data Alliance (GIDA) released the CARE Principles for Indigenous Data Governance as a complementary guide. [15] The CARE principles extend principles outlined in FAIR data to include Collective benefit, Authority to control, Responsibility, and Ethics to ensure data guidelines address historical contexts and power differentials. The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event, "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", held 8 November 2018, in Gaborone, Botswana. [16]

The lack of information on how to implement the guidelines has led to inconsistent interpretations of them. [17]

In January 2020, representatives of nine groups of universities around the world produced the Sorbonne declaration on research data rights, [18] which included a commitment to FAIR data, and called on governments to provide support to enable it. [19]

In 2021, researchers identified the FAIR principles as a conceptual component of data catalog software tools, with the other components being metadata management, business context and data responsibility roles. [20]

In April 2022, Matthias Scheffler and colleagues argued in Nature that FAIR principles are "a must" so that data mining and artificial intelligence can extract useful scientific information from the data. [21]

However, making data (and research outcomes) FAIR is a challenging task, and so is assessing their FAIRness. [22]

Related Research Articles

The Committee on Data of the International Science Council (CODATA) was established in 1966 as the Committee on Data for Science and Technology, originally part of the International Council of Scientific Unions, now part of the International Science Council (ISC). Since November 2023 its president has been the Catalan researcher Mercè Crosas.

A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for.

Open data: Openly accessible data

Open data is data that is openly accessible, exploitable, editable and shared by anyone for any purpose. Open data is licensed under an open license.

Publications Office of the European Union: Academic and legal publisher

The Publications Office of the European Union is the official provider of publishing services and data, information and knowledge management services to all EU institutions, bodies and agencies. This makes it the central point of access to EU law, publications, open data, research results, procurement notices, and other official information.

Barend Mons: Biologist and bioinformatics specialist

Barend Mons is a molecular biologist by training and a leading FAIR data specialist. He spent the first decade of his scientific career on fundamental research on malaria parasites and later on translational research for malaria vaccines. In 2000 he switched to advanced data stewardship and (biological) systems analytics. He is currently a professor in Leiden and is best known for innovations in scholarly collaboration, especially nanopublications, knowledge-graph-based discovery and, most recently, the FAIR data initiative and GO FAIR. Since 2012 he has been a professor of biosemantics in the Department of Human Genetics at the Leiden University Medical Center (LUMC) in the Netherlands. In 2015 Mons was appointed chair of the High Level Expert Group on the European Open Science Cloud. Since 2017 he has headed the International Support and Coordination Office of the GO FAIR initiative. He is also the elected president of CODATA, the standing committee on research data related issues of the International Science Council. Mons is a member of the Netherlands Academy of Technology and Innovation (ACTI). He is also the European representative on the Board on Research Data and Information (BRDI) of the National Academies of Sciences, Engineering, and Medicine in the USA. Mons is a frequent keynote speaker on FAIR and open science around the world, and participates in various national and international boards.

Resource Description and Access (RDA) is a standard for descriptive cataloging initially released in June 2010, providing instructions and guidelines on formulating bibliographic data. Intended for use by libraries and other cultural organizations such as museums and archives, RDA is the successor to Anglo-American Cataloguing Rules, Second Edition (AACR2).

Metadata: Data about data

Metadata is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including:

Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge.

An open repository or open-access repository is a digital platform that holds research output and provides free, immediate and permanent access to research results for anyone to use, download and distribute. To facilitate open access such repositories must be interoperable according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Search engines harvest the content of open access repositories, constructing a worldwide database of research that is available free of charge. Data repositories are the cornerstone for FAIR data practices and are used expeditiously within the scientific community.

Before data.europa.eu, the EU Open Data Portal was the point of access to public data published by the EU institutions, agencies and other bodies. On April 21, 2021 it was consolidated into the data.europa.eu portal, together with the European Data Portal, a similar initiative aimed at the EU Member States.

The UK Data Service is the largest digital repository for quantitative and qualitative social science and humanities research data in the United Kingdom. The organisation is funded by the UK government through the Economic and Social Research Council and is led by the UK Data Archive at the University of Essex, in partnership with other universities.

The cTuning Foundation is a global non-profit organization that develops a common methodology and open-source tools to support sustainable, collaborative and reproducible research in computer science, and that organizes and automates artifact evaluation and reproducibility initiatives at machine learning and systems conferences and journals.

The Plant Genomics and Phenomics Research Data Repository (PGP) is a data publication infrastructure to comprehensively publish multi-domain plant research data. It is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) in Gatersleben, Germany. The repository hosts DOI citeable datasets that are not being published in public repositories because of their volume or data scope. PGP enables the publication of gigabyte-scale datasets and is registered as a research data repository at FAIRSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. The above features, the programmatic interface and the support of standard metadata formats, enable PGP to fulfil the FAIR data principles—findable, accessible, interoperable, reusable. The PGP repository was created using the e!DAL software infrastructure and applies an on-premises approach to "bring the infrastructure to the data" (I2D).

FORCE11: Non-profit organisation to enhance research publishing and communication

FORCE11 is an international coalition of researchers, librarians, publishers and research funders working to reform or enhance the research publishing and communication system. Initiated in 2011 as a community of interest on scholarly communication, FORCE11 is a registered 501(c)(3) organization based in the United States but with members and partners around the world. Key activities include an annual conference, the Scholarly Communications Institute and a range of working groups.

Elizabeth 'Liddy' Nevile is an Australian academic and a pioneer in using computers and the World Wide Web for education in Australia. In 1989-1990 she was instrumental in establishing the first program in the world that required all students to have laptop computers, at Methodist Ladies College, Melbourne, Australia.

The CARE Principles for Indigenous Data Governance are a set of principles intended to guide open data projects in engaging with Indigenous Peoples' rights and interests. CARE was created in 2019 by the International Indigenous Data Sovereignty Interest Group, a group that is a part of the Research Data Alliance. It outlines collective rights related to open data in the context of the United Nations Declaration on the Rights of Indigenous Peoples and Indigenous data sovereignty.

Susanna-Assunta Sansone: British-Italian data scientist

Susanna-Assunta Sansone is a British-Italian data scientist who is professor of data readiness at the University of Oxford where she leads the data readiness group and serves as associate director of the Oxford e-Research Centre. Her research investigates techniques for improving the interoperability, reproducibility and integrity of data.

The Microdata Information System (MISSY) is a database-driven online system that provides structured metadata about selected research data of official statistics free of charge as part of the service infrastructure of the German Microdata Lab (GML) at GESIS – Leibniz Institute for the Social Sciences. MISSY is targeted at empirically-working scientists who use official microdata for their research.

Data governance in the context of Indigenous data involves supporting the data interests, gaps and priorities of Indigenous peoples, in order to enable Indigenous self-determination. Generally, data governance refers to who has ownership, control and access over the use of data. Indigenous data governance requires the data to surround Indigenous peoples and its purpose to reflect Indigenous needs and priorities, rather than omitting Indigenous peoples in the production of Indigenous data.

The German Human Genome-Phenome Archive (GHGA) is a consortium within the national data infrastructure (NFDI). GHGA aims to create a secure national data infrastructure for human omics data in order to make these data available for scientific research while preventing the misuse of data.

References

  1. Mark D. Wilkinson; Michel Dumontier; IJsbrand Jan Aalbersberg; et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. 3 (1): 160018. doi:10.1038/SDATA.2016.18. ISSN 2052-4463. PMC 4792175. PMID 26978244. Wikidata Q27942822.
  2. Annika Jacobsen; Ricardo de Miranda Azevedo; Nick Juty; et al. (31 January 2020). "FAIR Principles: Interpretations and Implementation Considerations". Data Intelligence: 10–29. doi:10.1162/DINT_R_00024. ISSN 2641-435X. Wikidata Q76394974.
  3. "FAIR Principles". GO FAIR. Retrieved 2020-02-16. Material was copied from this source, which is available under a Creative Commons Attribution 4.0 International License.
  4. Sandra Collins; Françoise Genova; Natalie Harrower; Simon Hodson; Sarah Jones; Leif Laaksonen; Daniel Mietchen; Rūta Petrauskaité; Peter Wittenburg (7 June 2018), "Turning FAIR data into reality: interim report from the European Commission Expert Group on FAIR data", Zenodo, doi:10.5281/ZENODO.1285272
  5. G20 leaders (5 September 2016). "G20 Leaders' Communique Hangzhou Summit". europa.eu. European Commission.
  6. "European Commission embraces the FAIR principles – Dutch Techcentre for Life Sciences". Dutch Techcentre for Life Sciences. 20 April 2016.
  7. "Australian FAIR Access Working Group". www.fair-access.net.au. Retrieved 2020-04-03.
  8. Ministerie van Onderwijs, Cultuur en Wetenschap (2017-12-01). "Progress towards the European Open Science Cloud – GO FAIR – News item – Government.nl". www.government.nl (in Dutch). Retrieved 2020-02-15.
  9. "GO FAIR Offices". GO FAIR. Retrieved 2023-12-05.
  10. "FAIR Data Maturity Model WG". RDA. 2018-09-23. Retrieved 2020-02-16.
  11. "Decadal Programme – CODATA". www.codata.org. Retrieved 2020-02-16.
  12. Association of European Research Libraries (13 July 2018). "Open Consultation on FAIR Data Action Plan – LIBER". LIBER.
  13. Barend Mons; Cameron Neylon; Jan Velterop; Michel Dumontier; Luiz Olavo Bonino da Silva Santos; Mark D. Wilkinson (7 March 2017). "Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud". Information Services & Use. 37 (1): 49–56. doi:10.3233/ISU-170824. ISSN 0167-5265. Wikidata Q29051495.
  14. Science Europe (May 2016). "Funding research data management and related infrastructures" (PDF).
  15. "CARE Principles of Indigenous Data Governance". Global Indigenous Data Alliance. Retrieved 2019-09-30.
  16. O'Donnell, Dan (2021-12-16). "Thinking about the CARE Principles in the Digital Humanities". DARIAH-Campus.
  17. Annika Jacobsen; Ricardo de Miranda Azevedo; Nick Juty; et al. (31 January 2020). "FAIR Principles: Interpretations and Implementation Considerations". Data Intelligence: 10–29. doi:10.1162/DINT_R_00024. ISSN 2641-435X. Wikidata Q76394974.
  18. Sorbonne Declaration on Research Data Rights, 27 January 2020.
  19. "Open data 'tougher' than open access and needs 'mindset change'", Times Higher Education, 31 January 2020.
  20. Ehrlinger, Lisa; Schrott, Johannes; Melichar, Martin; Kirchmayr, Nicolas; Wöß, Wolfram (2021), Kotsis, Gabriele; Tjoa, A Min; Khalil, Ismail; Moser, Bernhard (eds.), "Data Catalogs: A Systematic Literature Review and Guidelines to Implementation", Database and Expert Systems Applications - DEXA 2021 Workshops, Communications in Computer and Information Science, vol. 1479, Cham: Springer International Publishing, pp. 148–158, doi:10.1007/978-3-030-87101-7_15, ISBN 978-3-030-87100-0, S2CID 237621026, retrieved 2022-06-26
  21. Scheffler, Matthias; Aeschlimann, Martin; Albrecht, Martin; Bereau, Tristan; Bungartz, Hans-Joachim; Felser, Claudia; Greiner, Mark; Groß, Axel; Koch, Christoph T.; Kremer, Kurt; Nagel, Wolfgang E. (2022-04-28). "FAIR data enabling new horizons for materials research". Nature. 604 (7907): 635–642. arXiv:2204.13240. Bibcode:2022Natur.604..635S. doi:10.1038/s41586-022-04501-x. ISSN 0028-0836. PMID 35478233. S2CID 248415511.
  22. Candela, Leonardo; Mangione, Dario; Pavone, Gina (2024-05-27). "The FAIR Assessment Conundrum: Reflections on Tools and Metrics". Data Science Journal. 23: 33. doi:10.5334/dsj-2024-033.