
What is FAIR data?


What does FAIR stand for?

FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable. These principles matter to anyone who wants their data to be reused and who wants machines to find and process that data automatically on people's behalf. A minimal example of what this looks like for machines follows the list below.

  • Findable data has been given an 'address' that others can use to locate it. The information describing the data, its metadata, should be easy to find for both humans and computers.
  • Accessible data can be retrieved through a standard protocol, where needed with authentication and authorization.
  • Interoperable data is expressed in a way that lets machines work together and exchange it with minimal effort.
  • Reusable data is not just technically reusable; the organizational arrangements and legal conditions for its use are in place as well.
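
To make this concrete, here is a minimal sketch (not the Laces implementation) of machine-actionable metadata for a dataset, written in Python with the rdflib library. The dataset identifier, title, and description are hypothetical placeholders; the vocabularies (Dublin Core terms, DCAT) are standard W3C/DCMI ones.

```python
# A minimal sketch of machine-actionable metadata for a dataset, using rdflib.
# All URIs, titles, and descriptions are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")  # W3C Data Catalog vocabulary

g = Graph()
# A globally unique, resolvable identifier: the 'address' that makes the data Findable.
dataset = URIRef("https://data.example.org/id/bridge-inspections-2023")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Bridge inspection measurements 2023", lang="en")))
g.add((dataset, DCTERMS.description, Literal("Quarterly inspection results for road bridges.", lang="en")))
# A declared, standard serialization format helps Interoperability.
g.add((dataset, DCTERMS.format, Literal("text/turtle")))

# Metadata both humans and machines can read and index.
print(g.serialize(format="turtle"))
```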

These so-called ‘FAIR Guiding Principles’ for scientific data management and stewardship were published in Scientific Data in 2016 (Wilkinson et al., 2016) to help people make their data FAIR. Although FAIR was initiated by predominantly academic stakeholders and aimed at scientific and scholarly data, new technologies now open the door to entirely different domains, which we address in this blog.

Why do we need FAIR data?

In science, the desire to cooperate freely, without technology (or people) getting in the way, is as old as science itself. Cooperation is a means to get more done in less time and to avoid doing work redundantly or, worse, inconsistently. Outside science, these desires, needs, and ambitions are no different. For example, an American study found that 92% of the job vacancies analyzed demanded digital skills (NSC, 2023).

Since the rise of information technology, however, the amount of data has grown exponentially, increasing roughly tenfold over the past ten years (Statista, 2023). With the increasing power of computer processing, the need for automation grew as well, together with the ambition to have computers perform analyses people could never do themselves (think of the current expectations of AI).

Another driver for FAIR data is that data can no longer all be stored in a single database: for technical and organizational reasons it becomes too complex, too much data serving too many different purposes to stay manageable. To counter this ‘monolithic’ approach, there is a broad movement towards distributed and federated working (using separate but cooperating databases and machines). As Wilkinson et al. (2016) already noted, a growing, less centralized data ecosystem makes data itself more diverse and increasingly requires it to become FAIR.

So what is the problem with data that is not FAIR by default? In other words, what makes data UNFAIR?

What makes data UNFAIR?

To help people, we first need people to help computers. It is people who make agreements on how data should be described and how computers should handle it. This is known as making data ‘machine-actionable’, and it is necessary because, by default, there is no common ground between people and machines for understanding data. But why is this the case?

The biggest challenge is to instruct computers (or, more specifically, software applications) to process data in a common, FAIR way. There are, however, barriers. Firstly, software applications arise in their own ‘vacuum’, and the need to share data outside that vacuum often only comes at a later stage, so most applications are not created with FAIR in mind from the start (Laces is!). Secondly, the way a specific application handles data is always optimized for its own processing. That is why unFAIR practices are common and FAIR principles are not a priority for software developers.

In conclusion, most software applications keep their metadata within their own locating systems (unfindable), use their own authentication and authorization mechanisms (inaccessible), give their own meaning to their data and data formats (not interoperable), and have no common way of organizing the creation, maintenance, and use of data (not reusable). That makes data, most of the time, unFAIR.

Requirements for degrees of FAIRness

As GO FAIR puts it: there are degrees of FAIRness. The FAIR Guiding Principles are high-level guidelines; FAIR is not a standard, nor a specific technology or solution. So, if FAIR doesn’t prescribe solutions, which solutions are already available that tick the FAIR boxes?

To be Findable and Accessible, data resources need to be identified, made searchable, and retrievable through a standard communication protocol for computers, including authentication and authorization procedures where needed. Furthermore, the technology needs to be open, free, and universally implementable. To be Interoperable, data needs to carry all the meaning an application requires to query and understand (interpret and process) it.
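
As a sketch of the Accessible side, the snippet below retrieves such a resource over standard HTTPS with content negotiation, using Python's requests library. The URL and access token are hypothetical placeholders; authentication is only involved where the publisher requires it.

```python
# A minimal sketch of accessing a data resource over a standard, open protocol
# (HTTPS) with content negotiation. URL and token are hypothetical placeholders.
import requests

resource = "https://data.example.org/id/bridge-inspections-2023"
response = requests.get(
    resource,
    headers={
        "Accept": "text/turtle",                  # ask for a machine-readable representation
        "Authorization": "Bearer <access-token>",  # placeholder for authorization, if required
    },
    timeout=30,
)
response.raise_for_status()
print(response.text)  # an RDF description the client can parse and process further
```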

Last but certainly not least: to be Reusable, data needs to be created with its reuse in mind. The community of users and the data ecosystem should agree on all information relevant to the community as a whole, such as license terms and clear descriptions of context and provenance. Making data Reusable requires people to determine who does what, when, and how; and because these are organizational rather than technical choices, software can only support people in that process.
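
As an illustration, the sketch below attaches a license, publisher, and basic provenance to the same hypothetical dataset, again using rdflib. Which license and which provenance details to record is exactly the kind of agreement a community has to make; the triples only show how such agreements can be expressed.

```python
# A minimal sketch of reuse-oriented metadata: license, publisher, and provenance
# attached to a hypothetical dataset.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")  # W3C provenance vocabulary

g = Graph()
dataset = URIRef("https://data.example.org/id/bridge-inspections-2023")

# Clear license terms tell reusers what they are allowed to do.
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((dataset, DCTERMS.publisher, URIRef("https://example.org/org/road-authority")))
g.add((dataset, DCTERMS.issued, Literal("2023-11-01", datatype=XSD.date)))
# Provenance records where the data came from (the activity URI is a placeholder).
g.add((dataset, PROV.wasGeneratedBy, URIRef("https://data.example.org/id/inspection-campaign-2023")))

print(g.serialize(format="turtle"))
```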

What technology is available that fits these requirements?

Linked FAIR Data?

The most obvious technology standards that tick these boxes are the web standards of the W3C: the dominant standards, accepted and managed by the web's standards body. In particular, the web standards that go beyond communication for mere websites and aim to enable all (types of) data to become FAIR. Together, these standards form a technology known as Linked Data.

Linked Data is, in essence, a combination of existing and additional agreements on identifying data online, making it retrievable, and allowing both people and machines to describe and interpret all the meaning needed to understand its contents.
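
As a small illustration (hypothetical identifiers, standard W3C vocabularies), the sketch below shows what this linking buys you: two descriptions that reuse the same identifier can be combined and queried as a single graph.

```python
# A minimal sketch of linking: two parties describe their own resources, but
# because they reuse the same identifier, the descriptions can be combined and
# queried as one graph. All identifiers are hypothetical; RDF, RDFS, and SPARQL
# are the standard W3C building blocks.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://data.example.org/id/")

g = Graph()
# One party describes an asset...
g.add((EX["bridge-042"], RDF.type, EX["Bridge"]))
g.add((EX["bridge-042"], RDFS.label, Literal("Bridge 42 over the canal", lang="en")))
# ...another party links an inspection report to that same asset identifier.
g.add((EX["report-2023-17"], EX["inspects"], EX["bridge-042"]))

# A standard SPARQL query can now interpret the combined data.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?report ?label WHERE {
    ?report <https://data.example.org/id/inspects> ?asset .
    ?asset rdfs:label ?label .
}
"""
for report, label in g.query(query):
    print(report, "inspects:", label)
```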

By publishing data as Linked Data, many FAIR boxes are ticked by default: it makes data Findable, Accessible, Interoperable and, when done right, Reusable. It does require software developers either to translate and transform their proprietary data into Linked Data or, increasingly, to take FAIRness into account from the start.

By using Laces, users are not burdened with the complexity of implementing FAIR Principles but are enabled to manage, publish, and share FAIR data from scratch using off-the-shelf tooling.

Curious about the Laces solutions? Schedule a free demo with one of our experts. It will take about 45 minutes, no strings attached.

References:

  • GO FAIR website, FAIR Principles, accessed November 2023: https://www.go-fair.org/fair-principles/
  • GO FAIR, the FAIR Principles as Linked Data: https://github.com/peta-pico/FAIR-nanopubs/blob/master/principles.ttl
  • National Skills Coalition (NSC), Closing the Digital Skill Divide (2023): https://nationalskillscoalition.org/resource/publications/closing-the-digital-skill-divide/
  • Statista website, accessed November 2023: https://www.statista.com/statistics/871513/worldwide-data-created/
  • Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi: 10.1038/sdata.2016.18 (2016).
  • Wilkinson, M. D. et al. Addendum: The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data (2019).
