
What is FAIR data?


What does FAIR stand for?

FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable. These principles matter to anyone who wants their data to be reused and who wants machines to find and process that data automatically on people's behalf. A minimal example of what this looks like for machines follows the list below.

  • Findable data has been given an 'address' that others can use to locate it. The information describing the data, its metadata, should be easy to find for both humans and computers.
  • Accessible data can be retrieved through a standard protocol, where needed with authentication and authorization.
  • Interoperable data is expressed in a way that lets machines work together and exchange it with minimal effort.
  • Reusable data is not just technically reusable; the organizational arrangements and legal conditions for its use are in place as well.
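
To make this concrete, here is a minimal sketch (not the Laces implementation) of machine-actionable metadata for a dataset, written in Python with the rdflib library. The dataset identifier, title, and description are hypothetical placeholders; the vocabularies (Dublin Core terms, DCAT) are standard W3C/DCMI ones.

```python
# A minimal sketch of machine-actionable metadata for a dataset, using rdflib.
# All URIs, titles, and descriptions are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")  # W3C Data Catalog vocabulary

g = Graph()
# A globally unique, resolvable identifier: the 'address' that makes the data Findable.
dataset = URIRef("https://data.example.org/id/bridge-inspections-2023")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Bridge inspection measurements 2023", lang="en")))
g.add((dataset, DCTERMS.description, Literal("Quarterly inspection results for road bridges.", lang="en")))
# A declared, standard serialization format helps Interoperability.
g.add((dataset, DCTERMS.format, Literal("text/turtle")))

# Metadata both humans and machines can read and index.
print(g.serialize(format="turtle"))
```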

These so-called ‘FAIR Guiding Principles’ for scientific data management and stewardship were published in Scientific Data in 2016 (Wilkinson et al., 2016) to help people make their data FAIR. Although FAIR was initiated by predominantly academic stakeholders and aimed at scientific and scholarly data, new technologies now open the door to entirely different domains, which we address in this blog.

Why do we need FAIR data?

In science, the desire to cooperate freely, without technology (or people) getting in the way, is as old as science itself. Cooperation is a means to get more done in less time and to avoid doing work redundantly or, worse, inconsistently. Outside science, these desires, needs, and ambitions are no different. For example, an American study found that 92% of the job vacancies analyzed demanded digital skills (NSC, 2023).

Since the rise of information technology, however, the amount of data has grown exponentially, increasing roughly tenfold over the past ten years (Statista, 2023). With the increasing power of computer processing, the need for automation grew as well, together with the ambition to have computers perform analyses people could never do themselves (think of the current expectations of AI).

Another driver for FAIR data is that data can no longer all be stored in a single database: for technical and organizational reasons it becomes too complex, too much data serving too many different purposes to stay manageable. To counter this ‘monolithic’ approach, there is a broad movement towards distributed and federated working (using separate but cooperating databases and machines). As Wilkinson et al. (2016) already noted, a growing, less centralized data ecosystem makes data itself more diverse and increasingly requires it to become FAIR.

So what is the problem with data that is not FAIR by default? In other words, what makes data UNFAIR?

What makes data UNFAIR?

To help people, we first need people to help computers. It is people who make agreements on how data should be described and how computers should handle it. This is known as making data ‘machine-actionable’, and it is necessary because, by default, there is no common ground between people and machines for understanding data. But why is this the case?

The biggest challenge is to instruct computers (or, more specifically, software applications) to process data in a common, FAIR way. There are, however, barriers. Firstly, software applications arise in their own ‘vacuum’, and the need to share data outside that vacuum often only comes at a later stage, so most applications are not created with FAIR in mind from the start (Laces is!). Secondly, the way a specific application handles data is always optimized for its own processing. That is why unFAIR practices are common and FAIR principles are not a priority for software developers.

In conclusion, most software applications keep their metadata within their own locating systems (unfindable), use their own authentication and authorization mechanisms (inaccessible), give their own meaning to their data and data formats (not interoperable), and have no common way of organizing the creation, maintenance, and use of data (not reusable). That makes data, most of the time, unFAIR.

Requirements for degrees of FAIRness

As GO FAIR puts it: there are degrees of FAIRness. The FAIR Guiding Principles are high-level guidelines; FAIR is not a standard, nor a specific technology or solution. So, if FAIR doesn’t prescribe solutions, which solutions are already available that tick the FAIR boxes?

To be Findable and Accessible, data resources need to be identified, made searchable, and retrievable through a standard communication protocol for computers, including authentication and authorization procedures where needed. Furthermore, the technology needs to be open, free, and universally implementable. To be Interoperable, data needs to carry all the meaning an application requires to query and understand (interpret and process) it.
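
As a sketch of the Accessible side, the snippet below retrieves such a resource over standard HTTPS with content negotiation, using Python's requests library. The URL and access token are hypothetical placeholders; authentication is only involved where the publisher requires it.

```python
# A minimal sketch of accessing a data resource over a standard, open protocol
# (HTTPS) with content negotiation. URL and token are hypothetical placeholders.
import requests

resource = "https://data.example.org/id/bridge-inspections-2023"
response = requests.get(
    resource,
    headers={
        "Accept": "text/turtle",                  # ask for a machine-readable representation
        "Authorization": "Bearer <access-token>",  # placeholder for authorization, if required
    },
    timeout=30,
)
response.raise_for_status()
print(response.text)  # an RDF description the client can parse and process further
```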

Last but certainly not least: to be Reusable, data needs to be created with its reuse in mind. The community of users and the data ecosystem should agree on all information relevant to the community as a whole, such as license terms and clear descriptions of context and provenance. Making data Reusable requires people to determine who does what, when, and how; and because these are organizational rather than technical choices, software can only support people in that process.
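
As an illustration, the sketch below attaches a license, publisher, and basic provenance to the same hypothetical dataset, again using rdflib. Which license and which provenance details to record is exactly the kind of agreement a community has to make; the triples only show how such agreements can be expressed.

```python
# A minimal sketch of reuse-oriented metadata: license, publisher, and provenance
# attached to a hypothetical dataset.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")  # W3C provenance vocabulary

g = Graph()
dataset = URIRef("https://data.example.org/id/bridge-inspections-2023")

# Clear license terms tell reusers what they are allowed to do.
g.add((dataset, DCTERMS.license, URIRef("https://creativecommons.org/licenses/by/4.0/")))
g.add((dataset, DCTERMS.publisher, URIRef("https://example.org/org/road-authority")))
g.add((dataset, DCTERMS.issued, Literal("2023-11-01", datatype=XSD.date)))
# Provenance records where the data came from (the activity URI is a placeholder).
g.add((dataset, PROV.wasGeneratedBy, URIRef("https://data.example.org/id/inspection-campaign-2023")))

print(g.serialize(format="turtle"))
```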

What technology is available that fits these requirements?

Linked FAIR Data?

The most obvious technology standards that tick these boxes are the web standards of the W3C: the dominant standards, accepted and managed by the web's standards body. In particular, the web standards that go beyond communication for mere websites and aim to enable all (types of) data to become FAIR. Together, these standards form a technology known as Linked Data.

Linked Data is, in essence, a combination of existing and additional agreements on identifying data online, making it retrievable, and allowing both people and machines to describe and interpret all the meaning needed to understand its contents.
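
As a small illustration (hypothetical identifiers, standard W3C vocabularies), the sketch below shows what this linking buys you: two descriptions that reuse the same identifier can be combined and queried as a single graph.

```python
# A minimal sketch of linking: two parties describe their own resources, but
# because they reuse the same identifier, the descriptions can be combined and
# queried as one graph. All identifiers are hypothetical; RDF, RDFS, and SPARQL
# are the standard W3C building blocks.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://data.example.org/id/")

g = Graph()
# One party describes an asset...
g.add((EX["bridge-042"], RDF.type, EX["Bridge"]))
g.add((EX["bridge-042"], RDFS.label, Literal("Bridge 42 over the canal", lang="en")))
# ...another party links an inspection report to that same asset identifier.
g.add((EX["report-2023-17"], EX["inspects"], EX["bridge-042"]))

# A standard SPARQL query can now interpret the combined data.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?report ?label WHERE {
    ?report <https://data.example.org/id/inspects> ?asset .
    ?asset rdfs:label ?label .
}
"""
for report, label in g.query(query):
    print(report, "inspects:", label)
```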

By publishing data as Linked Data, many FAIR boxes are ticked by default: it makes data Findable, Accessible, Interoperable and, when done right, Reusable. It does require software developers either to translate and transform their proprietary data into Linked Data or, increasingly, to take FAIRness into account from the start.

By using Laces, users are not burdened with the complexity of implementing FAIR Principles but are enabled to manage, publish, and share FAIR data from scratch using off-the-shelf tooling.

Curious about the Laces solutions? Schedule a free demo with one of our experts. It will take about 45 minutes, no strings attached.

References:

  • GO FAIR website, FAIR Principles, accessed November 2023: https://www.go-fair.org/fair-principles/
  • GO FAIR, the FAIR Principles as Linked Data: https://github.com/peta-pico/FAIR-nanopubs/blob/master/principles.ttl
  • National Skills Coalition (NSC), Closing the Digital Skill Divide (2023): https://nationalskillscoalition.org/resource/publications/closing-the-digital-skill-divide/
  • Statista website, accessed November 2023: https://www.statista.com/statistics/871513/worldwide-data-created/
  • Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi: 10.1038/sdata.2016.18 (2016).
  • Wilkinson, M. D. et al. Addendum: The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data (2019).
