FAIR Principles for Research Software (FAIR4RS Principles)
FAIR for Research Software (FAIR4RS) WG |
Group co-chairs: Michelle Barker, Paula Andrea Martinez, Leyla Garcia, Daniel S. Katz, Neil Chue Hong, Jennifer Harrow, Fotis Psomopoulos, Carlos Martinez-Ortiz, Morane Gruenpeter |
Recommendation Title: FAIR Principles for Research Software (FAIR4RS Principles) |
Authors: Neil P. Chue Hong*, Daniel S. Katz*, Michelle Barker*; Anna-Lena Lamprecht, Carlos Martinez, Fotis E. Psomopoulos, Jen Harrow, Leyla Jael Castro, Morane Gruenpeter, Paula Andrea Martinez, Tom Honeyman; Alexander Struck, Allen Lee, Axel Loewe, Ben van Werkhoven, Catherine Jones, Daniel Garijo, Esther Plomp, Francoise Genova, Hugh Shanahan, Joanna Leng, Maggie Hellström, Malin Sandström, Manodeep Sinha, Mateusz Kuzak, Patricia Herterich, Qian Zhang, Sharif Islam, Susanna-Assunta Sansone, Tom Pollard, Udayanto Dwi Atmojo; Alan Williams, Andreas Czerniak, Anna Niehues, Anne Claire Fouilloux, Bala Desinghu, Carole Goble, Céline Richard, Charles Gray, Chris Erdmann, Daniel Nüst, Daniele Tartarini, Elena Ranguelova, Hartwig Anzt, Ilian Todorov, James McNally, Javier Moldon, Jessica Burnett, Julián Garrido-Sánchez, Khalid Belhajjame, Laurents Sesink, Lorraine Hwang, Marcos Roberto Tovani-Palone, Mark D. Wilkinson, Mathieu Servillat, Matthias Liffers, Merc Fox, Nadica Miljković, Nick Lynch, Paula Martinez Lavanchy, Sandra Gesing, Sarah Stevens, Sergio Martinez Cuesta, Silvio Peroni, Stian Soiland-Reyes, Tom Bakker, Tovo Rabemanantsoa, Vanessa Sochat, Yo Yehudi (*) Lead authors with equal contributions |
Impact: Adoption and implementation of the FAIR for research software principles will create significant benefits for many stakeholders, including increased research reproducibility for research organisations, better practices and more software usage for its developers, clarity for funders around their own policies and requirements for software investments, and guidelines for publishers on sharing requirements. This work will be of value to software project owners, researchers, users of research data and software, the scientific community, research software engineers, software developers who publish their software, software catalogue maintainers, repository managers, software preservation and archival experts, policymakers who are responsible for defining digital policies, and organisations that create, modify, manage, share, protect, and preserve research software, funders of research, and others with an interest in the FAIR principles for research software. |
DOI: 10.15497/RDA00065 |
Citation and download: Chue Hong, N. P., Katz, D. S., Barker, M., Lamprecht, A.-L., Martinez, C., Psomopoulos, F. E., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., Honeyman, T., et al. (2021). FAIR Principles for Research Software (FAIR4RS Principles). Research Data Alliance. DOI: 10.15497/RDA00065 |
This output has been superseded by the FAIR Principles for Research Software (FAIR4RS Principles) DOI: 10.15497/RDA00068
Abstract:
Research software is a fundamental and vital part of research worldwide, yet there remain significant challenges to software productivity, quality, reproducibility, and sustainability. Improving the practice of scholarship is a common goal of the open science, open source software and FAIR (Findable, Accessible, Interoperable and Reusable) communities, but improving the sharing of research software has not yet been a strong focus of the latter.
To improve the FAIRness of research software, the FAIR for Research Software (FAIR4RS) Working Group has sought to understand how to apply the FAIR Guiding Principles for scientific data management and stewardship to research software, bringing together existing and new community efforts. Many of the FAIR Guiding Principles can be directly applied to research software by treating software and data as similar digital research objects. However, specific characteristics of software — such as its executability, composite nature, and continuous evolution and versioning — make it necessary to revise and extend the principles.
This document presents the first version of the FAIR Principles for Research Software (FAIR4RS Principles). It is an outcome of the FAIR for Research Software Working Group (FAIR4RS WG).
The FAIR for Research Software Working Group is jointly convened as an RDA Working Group, FORCE11 Working Group, and Research Software Alliance (ReSA) Task Force.
Attachment | Size |
---|---|
FAIR4RS_Principles_v0.3_RDA-RFC.pdf | 781.84 KB |
Author: Keith Russell
Date: 14 Jun, 2021
Hi all,
Thank you for a really interesting translation of the FAIR principles for software. I like the solutions for addressing the fact that Accessible, Interoperable and (Re-)Usable are different things for software. One thing I wondered about is whether it would be worth explicitly mentioning the use of software for specific analyses and therefore including links to identifiers of articles and data sets. I think you do already cover that to some extent under Interoperable, but might it be worth a mention under Provenance?
I noted one unfinished line "would not have responsibility for making the depende....."
But again, really interesting and great work.
Regards
Keith
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Keith,
on behalf of the FAIR4RS drafting team, thank you for your comment.
As you note, we partially address this under I2. Software includes qualified references to other objects.
Under R1.2 Software is associated with detailed provenance (which should be read in conjunction with R1) our intent was that this should primarily cover the provenance of the software itself, rather than the software's role in the provenance of analysis.
I think the suggestion you are making is implicitly covered by R1. Software is described with a plurality of accurate and relevant attributes, but I wanted to clarify whether you meant that software is more reusable if it explicitly includes links (with identifiers) to articles and data sets that show its use for a specific type of analysis (effectively documenting its use for a type of research)? Or did you mean something else?
Good spot! That sentence should read "This would ultimately be intractable as the authors of the software would not have responsibility for making the dependencies FAIR."
Author: Keith Russell
Date: 06 Jul, 2021
Hi Neil,
To answer your question:
'whether you meant that software is more reusable if it explicitly includes links (with identifiers) to articles and data sets that show its use for a specific type of analysis (effectively documenting its use for a type of research)? Or did you mean something else?' That is indeed what I meant.
R1 in the FAIR principles is a bit of a tricky one as it is often just viewed as a header. The real principles are the two sub-principles R1.1 and R1.2. So to cover off these rich attributes they would need to be covered under Provenance.
For a researcher trying to interpret whether they can re-use a piece of software for their purpose, having a link to the article describing the analysis and the data set used in the analysis is extremely valuable: it will give them a much better view of the purpose for which the software was designed. It is great that you are asking for these identifiers under I2. I just wonder whether it would help to be more explicit that such information is invaluable to increase the reusability and not just the interoperability of the software. This is possibly more a matter for the guiding documentation than something that can be covered in the principles themselves.
I hope this is helpful.
Kind regards,
Keith
Author: Joachim Wuttke
Date: 14 Jun, 2021
»To support a wide range of reuse scenarios, the license should be as open as possible« [R1.1] - Adoption of this rule would preclude voluntary choice of the GPL.
Many researchers consciously choose the GPL. The aporia here is about how to support reuse. Is software reuse at large supported best by allowing unconditional use of our creations? Or by the reasonable and fair demand that those who reuse our code also allow reuse of their extensions?
Whatever your stance on this, you should make it explicit, and not advise against the GPL without proper explanation.
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Joachim,
on behalf of the FAIR4RS drafting team, thank you for your comment.
Thanks for raising this - it was something that came up in drafting discussions. We are conscious that the original intent of the FAIR principles was "as open as possible, as closed as necessary". Therefore we've drafted the principle itself to only say "R1.1 Software must have a clear and accessible license." In the explanatory text, we've said "To support a wide range of reuse scenarios, the license should be as open as possible. This license must also be compatible with the requirements of the licenses of the software’s dependencies so that the software can be legally combined." The intent here is to persuade people to use open source licenses where possible, not to preclude the use of the GPL. Personally, I would consider both permissive and copyleft licenses to be "as open as possible". We will make this clearer in the guidance that is used by communities.
If you have a suggestion for a specific rewording to improve the explanatory text itself, please do let us know.
Author: Limor Peer
Date: 15 Jun, 2021
Thank you for producing a very comprehensive and clear document. I'm pleased to see language in this version that refers to the shared responsibility for applying FAIR4RS Principles -- I think it's important to emphasize that while the primary responsibility lies with software creators and owners, it often falls to those tasked with quality review and stewardship (who are really the first users) to follow through. I suggest also referencing this issue, and the need to build capacity for this type of work, in the section on the path to adoption. Thanks again for great work!
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Limor,
on behalf of the FAIR4RS drafting team, thank you for your comment.
You make a very good point, and we will add a reference to this issue in the section on path to adoption as you suggest.
Author: Yo Yehudi
Date: 21 Jun, 2021
These principles are very clear and well laid out - two small comments, both about possible examples of the principles:
A1 talks about protocols to access software. I wasn't sure if this meant something like git or https, or whether it meant a defined process document on a website, or something else. Maybe it meant all of those? :)
Similarly, F4: Metadata are FAIR and indexable. I couldn't decide based on this whether publishing a software artifact as a ZIP on Zenodo, with an embedded .cff, might be compliant with this rule, or if perhaps I am supposed to upload the cff itself to a repo somewhere.... or maybe something else? I broadly understand the _intention_ of this rule but struggled a little to understand the specifics of how one might meaningfully comply.
Other than that I thought the rules were really clear and was delighted to see the note about overloading accessibility as a term :) it's too little loved as it is and I dread seeing it dropped in favour of FAIR accessibility.
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Yo,
on behalf of the FAIR4RS drafting team, thank you for your comments.
There was considerable debate during the initial discussions and consultations about how to best interpret this specific principle from the way it is phrased for FAIR data. For much software, there are very commonly used technical communications protocols such as git or https to gain access to the software. Arguably (though we did not include this in the explanatory text) a line of text on a website saying "email the author at this address and you'll be emailed the code" could fulfil A1, A1.1 and A1.2.
It would be good to get a sense from the community about whether there is broad commonality around how people normally get access to software, in which case we can be more specific in the explanatory text.
I think the challenge here is that we don't have enough community practice around this yet. Adding metadata in a file to a repository would certainly comply, as would registering the metadata when depositing the software in e.g. Zenodo.
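As an illustration of the "metadata in a file" option, the following is a minimal sketch that builds a CodeMeta-style JSON description which could sit alongside the source code in a repository. It assumes a codemeta.json file rather than the .cff file mentioned in the question, and every name, DOI, and URL in it is a hypothetical placeholder, not a real record. The `referencePublication` entry also shows one way of recording a qualified reference to a related article.

```python
import json


def build_codemeta(name, version, license_url, identifier, repository, publications):
    """Return a minimal CodeMeta-style description of one software release."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "license": license_url,
        "identifier": identifier,
        "codeRepository": repository,
        # Qualified references to articles that describe or use the software.
        "referencePublication": [
            {"@type": "ScholarlyArticle", "identifier": doi} for doi in publications
        ],
    }


# All values below are hypothetical placeholders for illustration only.
metadata = build_codemeta(
    name="example-analysis-tool",
    version="1.0.0",
    license_url="https://spdx.org/licenses/MIT",
    identifier="https://doi.org/10.0000/example.software",
    repository="https://example.org/example-analysis-tool",
    publications=["https://doi.org/10.0000/example.article"],
)

# Serialise the description; in practice this text would be saved as codemeta.json.
codemeta_json = json.dumps(metadata, indent=2)
print(codemeta_json)
```

Whether such a file alone satisfies F4 depends on whether a registry or repository actually harvests and indexes it, which is the open community-practice question raised above.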
If you have some suggestions for the kind of guidance that you'd like to see to help you apply these principles, it would be really helpful for us as we develop training and guidance materials.
There is definitely a need for follow-on work to decide what principles are required to complement FAIR for research software in the same way that, for instance, the CARE principles extend and complement FAIR for data.
Author: Nicola Soranzo
Date: 24 Jun, 2021
Thanks for working on this important topic! I haven't had the time to read it through yet, but just noticed a copy-paste typo at page 23: "F4. Metadata are FAIR and is searchable and indexable." should be "F4. Metadata are FAIR and are searchable and indexable."
Also a question: for "R1.1. Software must have a clear and accessible license.", did you consider the license proliferation issue? This affects the reusability of the software when it is used as a dependency for other software (see also https://opensource.org/proliferation-report ), so it may be a good idea to recommend choosing, when going for an open source license, one of the "popular licenses" listed at https://opensource.org/licenses .
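A rough sketch of how a project or catalogue might act on this suggestion is to compare a declared SPDX license identifier against a short list of widely used licenses. The list below is an illustrative assumption, not the authoritative OSI "popular licenses" list, and the function name is hypothetical.

```python
# Illustrative subset of widely used SPDX license identifiers (an assumption
# made for this sketch, not the authoritative OSI list). It deliberately
# contains both permissive and copyleft licenses.
COMMON_LICENSES = {
    "Apache-2.0",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "GPL-2.0-only",
    "GPL-3.0-only",
    "LGPL-3.0-only",
    "MIT",
    "MPL-2.0",
}


def license_status(spdx_id):
    """Classify a declared license identifier for a simple reuse check."""
    if not spdx_id:
        return "missing"   # no license declared at all
    if spdx_id in COMMON_LICENSES:
        return "common"    # widely used, so compatibility is well understood
    return "uncommon"      # may be a sign of license proliferation


for declared in ["MIT", "GPL-3.0-only", "My-Lab-License-1.0", None]:
    print(declared, "->", license_status(declared))
```

A check like this only flags unusual choices for a human to review; it says nothing about whether a license is compatible with those of the software's dependencies.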
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Nicola,
on behalf of the FAIR4RS drafting team, thank you for your comments.
Thanks for spotting this - as you can tell, we did a final editing pass to rationalise the way that we used terms that can be both singular, plural and plural singular, but clearly missed some.
This is a good point to raise, and it did come up in the discussions and earlier consultations. During the drafting, it was decided that we should aim to respect community standards and norms. However this particular issue is probably common to all communities, so is something that we could address in the explanatory text to improve practice.
Author: Nicola Soranzo
Date: 01 Jul, 2021
Hi Neil,
thanks for your reply!
Yes, addressing in the explanatory text would be appropriate, I think. Thanks for considering!
Author: Tek Raj Chhetri
Date: 24 Jun, 2021
Thank you for producing the comprehensive document. I have a few comments (or suggestions).
Great work!
Regards,
Tek
Author: Neil Chue Hong
Date: 25 Jun, 2021
Dear Tek,
on behalf of the FAIR4RS drafting team, thank you for your comments.
The use of linked data is certainly an approach that supports the FAIR principles for research software. In the draft, we didn't explicitly mention it, because its use is not commonplace across all communities, and there are other approaches that are being used by different communities. Here, I think there's a strong role for community specific guidance which explains how linked data can be used to make software (and other research objects) FAIR.
There is definitely a wider challenge around how legislation (and other policies) affect the exchange and processing of data. However these are probably at a level above the FAIR principles for research software - it would be possible for the software to be FAIR, even if the data is subject to GDPR or similar laws. An example might be its use on synthetic data.
If you have a specific scenario which you think isn't currently addressed by the principles around data protection, we'd be interested to hear it, so we can discuss whether this should be explicitly addressed in the FAIR4RS principles.
Would it be more appropriate to refer to the OpenAPI Specification (which is a successor to Swagger API documentation)?
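For readers unfamiliar with the OpenAPI Specification, the sketch below shows roughly what a machine-readable API description looks like and how a program could inspect it. The service, path, and parameter names are hypothetical, and the document is a minimal example rather than a recommended template.

```python
import json

# A minimal OpenAPI 3.0 description of a hypothetical analysis service.
openapi_description = {
    "openapi": "3.0.3",
    "info": {"title": "Example Analysis Service", "version": "1.0.0"},
    "paths": {
        "/results/{run_id}": {
            "get": {
                "summary": "Fetch the result of one analysis run",
                "parameters": [
                    {
                        "name": "run_id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Result found",
                        "content": {
                            "application/json": {"schema": {"type": "object"}}
                        },
                    },
                    "404": {"description": "Unknown run"},
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(description):
    """Return sorted (METHOD, path) pairs declared in an OpenAPI description."""
    return sorted(
        (method.upper(), path)
        for path, item in description["paths"].items()
        for method in item
        if method in HTTP_METHODS
    )


print(json.dumps(openapi_description, indent=2))
print(list_operations(openapi_description))
```

Because the description is structured data, both people and tools can read the same file to discover what the interface offers.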
Author: Joris van Eijnatten
Date: 29 Jun, 2021
Clearly, a lot of work has gone into drafting these principles. It is great to see the result of this effort. Thanks, on behalf of the community! For the Netherlands eScience Center, transparency, reproducibility and reusability of research software are fundamental and we hope these principles will indeed be helpful in that direction. I note that principle R1.1 is equivalent to the existing recommendation https://fair-software.eu/recommendations/license/.
In principle R3 “domain-relevant community standards” seems to encompass a very broad range of things (documentation, coding practices, standards for testing). Although we agree that standard practices will be community dependent, it would be great to see at least some level of agreement between these communities. Otherwise, providing guidance is impossible.
Another question concerns the definition of community itself in the context of R3. If research software is a ‘fundamental and vital part of research worldwide’, it seems an omission that research and researchers as such are hardly mentioned in fleshing out the FAIR principles. For example, in R3 it isn’t made clear what is understood by ‘the community’. If the community is seen as a community of developers/RSEs in a narrower sense, then coherence is easier to come by than if the community is seen as something focused more on research as such. Ideally both communities would form an integral unity, but the point is that research tends to follow its own course. In other words: research follows research problems rather than research software, while research software should follow research demand. This means that ‘fragmentation of community practice’ is necessarily difficult (if not impossible) to avoid.
Quality is mentioned once as a challenge (in the abstract) and once as a goal (as the need for ‘high quality software’). Yet the relevance and importance of quality software aren’t addressed explicitly anywhere in the document. One would expect ‘quality’ to surface under ‘Reusable’, but it doesn’t. The implication is that software quality has little or no direct bearing on usability or reusability, if only because quality means different things for different people and for different types of software. If that is the case (and it is one that could be defended), why mention quality in the first place? Or is software quality something that partially overlaps with the FAIR principles but is not completely addressed by them?
Author: Neil Chue Hong
Date: 02 Jul, 2021
Dear Joris,
on behalf of the FAIR4RS drafting team, thank you for your comments.
We acknowledge the effort and leadership that the NLeSC has provided in this area, and look forward to your support for the implementation and adoption of the FAIR4RS Principles.
One of the challenges identified here is that different research communities have different "transitions" between what is considered appropriate for different maturity levels of a piece of research software. Therefore, while it may be possible to get some consensus about the minimum standards that should be used, it would be near impossible for the FAIR4RS Principles to document this in a more specific way that worked for all communities.
However, the wider work of the FAIR4RS working group, in particular the FAIR4RS Roadmap work, will be addressing the question of how communities can provide guidance, including on how to choose and apply standards in relation to the FAIR4RS Principles.
How the FAIR4RS Principles should define "community" came up in many of the comments in the previous consultations, and it may be that we have not adequately explained how we define the concept. Our intent is that community broadly means the wider definition of a research community (which will include both researchers, RSEs and others). We have seen some of these communities address fragmentation of community practice through the agreement of guidance / guidelines e.g. the ESIP Software Assessment Guidelines for the Earth Sciences.
We'd be happy to take suggestions of how to improve the explanatory text to define "the community" more clearly.
This is a good point. In an earlier draft, there was some additional text around potential aspects of software quality under the "Reusable" principles, principally around the "dependability" or "robustness" of software. This was later removed for the reasons that you mention - reusability was redefined slightly, and software quality is seen as something which overlaps with the FAIR4RS principles but is not directly addressed by them. We will review the text to reduce any confusion around this.
We envisage that there will be complementary principles to the FAIR4RS principles, in the same way that there have been for data, e.g. FAIR+CARE, that address other software principles such as quality or accessibility (in the other software engineering sense).
Author: Neil Chue Hong
Date: 07 Jul, 2021
The following comments were received via Twitter from Paul Secular, and are cross-posted here with permission.
Thanks for this feedback. It is clear that the multiple meanings of accessibility in the context of software will need to be clearly explained.
As you note, the FAIR4RS Principles on their own are not sufficient for adoption. Further work by the FAIR4RS working group will address adoption, and the wider work of the Research Software Alliance and collaborators is aimed at those changes in research culture, funding and institutions that are required - as noted by Dan Katz, a co-author of the FAIR4RS Principles, FAIR is not the end goal, it's just one part of the solution.
We agree that non-gendered language should be used.
In this case, I believe “she/he" is only in Appendix B, from directly quoted text from the GO-FAIR website presented to show the evolution of the principles - I will pass this suggestion on to them.
Author: Rob van Nieuwpoort
Date: 09 Jul, 2021
I would like to express my thanks and appreciation for this wonderful initiative and report. Great work!
Here is some feedback on the document.
On P9 the document states: “Many software engineering practices are relevant to various of the FAIR4RS Principles. For instance: localization can improve accessibility, design patterns can improve interoperability, and documentation and encapsulation can improve reusability. Nevertheless, while important more generally for producing high quality software, they are best addressed separately from (but as a complement to) the FAIR4RS Principles.”
I understand the details of these best practices are out of scope here. Nevertheless, this is one essential point that distinguishes software from data, and thus one of the reasons we need specific FAIR principles for software in the first place. I think the most profound impact of the best practices are around reusability (I also like the other examples given, but I would argue they all have an impact on reusability as well). So, shouldn’t we include a general statement about software quality and best practices as principle R4? For example:
R4. Software aims to adhere to relevant software engineering best practices.
And then the examples as given on P9.
On page 12, the Interoperable principle:
“I: The software interoperates with other software through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs).”
When phrased like this, it is unclear what the added value of this principle is. How else should software interoperate than through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs)? I.e., this principle always holds and does not exclude any software? Moreover, interoperation is not a goal by itself, it is a means to an end. I think what is meant is that these data formats and APIs should be standardized? If so, can’t we simply start with the text of principle I1?
On page 12: “This includes the use of data types and formats that ideally are formally described using controlled vocabularies, to facilitate machine readability and data exchange.”
This is a very limited view on data exchange. Quite often, data is simply exchanged via formally defined (binary or text-based) standard formats. Not every data exchange standard uses controlled vocabularies, nor should.
Also, I think we should carefully consider the language we use. It should be understandable by domain researchers. I suspect almost no one (other than computer scientists) knows what “controlled vocabularies” are.
P13: “Where software interacts via APIs, these should be documented so that their capabilities can be inspected and understood by humans and machines.”
This is a fairly limited scope of this principle. Shouldn't we advise the use of open API protocols when possible here?
P14, second paragraph of R3.
I understand the point of this paragraph, but I think we should consider that communities can be very diverse, as are their members. Many different programming languages and file formats are used within any given community, with good reason. That often is a strength, not a weakness. I don't think we should universally strive for convergence and certainly not in a top-down manner or by including this in the FAIR4RS principles. There are so many aspects to this, such as using the right tool for the job, availability of hardware and software, training, domain-specific aspects, etc. I don't believe in convergence of languages and formats as a goal of the FAIR4RS principles. If this is a goal or not is up to the relevant communities. I would suggest omitting the second paragraph of R3 altogether.
Thanks again for all the hard work, excellent result!
Author: Neil Chue Hong
Date: 09 Jul, 2021
The following comments were received from Wilhelm Hasselbring and are cross-posted with permission.
How to refer to containers, and their role in the FAIR principles for software is something that was discussed in the previous community consultations. It was felt that the use of containers (as opposed to software in another form) does not inherently make software more interoperable or reusable if the FAIR4RS principles are followed. However it would be reasonable to add some text to include containers as a common form of executable package.
Software as a Service is primarily discussed under Accessibility (and in Challenges to Implementation). We think there needs to be more work done to understand whether any aspects of SaaS need to be considered by the FAIR4RS Principles or as part of more general FAIR principles for services / workflows. We will add something to the Challenges to Implementation section to note this.
Author: Neil Chue Hong
Date: 09 Jul, 2021
The following comment was received from Peter Hill via the RSE Slack and is cross-posted with permission.
Thanks for this input. We will discuss how best to address this in the text. During the drafting process, we felt that qualified references for software to other objects were slightly different from data. We have provided some examples and were looking to keep the text more open to different ways that these references were expected to be used, but your comment suggests that this could be clarified further.
Author: Michelle Barker
Date: 11 Jul, 2021
The following comments were received via email and posted with permission from:
Michael Barton
Director CoMSES.Net (Network for Computational Modeling in Social and Ecological Sciences)
Author: Morane Gruenpeter
Date: 11 Jul, 2021
A few comments were received during the FAIR4RS RDA France atelier in the notes document:
https://docs.google.com/document/d/19NCSJuPJiAVPb0tclfHJoGURYpEV09o7n7aF...
I'll translate the comments on the document and copy here ASAP.
Author: Roberto Di Cosmo
Date: 11 Jul, 2021
Dear all,
thanks for asking for input from a broader community.
As a general remark, this document shows how difficult it is to try and adapt the FAIR principles to software: this does not come as a surprise, since the FAIR principles were designed with databases in mind, while software is of a totally different nature, and it is not clear at all that the best way to approach its specificities is to try and translate principles designed for something else.
One striking example is the particularly surprising statement in the current draft that these principles are not concerned with long term preservation of software: this is in total contradiction with clear statements made in various high level documents like the recently released National Plan for Open Science in France, and the EOSC SIRS report published in 2020.
The absence of documents like the EOSC SIRS report from the bibliography (which is much too short and is not sufficiently used to support the statements made in the document) makes one wonder whether the working group is missing key relevant information.
The section "Challenges to Implementation" reveals that there is no clear-cut and consensual approach for a broad range of important subjects identified in the report, and it is difficult to understand how one can state, in the "path to adoption" section, that the next step is to "promote the outcomes, aiming to raise awareness and facilitate a wider adoption of the FAIR4RS WG outcomes by existing and emerging initiatives".
I strongly suggest that the working group take the time to reconsider this draft in depth before moving forward.