FAIR Data Maturity Model: specification and guidelines - draft


10 Apr 2020


By Marieke Willems


FAIR Data Maturity Model WG

Group co-chairs: Edit Herczog, Keith Russell, Shelley Stall

Supporting Output title:  FAIR Data Maturity Model: specification and guidelines

Impact: This document describes a maturity model for FAIR assessment, with indicators, priorities and evaluation methods. It is useful for normalising assessment approaches so that their results can be compared.

Authors: FAIR Data Maturity Model Working Group

DOI: 10.15497/RDA00045

Citation:  RDA FAIR Data Maturity Model Working Group (2020). FAIR Data Maturity Model: specification and guidelines. Research Data Alliance. DOI: 10.15497/RDA00045

Note: the supporting output Results of an Analysis of Existing FAIR Assessment Tools is available separately.

 

This output has been superseded by the FAIR Data Maturity Model: specification and guidelines, DOI: 10.15497/rda00050.

Context

Findability, Accessibility, Interoperability and Reusability – the FAIR principles – are intended to define a minimal set of related but independent and separable guiding principles and practices that enable both machines and humans to find, access, interoperate with and re-use research data and metadata. The FAIR principles are to be understood as inspiring concepts rather than strict rules. Unfortunately, they often lead to diverse interpretations and ambiguity.

To remedy the proliferation of FAIRness measurements based on different interpretations of the principles, the RDA Working Group “FAIR data maturity model”, established in January 2019, aims to develop a common set of core assessment criteria for FAIRness as an RDA Recommendation. In the course of 2019 and the first half of 2020, the WG established a set of indicators and maturity levels for those indicators.

As a result of this work, a first set of guidelines and a checklist related to the implementation of the indicators were produced, with the objective of further aligning the guidelines for evaluating FAIRness with the needs of the community.

 

Objective

This document specifies the indicators for FAIR assessment that are designed for re-use in evaluation approaches, and provides guidelines for their use. The guidelines are intended to assist evaluators in implementing the indicators in the evaluation approach or tool they manage.

The exact way to evaluate data against the core criteria is up to the owners of the evaluation approaches, taking into account the requirements of their community. The objective here is to make sure that the indicators, the maturity levels and the prioritisation are understood in the same way. The maturity model is not meant as a “how to”, but rather as a way to normalise assessment.

 

Use of this document

The FAIR data maturity model guidelines primarily address owners of (FAIR) assessment methodologies, including questionnaires and automated tools, as listed for example in FAIRassist.

Nevertheless, this document is not restricted to these stakeholders. It may also be used by researchers, data service owners, funders and infrastructures in different scientific and research disciplines, industry and the public sector, who are active and/or interested in the FAIR data principles and in particular in assessment criteria and methodologies for evaluating their real-life uptake and implementation level. This document provides definitions and examples for every indicator - as mentioned above - in order to avoid confusion or ambiguity, and aims to provide a clear outline of the framework (i.e. indicators with their maturity levels and priorities), linking the indicators to the principles and suggesting how the indicators may be evaluated.

 

Read the full recommendation.
 

Output Status: 
Other Outputs (Not official)
Review period: 
Tuesday, 14 April 2020 to Wednesday, 13 May 2020
Primary WG Focus / Output focus: 
Domain Agnostic
  • Author: Makx Dekkers
    Date: 20 Apr, 2020

    A request from the editorial team: please indicate the section of the document or the indicator that your comment is about, and, if possible, also include a suggestion for improvement. Many thanks!

  • Author: John Brown
    Date: 22 Apr, 2020

    Hi FAIR Data Maturity Model Working Group,

    This looks very thorough - thank you for your excellent work!

    I was curious when reading s3.1 (ID: RDA-F1) - would the expectation be that both the data and the metadata have separate persistent, globally unique identifiers (e.g. DOIs or others)? Or would it be that either the metadata OR the dataset has the PID? I ask because I think many end-users would be confused about which DOI to cite if each dataset had two IDs (one for the data, one for the metadata).

    Thanks,
    JB

  • Author: Christophe Bahim
    Date: 27 Apr, 2020

    Thanks John. Having two separate indicators – each with priority Essential – implicitly requires a persistent and globally unique identifier for both the data and the metadata. 

    As stated in the following article, which discusses the interpretations and implementation considerations per FAIR principle: 

    Principle F1 states that digital resources, i.e., data and metadata, must be assigned a globally unique and persistent identifier in order to be found and resolved by computers.

    Current choices are for each community to choose, for all appropriate digital resources (i.e., data and metadata) [...]

    Furthermore, to answer your second question, this WG does not address whether the metadata DOI, the data DOI or both should be cited. That is left to the community's discretion. 
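
    For illustration only, the expectation could look like the following minimal sketch of a landing record, assuming a DataCite-style layout; both DOIs and all field values are invented for this example:

    ```python
    # Hypothetical sketch: a metadata record with its own PID that also carries the
    # PID of the data it describes. All identifiers and values are invented.
    metadata_record = {
        "identifier": "https://doi.org/10.1234/metadata.example",  # PID of the metadata record
        "title": "Sea surface temperature measurements 2019",
        "creators": ["Example Researcher"],
        "relatedIdentifiers": [
            {
                "relationType": "IsMetadataFor",  # DataCite-style link from metadata to data
                "relatedIdentifier": "https://doi.org/10.1234/data.example",  # PID of the data itself
                "relatedIdentifierType": "DOI",
            }
        ],
    }

    # An evaluator checking the two F1 indicators could then verify that both PIDs are present.
    assert metadata_record["identifier"].startswith("https://doi.org/")
    assert any(r["relatedIdentifierType"] == "DOI" for r in metadata_record["relatedIdentifiers"])
    ```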

  • Author: Dzulia Terzijska
    Date: 23 Apr, 2020

    I don't understand the wording of:

    "Useful: such an indicator addresses an aspect that is nice-to-have but is not could indicator MAY be satisfied, but not necessarily indispensable." (2.2 Priorities)

  • Author: Makx Dekkers
    Date: 23 Apr, 2020

    Thanks Dzulia, this is indeed an editorial glitch. The text should read:

    Useful: such an indicator addresses an aspect that is nice-to-have but is not necessarily indispensable.

  • Author: Sharif Islam
    Date: 24 Apr, 2020

    GitHub issue #52 mentions: 

    "Upgrade priority of I2-01D Data uses standard vocabularies from Important to Essential"
    "Upgrade priority of I2-01M Metadata uses standard vocabularies from Important to Essential"

    However, in the pdf, I noticed RDA-I2-01D is marked as Useful and I2-01M is marked as Important. 

    Shouldn't these changes be reflected in this version? 

  • Author: Christophe Bahim
    Date: 27 Apr, 2020

    Thanks Sharif. Besides being proposed for a priority change, these two indicators (I2-01D & I2-01M) were also proposed for deletion – identified as out of scope (see GitHub). Following the deletion and combination of indicators, the IDs have been reshuffled. That is why one should not try to compare indicators using old (e.g. F1-01M) and new IDs (e.g. RDA-F1-01M).

  • Author: Hervé L'Hours
    Date: 25 Apr, 2020

    Great progress and work, congratulations. Lots of comments to follow. For now I'd like to double check RDA-A1.2-02D.  Should this be RDA-A1.2-01D which is otherwise absent from '3.1 List of Indicators'? Both are used in the relevant part of '3.3 Indicators for Accessible'. I need to update the alignment of these indicators to some other work so it would be useful to confirm. Thanks,

     

  • Author: Christophe Bahim
    Date: 27 Apr, 2020

    Thanks Hervé. Due to the suppression and combination of indicators, and the subsequent reshuffling of IDs, it seems that we forgot to re-ID that indicator consistently! I thus confirm: RDA-A1.2-02D should indeed be RDA-A1.2-01D. 

  • Author: Hervé L'Hours
    Date: 30 Apr, 2020

    Great, thanks for confirming Christophe

  • Author: Fernando Aguilar
    Date: 08 May, 2020

    Hi Group,

    Thanks for your work, it is really interesting. Some comments below:

    RDA-F2-01M Rich metadata is provided to allow discovery

    This indicator can be evaluated by verifying that metadata is provided. The amount of metadata to be provided may also be part of the metadata policy of the repository where the data is published.

    I think this is not enough. The quality of the metadata is not about the quantity. I would suggest adding any mechanism to extract information, or any link to standards, metadata formats, etc.

    RDA-A1-03M Metadata identifier resolves to a metadata record 
    The assessment is too ambiguous.

    RDA-I1-02D Data uses machine-understandable knowledge representation 
    The assessment should be different from the one for metadata. Most of the data will probably not be available in RDF, OWL, JSON-LD or SKOS formats.

     

    Kind regards,

  • Author: Christophe Bahim
    Date: 11 May, 2020

    Hello Fernando, Thanks for your email. 

    • RDA-F2-01M Rich metadata is provided to allow discovery. We agree with you that quality is not about quantity. Nevertheless, the indicators were developed by decomposing the FAIR principles, and the latter do not refer to quality. If quality were to be mentioned, as presented by this article, it would belong under the indicators covering the R1.3 sub-principle: "Several disciplinary communities have defined Minimal Information Standards describing most often the minimal set of metadata items required to assess the quality of the data acquisition and processing and to facilitate reproducibility"
    • RDA-A1-03M Metadata identifier resolves to a metadata record. What would be your proposal for improving the assessment details? 
    • RDA-I1-02D Data uses machine-understandable knowledge representation. The assessment details will be rephrased by referring to, for instance, community standards for data models.
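
    Purely as an illustration of what "machine-understandable knowledge representation" can mean in practice – a hypothetical sketch using schema.org terms in JSON-LD, not a requirement of the assessment details:

    ```python
    import json

    # Hypothetical dataset description expressed as JSON-LD with schema.org terms.
    # Because the keys are mapped to shared IRIs via @context, a machine can work out
    # what is being measured and in which unit without human explanation.
    dataset_jsonld = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": "Sea surface temperature measurements 2019",
        "variableMeasured": {
            "@type": "PropertyValue",
            "name": "sea surface temperature",
            "unitCode": "CEL",  # UN/CEFACT common code for degrees Celsius
        },
    }

    print(json.dumps(dataset_jsonld, indent=2))
    ```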

    Happy to hear your thoughts. 

  • Author: Dimitra Mavraki
    Date: 11 May, 2020

    Dear WG,

    firstly I would like to congratulate you on this interesting and useful work. 

    If I understand the analysis correctly, on page 38 the first paragraph reads "In this example, it shows that the evaluated resource does not reach a minimum level of FAIRness for Accessible and Reusable." I think it should read "does reach a minimum level". 

    kind regards

  • Author: Christophe Bahim
    Date: 12 May, 2020

    Thanks for your message Dimitra. I understand your point of view. Yet, in the example, the resource does not reach the minimum level of FAIRness; what about rephrasing to "In this example, it shows that the evaluated resource does not reach level 1, the minimum level of FAIRness for Accessible and Reusable."?

    Best, 

  • Author: Dimitra Mavraki
    Date: 12 May, 2020

    yes, I agree with you that this is more accurate. It could also read as follows:" ...the evaluated resource scores low but does not reach level 1, the minimum level of FAIRness for Accessible and Reusable". 

  • Author: Christophe Bahim
    Date: 13 May, 2020

    Yes, thanks for the proposition

  • Author: Karsten Kryger Hansen
    Date: 12 May, 2020

    Hi,

    I was wondering about the wording in the introduction 1.1: "Unfortunately, they often lead to diverse interpretations and ambiguity". I know that this document is about harmonising, but I think the strength of the FAIR principles has been the ability and flexibility in adoption. And now there is a need for a higher level of conformity.

    For RDA-A1-01M I would suggest adding ", if the evaluator is eligible." at the end of the assessment details. Often evaluators might not have legitimate access to e.g. personal sensitive data in a repository. Or is this access considered to be implicit as part of being able to conduct the evaluation? My concern is that some repositories and datasets might be considered un-FAIR by external evaluators if they do not provide access to data. And in many cases you will need e.g. a non-machine-signable agreement.

  • Author: Christophe Bahim
    Date: 13 May, 2020

    Thanks Karsten. 

    • I understand your point; what about rephrasing to "This means that they may lead to diverse interpretations and ambiguity"?
    • Concerning RDA-A1-02M, it is implicit that the evaluator should have access to the resource he/she is evaluating; we will clarify the assessment details. 

  • Author: Margie Smith
    Date: 12 May, 2020

    Hi there,

    I would like to congratulate the working group on this comprehensive body of work. 

    Thinking of issues that we are coming across in our catalogue, could you relay any discussion the group has had around the following two cases involving the Essential indicator RDA-A1-03D (which is very similar to RDA-A1-05D "Data can be accessed automatically (i.e. by a computer program)", which is only 'Important')?

    The (RDA-A1-03D) data identifier resolves to digital object A, which is suddenly found to contain an error. The data object is updated to version B, assigned a resolvable persistent identifier (for the new version) and made available through the metadata record.

    My thought on this is that there would be an expectation that the data identifier to digital object A should change to resolve to a metadata page (rather than the digital object) communicating the error found and exposing the identifier of the corrected data object (B), but I am not sure.

    There is also an issue where two (or more) versions of a digital object exist but do not supersede each other (e.g. images at different resolutions). If a user only finds one version's resolvable identifier through a generic web search, how will the user ever know that other versions exist that may be better suited for their purposes? Would it not be better if the data identifier initially resolved to a metadata landing page which would indicate all available versions/delivery formats?

    Apologies for probably going over old ground on this. I would reiterate that I find the content understandable, comprehensive and the tool to test compliance is interesting. I have no other major concerns at this stage.

     

     

  • Author: Christophe Bahim
    Date: 03 Jun, 2020

    Margie, thanks for your comment

    • Priorities have been defined and validated by voting majorities; they are consequently not subject to change (at least in this iteration). Looking at the priorities of indicators according to the different communities is definitely something we plan to do in the maintenance phase. 
    • This is more a matter of versioning policies, for example in a DMP, than of evaluating FAIRness. In other words, versioning is not tackled by the FAIR principles and thus not by the FDMM either. 
    • On the issue of what the identifier should resolve to, the FAIR principles are very clear: the data identifier should give access to the data and the metadata identifier should give access to the metadata.
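
    As a purely illustrative sketch of that last point, an evaluator could check that an identifier resolves along these lines, assuming the identifier is a DOI registered with DataCite and that content negotiation is supported; the DOI below is invented:

    ```python
    import requests

    def resolves_to_metadata(doi: str) -> bool:
        # Check that an identifier resolves to a machine-readable metadata record,
        # here via DOI content negotiation.
        response = requests.get(
            f"https://doi.org/{doi}",
            headers={"Accept": "application/vnd.datacite.datacite+json"},  # ask for the metadata record
            allow_redirects=True,
            timeout=30,
        )
        return response.status_code == 200

    # Invented identifier, used only for illustration.
    print(resolves_to_metadata("10.1234/data.example"))
    ```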

  • Author: Nora Dörrenbächer
    Date: 13 May, 2020

    Thanks for all the hard work that went into this document. Below a small comment from the perspective of the Social Sciences where sensitive data is a critical issue.

    Concerning R1.1, we would like to suggest that a differentiation between data and metadata is made. So far the recommendations seem to focus on the metadata. For data that can be disseminated without restrictions, the use of licences (e.g. CC) may be recommended. For all other data we recommend not using licence models for the data, because available licences lack regulations regarding restrictions in data use stemming from privacy or confidentiality. The terms of data use are specified in the usage contracts.

  • Author: Christophe Bahim
    Date: 03 Jun, 2020

    Nora, thanks for your comment. 

    • R1.1 FDMM indicators are about (meta)data; they require the presence of data licences within the metadata. 
    • As for your comment on other data and your recommendation regarding the missing regulation of restrictions, there are mentions of local licences, which can be used for the situation described. 

  • Author: Sarah Jones
    Date: 13 May, 2020

    Congratulations on very comprehensive work. It’s really impressive and very useful from the EOSC FAIR Working Group perspective.

    I agree with John Brown’s concerns about having two persistent identifiers set as essential – one for the data and one for the metadata. This is not typical of practice and is something we would find very hard to require under EOSC. Indeed, the EOSC Persistent Identifier policy says “Multiple PIDs may identify any given entity and users should be able to use whichever they are most comfortable with.” It should be an either/or in my opinion, as we have to allow the principles to be adapted to community practice. That flexibility is the strength of the principles.

    My main concern is the version of the FAIR principles you are basing the metrics on. You note them as defined by GO-FAIR, but this organisation wasn’t established when the FAIR principles were conceived. In the EOSC FAIR Working Group we explicitly reference the FORCE 11 version as this was the first published, endorsed result of what came out of the Lorentz workshop in 2014. See: https://www.force11.org/group/fairgroup/fairprinciples

    There have been several publications such as the Scientific Data paper and the DTL “FAIR data principles explained” webpages, which are now replicated on the GO-FAIR website. These are incredibly helpful for the clarifications they offer, but they also incorporate slight changes to the wording of the principles. While these are very minor, there’s a fundamental problem in not having a transparent and community-governed process to adjust and agree interpretations of the FAIR principles when so many policies and initiatives like EOSC are reliant on them. 

    I would strongly recommend basing metrics off the original published version of the principles as a standalone reference and not one specific interpretation of them.

  • Author: Christophe Bahim
    Date: 03 Jun, 2020

    Sarah, thanks for your comment. 

    • It was implied by this article and later confirmed by the WG that both metadata and data should be associated with a PID. Nonetheless, I agree with you that in practice things are different. Therefore, a section about the implementation of the FAIR data maturity model will be spelled out. 
    • You are right. We did not solely base ourselves on GO FAIR but referenced them as they were providing a lot of examples to illustrate the FAIR principles. The references have been removed and we have clarified in the introduction which publications were used as baseline. 

  • Author: Francoise Genova
    Date: 13 May, 2020

    Thank you very much for the huge and difficult work, and for the care you took to involve the community and gather its feedback.

    I have a few comments at different levels.

    . I agree with Sarah that this document, which will be used as a reference, should use the FORCE11 version as a reference for the wording of the principles. There is a need to define how the FAIR principles themselves are maintained, now that they are an asset for the whole community, well beyond the original authors.

    . The final distribution of priorities per FAIR area shown in Table 2 is very interesting. As a result of the feedback, I is a second-class citizen with no essential parameters. I am not surprised by the result, because I is not really addressed by many communities yet, but for some it is essential. At this stage, this means that in the pass-or-fail measurement one can reach level 1 without any interoperability capacity, but one needs all the F ones, 2/3 of A and ½ of R. I am not suggesting adding essential parameters for I; I just point out that there are issues with using the proposed pass-or-fail method, which gives very different weights to the F, A, I and R areas.

    . One of the findings of the tests is likely that FAIR practices can vary a lot from one community to another, possibly because their requirements can be different. I think that it would be useful to say this explicitly. This is alluded to in the Future Maintenance section, which states that the RDA Maintenance WG should interact with research communities. One of the points is that feedback from implementation, and possibly adverse consequences of the model, will likely differ from one community to another.

    Minor comment: in RDA-A1-01M, (ii) and (iii) are not fully clear when data is open with no access condition (maybe because (i), (ii) and (iii) are linked by ‘and’).

    Thanks again for this very useful work!

  • Author: Christophe Bahim
    Date: 02 Jun, 2020

    Françoise, thanks for your comment. 

    • See my response to Sarah; "You are right. We did not solely base ourselves on GO FAIR but referenced them as they were providing a lot of examples to illustrate the FAIR principles. The references have been removed and we have clarified in the introduction which publications were used as baseline."
    • Good observation. It seems that Interoperability is somewhat complex and is not yet fully understood / implemented by communities. Therefore, it introduces a bias in the evaluation method. But it is important to remember that the evaluation method is descriptive rather than prescriptive. Besides, building on your observation, the weight of each FAIR area is not the same. For starters, the FAIR areas have different numbers of indicators, which introduces a bias. Furthermore, the FAIR areas do not have the same proportions of indicators ranked essential, important or useful. In conclusion, we will make sure to spell out these observations in the document (see also the illustrative sketch after this list). 
    • That is true, FAIR practices do vary from one community to another. This will be clearly stated in the implementation section we intend to write. Besides, it will be mentioned that the maintenance working group will be tasked with gathering further evidence of usage of the model and with interacting with research communities to incorporate further requirements into future versions of the model.
    • We will clarify that. 
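
    To make the weighting discussion concrete, here is a rough, hypothetical sketch of a pass-or-fail check per FAIR area; the indicator results and priorities below are invented for illustration and this is not the evaluation method prescribed by the document:

    ```python
    # Invented pass-or-fail sketch: an area reaches "level 1" when all of its
    # Essential indicators pass. Indicators, priorities and results are made up.
    results = {
        "RDA-F1-01M": ("F", "Essential", True),
        "RDA-F2-01M": ("F", "Essential", True),
        "RDA-A1-02M": ("A", "Essential", True),
        "RDA-A1-03D": ("A", "Essential", False),
        "RDA-I1-01M": ("I", "Important", False),  # no Essential indicator under I in this toy example
        "RDA-R1-01M": ("R", "Essential", True),
    }

    for area in "FAIR":
        essentials = [passed for (a, priority, passed) in results.values()
                      if a == area and priority == "Essential"]
        # With no Essential indicators at all, the area passes trivially – the imbalance noted above.
        print(area, "reaches level 1:", all(essentials))
    ```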

  • Author: Daniel Heydebreck
    Date: 13 May, 2020

    Thanks for the great and labor-intensive creation of this document. It seems to be very comprehensive.

    A general question: We have a data file that contains measurement values, units of the measurements and information on what was measured (e.g. temperature). Are these three elements considered data? Or are "unit" and "what was measured" metadata stored in the data file?

    This question came up when I thought about indicator RDA-I3-01D "Data includes references to other data". I would consider references within data files as metadata. From my perspective, there are two levels of metadata: metadata outside of data files (provided on a landing page; for harvesting via OAIS; ...) and metadata inside the data files. Hence the previous question.

    Another question/remark on RDA-I3-01M and RDA-I3-01D: it is expected that metadata reference metadata and data reference data. What about "Metadata includes references to other metadata or data" (RDA-I3-01M) and "Data includes references to other data and metadata" (RDA-I3-01D) instead?

  • Author: Christophe Bahim
    Date: 02 Jun, 2020

    Daniel, thanks for your comment. 

    • First, it is important to note that this document does not aim to define all terms, and in particular not what is (meta)data and what is not. To my knowledge, if this information is in the data file, it is data. It is true that there are definitely multiple levels of metadata. 
    • We do have the two indicators you are proposing, namely RDA-I3-02M and RDA-I3-02D.

  • Author: Yan Grange
    Date: 13 May, 2020

    Here are the comments from Hanno Holties and myself. We represent ASTRON (the Dutch institute for Radio Astronomy) that operates radio telescopes (LOFAR and APERTIF). Great document in general. We had the following comments and proposals for improvement.

        - The assessment of several indicators depends on whether metadata also includes general documentation which may contain details on data access, licensing, etc.
        - There is a very consistent split between data and metadata. They should each have a persistent and globally unique identifier. Can the identifier of both be the same (or be daisy-chained, with the metadata directly pointing to the data by e.g. a URL)?
        - Should there be a principle that states the data has been unchanged since recording/storage (think about bit rot, or other data corruption)? Basically a quality parameter or a checksum?
        - Data deletion or corruption is handled a bit ambiguously. For example, RDA-A1-03D and RDA-A2-01M are both essential, while the first one cannot be met if the data has been deleted.
        
        - RDA-F2-01M: this is essential. The assessment details are very light (check if there is any metadata at all) if there is no policy in the repository.
        - RDA-A1-01M: This should be essential since data access is one of the main points. Also RDA-A1-02D depends on this indicator and that one is essential so this one should as well. 
        - A1-04M/D: The important point should be that a protocol is documented. We would propose to make a "documented" protocol essential and make a standardised protocol important (in the same way as RDA-R1.1-01M and RDA-R1.1-02M)
        - RDA-A1.2-02D: This indicator is linked to a principle that explicitly states "where necessary". It is a bit unclear what the status of this one is when it is irrelevant. Why did this addition drop off? If it were included, the priority could also change to important. 
        - RDA-I1-01M/D: See comment on A1-04M/D on using "documented" as essential and "standardised" as important.
        - RDA-I1-02M/D: Machine understandable is a bit vague here
        - RDA-I3-01M/D and RDA-I3-02M/D: Propose to add "if applicable"
        - I3-03M and I3-04M: Seems like these could be merged with their counterparts in I3-01M and I3-02M
        - R1-01M: We had a hard time parsing what this indicator means, by the word "Plurality". Maybe using a different wording could help. 
        - RDA-R1.1-01M: Adding a sentence why this is essential may be helpful.
        - RDA-R1.2-01M: Isn't this already covered by rich metadata? Also: provenance seems very essential to us.
        - RDA-R1.3-01M/D seem to overlap with RDA-I1-01M/D
        - RDA-R1.3-02M: This is the only indicator involving machine-understandability that is essential. This seems a bit odd to us.

     

  • Author: Christophe Bahim
    Date: 03 Jun, 2020

    Yan, thanks for your comment. 

    • The indicators and their assessment details have an instructive purpose, but it is agreed that their implementation and interpretation may vary from one community to another.
    • The FAIR principles advise associating a separate PID with the metadata and with the data. Besides, the WG validated this approach.
    • It could be, but it is not the purpose of the FDMM Working Group to revise or re-design the FAIR principles.
    • Data deletion and data corruption concern operational issues of data management, which are beyond the scope of this Working Group.
    • As for your comments concerning specific indicators, they have been taken into account and acted upon where deemed appropriate. One general comment though: priorities have been defined and validated by voting majorities; they are consequently not subject to change (at least in this iteration). Additionally, the authors of the FAIR principles claimed that the principles do not overlap, and neither do the indicators.  

  • Author: Erzsebet Toth-Czifra
    Date: 13 May, 2020

    Let me also congratulate the WG on this impressive body of work. This set of guidelines is already a significant contribution to the successful implementation of the FAIR data mandates across disciplines: a very clear and easy-to-follow publication that eliminates lots of confusion and makes some of the difficult-to-grasp issues concrete.

    The following comments are contributions from DARIAH-EU (authors: Laurent Romary and me), mainly subtle and humble recommendations.

    • ‘RDA-I2-01D Data uses FAIR-compliant vocabularies ⬤ Useful’: Although ‘FAIR compliant’ is explained further on in the document, there is still a chance that the term will be confusing for some. Would it be possible to replace it with: ‘well-documented and maintained vocabularies based on community standards’?

     

    • ‘RDA-I1-02D Data uses machine-understandable knowledge representation ⬤⬤ Important’ - We may expect that much of the data coming from humanities fields, especially from outside of Digital Humanities, will not be expressed in a machine-understandable knowledge representation (RDF, SKOS or LOD) by nature; instead, it is often expressed in natural language, even if encoded using machine-readable methods (e.g. TEI). Therefore, we suggest downgrading this indicator from important to useful.

     

    • ‘R1 RDA-R1-01M Plurality of accurate and relevant attributes are provided to allow reuse ⬤⬤⬤ Essential’ - It may be worth complementing this indicator by adding that all the necessary information (e.g. software specification) that is essential for the execution, processing and analysis of the data is a must. 

     

    • ‘R1.1 RDA-R1.1-02M Metadata refers to a standard reuse licence ⬤⬤ Important’ - It might be worth highlighting here that not all kinds of data are compliant with the Creative Commons framework. As an alternative from the humanities/cultural heritage sector, the Europeana licensing framework could be mentioned as a good example (including Rights Statements). 

     

    • ‘R1.2 RDA-R1.2-01M Metadata includes provenance information according to community-specific standards ⬤⬤ Important’ - This indicator is especially important to the arts and humanities communities. A good practice that could be added as an example is the TEI <sourceDesc> element, which is even mandatory in a <teiHeader>: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-sourceDesc.html. It may contain the description of one or several sources from which the current document has been derived. 

    Comments on the implementation of the indicators as metrics

     

    • Although we are aware that the implementation of the metrics across the different evaluator groups (funders, institutional data support staff etc.) is out of the scope and control of the RDA FAIR Data Maturity Model WG, it is really great that the WG provides support for the evaluation methods in chapter 5. I am convinced that one cannot highlight enough the importance of the responsible and careful implementation of these metrics to minimize unintended consequences. Certain indicators may be less important or even irrelevant to certain, less data-intensive disciplinary communities; still, it is essential that different scholarly fields have equal chances to comply with the FAIR criteria. I suggest laying down these principles and inclusivity measures in chapter 5 to provide guidance to evaluators and pave the way for responsible implementation. 

     

    • It might also be worth noting in chapter 5 that if these criteria are checked only at the publication or project evaluation phases, in the majority of cases it is already too late to improve the FAIRness of datasets. Reflecting on these indicators in DMPs, then in the middle phase of the project and finally at the point of publication, could make sure that scholars are guided through the process and that room for improvement is ensured. 

    • It would be really valuable to call for implementation use cases from across different disciplines and evaluate the metrics in the light of them.

    Thanks a lot for following this longer thread.

    All best,

    Erzsébet

  • Author: Christophe Bahim
    Date: 03 Jun, 2020

    Erzsébet, thanks for your comment. 

    • Concerning your comments about the indicators, they have been taken into account and acted upon where deemed appropriate. It is worth noting that describing the indicators was a top-down exercise; we looked at the common denominator for all the communities. In the next phase, we will surely dive into all the views and implications of the communities.
    • Spot on: the implementation of the indicators is out of the scope and control of the FAIR data maturity model. Nevertheless, the intention is to add a section which will deal with the implementation of the indicators and cover the elements you just mentioned. 
    • Yes, we have added the following: "The model may be used during the development of Research Data Management Plans before any data and metadata have been produced to specify the level of FAIRness that the resources are expected to achieve. It can also be used after the production of data resources to test what the achieved level of the resources is. Data producers, i.e. researchers, and data publishers can use the model to determine where their practices could be improved to achieve a higher level of FAIRness, while project managers and funding agencies can use the model to determine whether the resources achieve a pre-defined, expected level of FAIRness."
    • Indeed, the plan for the maintenance phase is to call for implementation use cases from across different communities. 

  • Author: Derek Scuffell
    Date: 17 Dec, 2020

    First off, can I express how wonderful this work is. The group has taken on a tough and gnarly problem and delivered a useful tool - I am grateful. My experience in large organizations is that Interoperability is an indicator that teams have less appetite for, and it frequently (almost always) gets ignored. And because it is ignored, it leads to technical debt, making it harder and harder for the organization to exploit its data. When I looked at the indicators for Interoperability, I was surprised to see that none of them has an Essential priority. This implies that a dataset need not meet the Interoperability indicators, yet still be considered FAIR.

    For me, one of the most common causes of failure for interoperability is when data is not described using metadata. The I1 indicators have a go at this, but are caveated with how these metadata are expressed. Is there an indicator that serves Interoperability in the same way as A1 does for Accessibility? Something along the lines of "Metadata contains information to enable the user/agent to determine what the data represents". Something like a meaningful column header in a spreadsheet would be an example of meeting such an indicator.
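
    To illustrate the kind of indicator I have in mind, here is a small, invented example: the same values once with opaque column headers and once with meaningful headers plus a variable dictionary that tells an agent what each column represents.

    ```python
    import csv
    import io

    # Invented example: opaque vs. meaningful column headers.
    opaque = "c1,c2\n12.3,2019-06-01\n"  # a machine (or human) cannot tell what c1 and c2 are
    described = "sea_surface_temperature_celsius,observation_date\n12.3,2019-06-01\n"

    # A small variable dictionary acting as metadata about the columns.
    variable_dictionary = {
        "sea_surface_temperature_celsius": {"description": "Sea surface temperature", "unit": "degree Celsius"},
        "observation_date": {"description": "Date of observation (ISO 8601)", "unit": None},
    }

    for row in csv.DictReader(io.StringIO(described)):
        for column, value in row.items():
            meta = variable_dictionary[column]
            print(f"{meta['description']}: {value} {meta['unit'] or ''}".strip())
    ```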

     

    Cheers Derek

     
