RDA Common Descriptive Attributes of Research Data Repositories


31 Jan 2024


By Bridget Walker


 Data Repository Attributes WG

Group co-chairs: Matthew Cannon, Allyson Lister, Washington Luís Ribeiro de Segundo, Kathleen Shearer, Michael Witt, Kazu Yamaji

Recommendation Title: RDA Common Descriptive Attributes of Research Data Repositories

Authors: Michael Witt, Matthew Cannon, Allyson Lister, Washington Segundo, Kathleen Shearer, Kazu Yamaji and the Research Data Alliance Data Repository Attributes Working Group

Impact

A complete and current description of a research data repository is important to help a user find the repository; to learn the repository’s purpose, policies, functionality and other characteristics; and to evaluate the fitness for their use of the repository and the data that it stewards. Many repositories do not provide adequate descriptions in their websites, structured metadata and documentation, which can make this challenging. Even fewer make this information available in a machine-readable or actionable manner, which hampers interoperability. Descriptive attributes may be expressed and exposed in different ways, making it difficult to compare repositories and to integrate repositories with other infrastructures such as registries. They can be difficult to navigate and find: they may be locked behind authentication, obscured within workflows, or buried in myriad documentation and web pages.

Well-described data repositories provide value and impact to a broad set of stakeholders, such as researchers, repository managers, repository developers, publishers, funders, registries and others who need to discover and utilize data repositories. Some motivating use cases for the development of these attributes include:

  • As a researcher, I would like to be able to discover repositories where I can deposit my data based on attributes that are important to me.
  • As a repository manager, I would like to know what attributes are important for me to provide to users in order to advertise my repository, its services, and its data collections.
  • As a repository developer, I would like to understand how to express and serialize these attributes as structured metadata for reuse by users and user agents in a manner that can be integrated into the functionality of my repository software platform.
  • As a publisher, I would like to inform journal editors and authors of which repositories are appropriate for depositing the datasets associated with the manuscripts being submitted.
  • As a funder, I would like to be able to recommend and monitor data repositories to be utilized in conjunction with public access plans and data management plans for the research that I am sponsoring.
  • As a registry, I would like to be able to easily identify and index attributes of data repositories to help users find the best repository for their purpose.

Approach

After the approval of its case statement, the Data Repository Attributes Working Group (DRAWG) conducted its work over the course of 3 RDA plenaries and 37 meetings of its co-chairs and working group between February 2022 and March 2024. The 146 members of the group broadly represented a spectrum of stakeholders engaged in different capacities with data repositories from a variety of roles, disciplines, and countries. Meetings were prepared each month by the co-chairs to facilitate scoped discussions that actively engaged members in an iterative, consensus-building process to design and populate the attributes that included creating user stories; engaging stakeholders in focus groups; reviewing published literature and guidance from funders, publishers, and scholarly societies; inviting expert critiques from outside of the group; trial implementations; and community review. The RDA Common Descriptive Attributes of Research Data Repositories represents a synthesis of these intensive discussions that is intended to be practical and useful for those who build, manage, use, track, and interoperate with data repositories through both human and machine interfaces.

Recommendation package DOI: 10.15497/rda00103

Citation: Witt, M., Cannon, M., Lister, A., Segundo, W., Shearer, K., Yamaji, K., & Research Data Alliance Data Repository Attributes Working Group. (2024). RDA Common Descriptive Attributes of Research Data Repositories (Version 1.0). Research Data Alliance. https://doi.org/10.15497/RDA00103

 

Abstract

The RDA Common Descriptive Attributes of Research Data Repositories outlines a list of common, concise, high-level descriptors that can be useful in describing a research data repository. For each attribute, it provides examples of how the attribute can be expressed as metadata in different schemata; limitations and potential complications that it might pose in harmonization; a brief rationale for why it is important; and a gap analysis of how easy or difficult it may currently be to locate this information from a data repository. The attributes are conceptual, with examples provided for illustration purposes without endorsement of any particular approach, standard or implementation.
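
As a rough illustration of what expressing attributes as machine-readable metadata could look like, the sketch below serializes a made-up repository description to JSON in Python. The field names are hypothetical labels loosely based on attribute names that come up in the review comments (Country, Organization, Description, Certification); they are assumptions, not taken from the Recommendation or from any particular schema, and the example does not endorse any serialization.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RepositoryDescription:
    """Hypothetical record; field names are illustrative assumptions only."""
    country: str
    organization: str
    description: str
    # Empty list means no certification has been declared.
    certification: list = field(default_factory=list)


def to_json(record: RepositoryDescription) -> str:
    """Serialize the record as JSON with a stable key order."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Made-up values, used only to show the shape of the output.
example = RepositoryDescription(
    country="CA",
    organization="Example University",
    description="A fictional repository used only for illustration.",
)

print(to_json(example))
```

A user agent such as a registry harvester could parse this output with any JSON library; a real implementation would more likely map the same information onto an established schema.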

 

UN Sustainable Development Goals

While this recommendation does not target any specific SDG, well-described, interoperable, well-promoted data repositories support research that can advance any of the goals in a general manner.

Output Status: Recommendations with RDA Endorsement in Process
Review period: Thursday, 1 February 2024 to Friday, 1 March 2024
Primary WG Focus / Output focus: Domain Agnostic
  • Author: Beth Plale

    Date: 01 Feb, 2024

    Excellent work. It would be helpful for the RDA-US region to include a paragraph or so on how this work complements/supports the OSTP Desirable Characteristics 2022 document. Its complement/connection to DataCite should be addressed too (as was asked in the OAB meeting). This could be accomplished more generally through a new section titled "Setting guidance in context" that briefly connects the dots in the same way between this document and known significant influencing contexts in other regions.

     

    https://www.whitehouse.gov/wp-content/uploads/2022/05/05-2022-Desirable-...

  • Author: Michael Witt

    Date: 08 Mar, 2024

    Hi Beth, we'd like to keep the recommendation from being specific to the requirements of a particular country or set of funders, but I'm looking forward to continuing our Slack thread on contextualizing and applying it locally through RDA-US, which is important work!

  • Author: Yuri Carrer

    Date: 02 Feb, 2024

    Hi!

    In the context of the project https://www.rd-alliance.org/fair-enabling-citation-model-cultural-heritage-objects we published in our Zenodo community https://zenodo.org/communities/fair-cho this deliverable:

    A2 Case Studies Examination https://doi.org/10.5281/zenodo.8215438 (Page 10)

    where we explore the use of  “Citation Guideline URL” (re3data:citationGuidelineUrl) metadata field in re3data.org repositories.

    Also available:

    A2.2a Digital repositories data citation practices. Supplementary material https://doi.org/10.5281/zenodo.8188806

  • Author: Allyson Lister

    Date: 22 Feb, 2024

    Dear Yuri,

    Thank you very much for your comments! Indeed, the FAIR CHO work seems really interesting. While the work seems to be primarily around the attributes for data citation at a dataset level (and the DRA WG's instead at the repository level), it makes sense that we're aware of each other's efforts and see how we can help each other.

     

    Separately to the DRA WG, please let me know if you'd like to know more about how FAIRsharing might help your efforts; it may even be that your group would like to provide a FAIRsharing Community Champion (https://fairsharing.org/community_champions) to ensure closer collaboration and feedback. For instance, did you know that FAIRsharing has a record for the FORCE11 Data Citation Principles (https://doi.org/10.25504/FAIRsharing.9hynwc) and there is a relationship graph of the policies and other resources that utilise it? We also align with the DRA WG attributes list and provide many searchable metadata fields for database attributes and conditions (https://fairsharing.gitbook.io/fairsharing/additional-information/databa..., which can be filtered via our Assistant at https://assist.fairsharing.org/) that should be helpful to you.

  • Author: Dorothea Strecker

    Date: 02 Feb, 2024

    Thank you for compiling this excellent draft!

    One point to consider might be the gap analysis categorization for 'certification'. This piece of research shows that out of a sample of research data repositories, the majority provided certification information on their websites:

    Donaldson, D. R. (2020). Certification information on trustworthy digital repository websites: A content analysis. PLOS ONE, 15(12), e0242525. https://doi.org/10.1371/journal.pone.0242525

  • Author: Michael Witt

    Date: 08 Mar, 2024

    Hi Doro, we corresponded by email but in case others are interested, the gap analysis drew on the expertise and intensive rounds of discussion from the working group based on general experience with data repositories writ large. The scope of Devan's paper is certified repositories and how they represent their certification. For a general population of repositories, I believe this information was found to not be present because the majority of them are not certified.  Thank you for the comment and for your many contributions as a member of the group.

  • Author: Mingfang Wu

    Date: 08 Feb, 2024

    Excellent work!

    It would be good to have a short explanation of how the gap analysis was derived. For example, is the gap scale based on an analysis of 100 repositories?

  • Author: Michael Witt

    Date: 28 Feb, 2024

    Thank you for the input, Mingfang. The co-chairs discussed it, and we've added a section titled Approach so that people can understand our process, at least at a high level. It is also encapsulated in the artifacts of all of our meetings, but it would be helpful to surface and summarize with the output. Hopefully this can address your comment and others by Francoise and Claudia. The feedback is appreciated!

  • Author: Francoise Genova

    Date: 13 Feb, 2024

    Thanks for putting together this excellent work. It might be useful to provide in Annex a short explanation on how the WG reached the result (information used as input in addition to the use cases, methodology).

  • Author: Michael Witt

    Date: 08 Mar, 2024

    Thank you Francoise, both for the comment and for your many contributions to the group's work! We added a section titled Approach to give a view of the group's process and methods.

  • Author: Claudia Bauzer Medeiros

    Date: 13 Feb, 2024

    Very good and useful work. I would have liked to see, e.g., 3 paragraphs to 2 pages explaining the methodology: how did you decide on which attributes to examine and recommend, and which repositories did you analyze?

     

    Also, I noticed that many of the recommended attributes are used to describe re3data repositories. I would also have liked to see a table describing which of your recommended attributes are also re3data attributes, and which are not. In other words, if I want to register "my" repository with re3data, I need to fill a form containing many of the attributes you recommend = they are the metadata that re3data requires for cataloguing a repository. Many of these attributes coincide with your recommendations. Which ones do not, and vice-versa?

  • Author: Michael Witt

    Date: 08 Mar, 2024

    Hi Claudia, a few people made a similar suggestion, so we added a section to the description of the Recommendation titled Approach to give a view of our process and methods, at least at a high level. We didn't want to burden the Recommendation itself with too much of an academic treatment, because it is geared towards being concise for practice. We are working on a companion paper for publication to give deeper context, lit review, etc. re3data has done a crosswalk of its schema to the attributes that we'll incorporate into the documentation on its website. Thanks again for this useful feedback.

  • Author: Christin Henzen

    Date: 21 Feb, 2024

    Thanks for preparing an excellent draft. From discussions within our project (nfdi4earth.de), we would be interested in the spatial extent that is covered by the repositories' data. In our use cases, users often look for data with a specific spatial extent, and they would not visit a repository's webpage if they could find out that the repository only provides data for other regions. Have you already considered/discussed this in preparation for the draft?
    Moreover, we see the necessity to provide structured and _machine-readable_ information on preservation, e.g., "How long will data be archived? 10y." So far, there is no option to filter for "long-term archives" within re3data. Would it be worth sharpening the preservation attribute with respect to machine-readability?

    As previously mentioned, it would be interesting to learn more about the applied methods/approach for the gap analysis and reuse it for specific disciplinary repository descriptions.

  • Author: Matthew Cannon

    Date: 07 Mar, 2024

    Hi Christin, 

    Thanks for reviewing and commenting on our draft. In the discussions we spent a lot of time considering features that would be repository-specific vs dataset-specific. Some of the information you mention may be dataset-specific, but if there were geographical limitations on the kinds of data stored by a specific repository, we would expect that information to be covered by the Country, Organisation or Description fields - attributes 3, 5 or 7.

    We agree that repositories surfacing this information should be looking to do so in a machine readable way, to make this as useful as possible.
     

    Michael has responded to a similar comment from Mingfang Wu about clarifying the methodology followed, and the co-chairs will write this up and add a methods section to the description of the recommendation. 

     

    Thank you again for taking the time to review our work and share this feedback with us. 

     

    Matt - on behalf of the DRAWG co-chairs

  • Author: Reyna Jenkyns

    Date: 27 Feb, 2024

    Great work on compiling this list of contextualized attributes for data repositories. As others have mentioned, it would be helpful to have a summary of the methodology used to gather and formalize the attributes. Additionally, proposed next steps and areas for development may be useful to demonstrate how these attributes may be actualized - adopters, schema updates, applications, etc. 

    A few attribute specific comments: 

    For #5 Organization, it will be important to clarify the roles of each organization, and so there should be some suggestion for what role vocabulary to utilize. 

    For #10 Machine Interoperability, the concepts of analysis-ready or AI-ready datasets might be something to consider, especially as the ways to express these notions mature. 

    For #15, Local Contexts labels (for Indigenous data) are another important example - these are not technically a license, but they describe terms of use. 

    That leads me to wonder how a Local Contexts Open to Collaborate Notice (https://localcontexts.org/notice/open-to-collaborate/) could be represented within these attributes. Perhaps that would fit under Terms of Deposit?

    For #16, certification by CoreTrustSeal is apparent at https://amt.coretrustseal.org/certificates (and also in the re3data records). But for certifications in general, I agree this is not simple to determine. 

    Looking forward to seeing this used in action in support of the use cases, and to see what information we can extract about data repositories. 

  • Author: Michael Witt

    Date: 08 Mar, 2024

    Hi Reyna, all of these points are excellent, thank you. One of our goals with the Recommendation was to encourage deeper discussion beyond the limits of what we could accomplish in the document (and keep it to a useful length and level of detail for practice in implementation). We're glad to be working with you at the World Data System and to have WDS on board as an early adopter!
