Data Repository Attributes WG Case Statement

03 Dec 2021

Case Statement: Data Repository Attributes Working Group

 

1. Charter

 

The Data Repository Attributes Working Group seeks to produce a list of common attributes that describe a research data repository and to provide examples of the current approaches that different data repositories are taking to express and expose these attributes.[1]

 

The working group will produce two documentary outputs over the course of 18 months and four Research Data Alliance (RDA) plenary meetings; they are:

 

  1. a list of common descriptive attributes of a data repository with
    1. a definition of each attribute,
    2. a rationale for the use and value of each attribute,
    3. the feasibility of its implementation,
    4. a gap analysis of its current availability from data repositories, and
  2. a selection of examples that illustrate the approaches currently being taken by repositories to express and expose these attributes to users and user agents.

 

The list of descriptive attributes of a research data repository will be submitted for review and endorsement to become an RDA Recommendation, and the selection of exemplars will be submitted for consideration as an RDA Supporting Output. This work is planned to take place over 18 months between January 1, 2022 and June 30, 2023.

 

2. Value Proposition

 

A complete and current description of a research data repository is important to help a user discover a repository; to understand the repository’s purpose, policies, functionality, and other characteristics; and to evaluate whether the repository and the data that it stewards are fit for their use. Many repositories do not provide adequate descriptions in their websites, structured metadata, and documentation, which can make discovery and evaluation challenging. Descriptive attributes may be expressed and exposed in different ways, making it difficult to compare repositories and to enable interoperability among repositories and other infrastructures such as registries. Incomplete and proprietary repository descriptions make it harder for stakeholders such as researchers, repository managers, repository developers, publishers, funders, and registries to enable the discovery and comparison of data repositories. For example:

 

  • As a researcher, I would like to be able to generate a list of repositories to determine where I can deposit my data based on a query of descriptive attributes that are important to me.
  • As a repository manager, I would like to know what attributes are important for me to provide to users in order to advertise my repository, its services, and its data collections.
  • As a repository developer, I would like to know how to express and serialize these attributes as structured metadata for reuse by users and user agents in a manner that is integrated into the functionality of my repository software platform.
  • As a publisher, I would like to inform journal editors and authors of which repositories are appropriate for depositing the datasets associated with submitted manuscripts.
  • As a funder, I would like to be able to recommend and monitor data repositories to be utilized in conjunction with public access plans and data management plans for the research that I am sponsoring.
  • As a registry, I would like to be able to easily harvest and index attributes of data repositories to help users find the best repository for their purpose.
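
As a rough illustration of the researcher, developer, and registry stories above, the following Python sketch shows one way a repository description could be held as structured metadata, serialized as JSON, and queried by attribute. The class name, field names, and function names are hypothetical and chosen only for this example; they are not the attribute list that the working group will produce, and no existing schema is implied.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class RepositoryDescription:
    """Hypothetical attribute set; the field names are illustrative only."""
    name: str
    subjects: List[str] = field(default_factory=list)
    accepts_deposits: bool = False
    apis: List[str] = field(default_factory=list)
    certifications: List[str] = field(default_factory=list)


def to_json(repo: RepositoryDescription) -> str:
    """Serialize one description as JSON (the repository developer story)."""
    return json.dumps(asdict(repo), sort_keys=True)


def find_repositories(
    repos: List[RepositoryDescription],
    subject: Optional[str] = None,
    api: Optional[str] = None,
    accepts_deposits: Optional[bool] = None,
) -> List[RepositoryDescription]:
    """Return the descriptions that satisfy every criterion that was given.

    A criterion left as None is ignored, so calling the function with no
    criteria returns all descriptions in their original order.
    """
    matches = []
    for repo in repos:
        if subject is not None and subject not in repo.subjects:
            continue
        if api is not None and api not in repo.apis:
            continue
        if accepts_deposits is not None and repo.accepts_deposits != accepts_deposits:
            continue
        matches.append(repo)
    return matches


# Invented sample records, standing in for descriptions a registry has harvested.
harvested = [
    RepositoryDescription(
        name="Example Repository A",
        subjects=["chemistry"],
        accepts_deposits=True,
        apis=["OAI-PMH"],
    ),
    RepositoryDescription(
        name="Example Repository B",
        subjects=["history"],
        accepts_deposits=False,
    ),
]

# Researcher story: where can I deposit chemistry data?
candidates = find_repositories(harvested, subject="chemistry", accepts_deposits=True)
print([repo.name for repo in candidates])
print(to_json(candidates[0]))
```

A query like this only works if every repository exposes the same attributes under the same names, which is the kind of harmonization the stories above call for.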

 

While this is not an exhaustive list of stakeholders and potential use cases, identifying and harmonizing a list of descriptive attributes of data repositories and highlighting current approaches being taken by repositories would help the community address these important challenges and move towards developing a standard for the description and interoperability of information about data repositories. The statements of interest below demonstrate that there is significant interest in this work.

 

3. Engagement With Existing Work in the Area

 

Many sets of attributes have been identified by different initiatives with differing scopes and motivations.[2] These attributes have included information about data repositories such as terms of deposit, subject classifications, geographic coverage, API and protocol support, funding models, governance, preservation services and policies, openness of the underlying infrastructure, adherence to relevant standards and certifications, and more. The results of these efforts reflect the variety of stakeholders and the diversity of repository attributes of interest across different communities. The harmonization of a common set of repository attributes, accompanied by the rationale for these attributes, will provide the community with a clearer understanding of the needs and requirements of different communities, and this commonality can enable greater interoperability across repositories, registries, and other data infrastructures.

 

4. Work Plan

 

The proposed co-chairs of the working group have submitted a Birds of a Feather (BoF) session proposal for RDA P18 to engage stakeholders in further discussion around these issues and, if necessary, in revision of this case statement. The working group will meet monthly via Zoom throughout the 18 months, with a rotation of co-chairs formulating and sharing agendas in advance and leading each meeting. All meetings will be open to the community, and progress towards the two deliverables will be noted on the RDA wiki. Correspondence between meetings will take place using an RDA mailing list that will be archived and accessible to all RDA members. The working group will strive to achieve consensus in its decision-making through open and respectful discussion. All views expressed in the deliberations of the group will be recorded (e.g., in the mailing list archive and wiki) for consideration and community review of its outputs.

 

The working group will refine its methods based on feedback from the review of this case statement and member input, but in general we anticipate taking these actions:

 

  1. Identify current standards and approaches to describing data repositories
  2. Define use cases/user stories for utilizing metadata about data repositories
  3. Draft a list of attributes and the rationale for their use/value
  4. Conduct focus groups to validate/refine list
  5. Perform an environmental scan to identify exemplars of different approaches
  6. Submit a list of data repository attributes as an RDA Recommendation for broader community input, review, revision, and adoption
  7. Submit a selection of exemplars as an RDA Supporting Output
  8. Conduct outreach to present and promote the adoption of the outputs

 

Milestones will include:

 

  1. BoF session at RDA P18 and approval of case statement
  2. Monthly meetings commencing January 2022
  3. Identification of current descriptive approaches, use case definitions, and first draft of attributes (Action items 1-3) before RDA P19
  4. Environmental scan to identify exemplars, completion and submission of list of attributes as RDA Recommendation three months after RDA P20
  5. Revision of list based on community input and submission of exemplars as RDA Supporting Output by June 20, 2023
  6. After conclusion of working group, presentation of outputs and report of early adoption at RDA P21

 

Minimally, the working group will engage the Metadata Interest Group (IG), Domain Repositories IG, and Repository Platforms for Research IG, and we will explore joint sessions as needed. Other interest and working groups as well as stakeholders from outside of the RDA will also be welcomed and encouraged to participate.

 

5. Adoption Plan

 

Primary adopters:

 

  1. Repository managers
  2. Repository software developers
  3. Registries - e.g., FAIRsharing, OpenDOAR, re3data

 

Consultation and input from both the working group and the broader stakeholder community will be undertaken to identify

  1. the relevance of the descriptive attributes drafted as the first output of the group, and
  2. the feasibility of adopting these characteristics from the perspective of implementation.

 

Consultation on attribute relevance will allow an iterative process of development. Stakeholders will identify the important functionalities they need from repositories (such as facilitating the data peer review process for publishers and their authors, or integration with funder review processes); provide the rationale for why these characteristics are important for their community; and clearly articulate the aspects and functions needed to support their use cases.

 

This consultation will ensure that the final version of the list of descriptive attributes represents both those repository attributes that are already in use and those that are of most relevance to our stakeholders, who will benefit directly from a harmonized, common list of attributes, and who will ultimately lead in their adoption and implementation.

 

6. Initial Membership

 

The working group will be led by co-chairs who represent international perspectives from a variety of stakeholders, including a variety of repositories, registries, publishers, and librarians.

 

Co-chairs:

  • Matthew Cannon, Taylor & Francis (UK)
  • Allyson Lister, FAIRsharing.org, University of Oxford (UK)
  • Washington Segundo, Instituto Brasileiro de Informação em Ciência e Tecnologia (Brazil)
  • Kathleen Shearer, Confederation of Open Access Repositories (Canada)
  • Michael Witt, re3data, Purdue University (USA)
  • Kazu Yamaji, National Institute of Informatics (Japan)
Review period: Friday, 3 December, 2021 to Monday, 10 January, 2022
  • Author: Sarah Williams

    Date: 09 Dec, 2021

    The Data Repository Attributes WG case statement clearly articulates the group's purpose, and it is an important effort with valuable outputs and many potential stakeholders.

  • Author: Allyson Lister

    Date: 04 Feb, 2022

    Hi Sarah!

    Thanks very much for your comment on the working group - I'm very happy to hear that you feel it has a clear purpose and appropriate outputs. If you would like, please feel free to add yourself as a member of the WG, as more interested parties will make for a better result!

    All the best,

    Allyson

  • Author: Jiban K. Pal

    Date: 14 Dec, 2021

    There is no doubt that the Case Statement presented by the Data Repository Attributes WG will have a far-reaching impact on the development and control of Research Data Repositories worldwide. It will give incremental benefits to repository managers and other stakeholders in making a fair decision toward implementing common standards. It is indeed essential for improving data and metadata management with the provision of FAIRER datasets - viz. Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible.

  • Author: Ilona von Stein

    Date: 23 Dec, 2021

    The FAIRsFAIR project would like to thank the authors for the opportunity to respond to this Case Statement. Requirements and attributes related to data objects and data services, including data repositories, are at the heart of the FAIRsFAIR project. We much appreciate this proposal coming from a wide variety of key stakeholders (as reflected in the co-chairs).

    Please find our FAIRsFAIR project Comments on the Case Statement and an annotated version of the Case Statement attached. See also here: https://doi.org/10.5281/zenodo.5798178 

     

  • Author: Allyson Lister

    Date: 05 Feb, 2022

    I'd like to thank you for the comments that you have provided on behalf of FAIRsFAIR. The co-chairs of the DRAWG will shortly be having an organisational chat to discuss meetings and timelines, and it would be great to work with you to discuss the points you've raised. I'm happy to say that the definitions of "data repository" and our users / user agents are something that we have discussed already, and your comments are helpful. Thank you also for providing information related to the work FAIRsFAIR has already done, as I'm sure this will be helpful.

  • Author: Lisa de Leeuw

    Date: 23 Dec, 2021

    The CoreTrustSeal Board would like to thank the authors for the opportunity to respond to this Case Statement.

    Please find our comments and an annotated version of the Case Statement attached. It can also be found on Zenodo: https://doi.org/10.5281/zenodo.5801501 

  • Author: Keith Russell

    Date: 06 Jan, 2022

    First of all I would like to say that this reads as a really valuable piece of work and it is great to see the group of stakeholders involved in setting this up. The plan looks sound and I will definitely join this group to see the discussion and work towards the agreed attributes. As part of the existing attributes that are already out there I can recommend work that has already been done on the FAIR-enabling-ness of repositories and of course CoreTrustSeal, but I see that has already been suggested :-)

  • Author: Leonidas Pispiringas

    Date: 27 Jan, 2022

    Congratulations to the RDA for initiating the Data Repository Attributes WG. The recommendations of the descriptive attributes for the Data repositories are very important as they will further help and enhance the functionality and smooth diffusion of the latter to the scholarly community. 

    FAIR principles play a crucial role for data repositories too and should also be considered throughout the process. The WG should also consider the OpenAIRE (Open Access Infrastructure for Research in Europe) guidelines for Data Archives as well as for Software Repositories.

    The goal of OpenAIRE guidelines is to achieve compatibility among European repositories that will lead to a future interoperability between research infrastructures for the benefit of the scholarly community.

    OpenAIRE guidelines are going to constructively contribute to the recommendations of the Working Group.

    More information on all sets of OpenAIRE guidelines can be found here: https://guidelines.openaire.eu/en/latest/

     

  • Author: Matthew Cannon

    Date: 17 Feb, 2022

    On behalf of all the co-chairs of this working group, I wanted to thank those who have provided comments or in some cases quite significant feedback on our case statement. 

    The co-chairs met today, and have agreed a series of monthly meetings which will start in March 2022, with a clear plan of work to accomplish in the run up to P19. We welcome all who are interested to join the group to contribute to the monthly meetings and help us achieve the outputs we set out in the case statement. Emails will be going out to those who have joined the working group about the meeting series very shortly.

    We look forward to seeing you and beginning this exciting work very soon.
