RDA-OfR Mapping the landscape of digital research tools WG Case Statement
1. Overview
The digital research data infrastructure landscape comprises a myriad of tools for managing and sharing research data during various stages of the research data lifecycle. Such research tools vary widely depending on data type, user requirement, provider, and subject area. In the context of this WG, research ‘tools enable researchers to perform one or more operations, typically on data, and often with data as the output. Tools are usually intended for use by humans. In this context we are explicitly excluding physical instruments.’
The diversity and variety of research tools can prove overwhelming and challenging for stakeholders working within the digital research data ecosystem to understand, navigate, and select the most appropriate tool to meet their needs and objectives. The categorisation of research tools, based on their features, functionalities and how they interoperate, remains unclear. In many cases, research tools are not interoperable, often leading to siloed working within organisations and disciplines, thereby limiting the scope of research and the ability to share and reuse data.
This RDA Working Group (WG), supported by Oracle for Research (OfR), aims to address these challenges by: (i) categorising different types of research tools; and, (ii) mapping different types of research tools to the research data lifecycle based on their features and functionalities.
The WG will produce a categorisation schema (a conceptual framework) of research tool types that includes terminology, definitions and associated metadata describing features and functionalities of different tool types. The categorisation schema will be stored in an autonomous database provided by Oracle Cloud Infrastructure. The WG will undertake the following programme of work to achieve its deliverables:
The creation of a research data lifecycle model and crosswalk to existing models (Deliverable 1)
The WG will examine and identify the different stages of the research data lifecycle. Since numerous different models of the research data lifecycle exist that have been conceptualised for specific research paradigms and audiences, the WG will conduct a landscape review to research and consult existing models (see Section 3) and identify common stages of the research data lifecycle for use as the framework to guide the research tool categorisation. Each research data lifecycle stage will be supported by terminology and definitions. The WG will create a crosswalk to demonstrate connections between the chosen model and existing models.
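As a rough illustration of what such a crosswalk could look like in practice, the sketch below maps stages from two external lifecycle models onto a single common set of stages. All model names and stage names are hypothetical placeholders, not the stages of any published model, and the WG has not committed to this structure.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical common stages; the WG has not yet chosen its model.
COMMON_STAGES = ["plan", "collect", "analyse", "share", "preserve"]

# Hypothetical crosswalk: (external model, external stage) -> common stage.
# "model_a" and "model_b" are placeholders, not real lifecycle models.
CROSSWALK = {
    ("model_a", "envision"): "plan",
    ("model_a", "acquire"): "collect",
    ("model_b", "create_or_receive"): "collect",
    ("model_b", "access_and_reuse"): "share",
}


def to_common(model: str, stage: str) -> Optional[str]:
    """Return the common stage for an external stage, or None if unmapped."""
    return CROSSWALK.get((model, stage))


def by_common_stage() -> dict:
    """Group external (model, stage) pairs under each common stage."""
    grouped = defaultdict(list)
    for (model, stage), common in CROSSWALK.items():
        grouped[common].append((model, stage))
    return dict(grouped)
```

Keeping the crosswalk as plain data rather than code would let community members propose additional mappings without touching the lookup logic.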
The identification, categorisation, and mapping of different types of research tools: A categorisation schema (Deliverable 2)
The WG will research and consult existing work in the area to identify, categorise, and map different types of research tools. Such tools may include, but are not limited to: open science frameworks, data management planning tools, electronic laboratory notebooks (ELNs), laboratory information management systems (LIMS), virtual research environments (VREs), databases, repositories, and archives. Types of research tools will be described, categorised, and mapped to the research data lifecycle framework based on their utility, and assessed based on their interoperability.
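One possible shape for an entry in the categorisation schema is sketched below: each tool type carries a definition, the lifecycle stages it supports, its features, and the other tool types it is known to interoperate with. The field names, stage names, and example assignments are illustrative assumptions only; the actual terminology and metadata are for the WG to define.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ToolType:
    """Hypothetical schema entry for one type of research tool."""
    name: str
    definition: str
    lifecycle_stages: set = field(default_factory=set)
    features: set = field(default_factory=set)
    interoperates_with: set = field(default_factory=set)


def tools_for_stage(tools, stage):
    """Names of tool types mapped to a given lifecycle stage."""
    return sorted(t.name for t in tools if stage in t.lifecycle_stages)


def interoperability_gaps(tools):
    """Pairs of tool types that share a stage but declare no interoperability."""
    gaps = []
    for a, b in combinations(sorted(tools, key=lambda t: t.name), 2):
        shares_stage = bool(a.lifecycle_stages & b.lifecycle_stages)
        linked = b.name in a.interoperates_with or a.name in b.interoperates_with
        if shares_stage and not linked:
            gaps.append((a.name, b.name))
    return gaps


# Illustrative data only; stage assignments are assumptions, not WG findings.
EXAMPLE_TOOLS = [
    ToolType("DMP tool", "placeholder definition", {"plan"}),
    ToolType("ELN", "placeholder definition", {"collect", "analyse"},
             {"versioning"}, {"repository"}),
    ToolType("LIMS", "placeholder definition", {"collect"}),
    ToolType("repository", "placeholder definition", {"share", "preserve"}),
]
```

With this example data, `interoperability_gaps(EXAMPLE_TOOLS)` flags the ELN and LIMS pair, since both are mapped to the same stage without a declared link. That is the kind of limitation the assessment of interoperability is meant to surface.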
The aim of this deliverable is to highlight the potential for and current limitations of streamlined flow of research data and metadata throughout the research data lifecycle based on how different types of research tools interoperate. This will be highly valuable in the context of the development of the national and international open research commons.
This work will contribute to and build on the work of the RDA’s Global Open Research Commons IG and GORC International Model WG. Task Group 5 of the GORC International WG has undertaken an extensive literature review and released a Commons Attributes Model (Version 0.5) that identifies a suite of services and tools that will inform the work of this WG. Efforts to describe the features, functionality, and interoperability of different types of research tools will complement the development of the ‘Commons Integration Roadmap’ (GORC WG Deliverable) by providing key information about different types of research tools, and highlighting areas for the improvement of their interoperability and user experience.
The creation of a preliminary structural framework for an online open access ‘map of the digital research tool landscape’ (Deliverable 3)
The WG will undertake the necessary foundational work required to create an autonomous relational database that is hosted by the RDA Foundation (as a legal entity on behalf of the RDA), owned by the community, and powered by Oracle for Software. This arrangement has been discussed and agreed by RDA and Oracle for Research.
The open access database, navigable by research data lifecycle stage, will: (i) contain searchable information (e.g., features, functionalities, interoperability) about different types of research tools; and (ii) allow for ongoing community curation and further development. The WG will provide recommendations for the long-term maintenance, sustainability, and adoption of the database to ensure that it remains current, relevant, and useful for the research data community. Such recommendations will also propose methodologies for future community curation (detailing who can contribute and how), management, and governance of the database.
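To make the idea of a database "navigable by research data lifecycle stage" more concrete, the following is a minimal relational sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for the hosted database, and every table name, column name, and row is a hypothetical assumption rather than the WG's design.

```python
import sqlite3

# Hypothetical relational layout: tool types linked to lifecycle stages.
SCHEMA = """
CREATE TABLE lifecycle_stage (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE tool_type (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE tool_type_stage (
    tool_type_id INTEGER NOT NULL REFERENCES tool_type(id),
    stage_id INTEGER NOT NULL REFERENCES lifecycle_stage(id),
    PRIMARY KEY (tool_type_id, stage_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO lifecycle_stage (name) VALUES (?)",
        [("plan",), ("collect",), ("share",)],
    )
    conn.executemany(
        "INSERT INTO tool_type (name, definition) VALUES (?, ?)",
        [("DMP tool", "placeholder"), ("ELN", "placeholder"),
         ("LIMS", "placeholder"), ("repository", "placeholder")],
    )
    # Link each tool type to a stage by name (assignments are assumptions).
    conn.executemany(
        "INSERT INTO tool_type_stage "
        "SELECT t.id, s.id FROM tool_type t, lifecycle_stage s "
        "WHERE t.name = ? AND s.name = ?",
        [("DMP tool", "plan"), ("ELN", "collect"),
         ("LIMS", "collect"), ("repository", "share")],
    )
    return conn


def tool_types_for_stage(conn, stage):
    """List tool type names mapped to the given lifecycle stage."""
    rows = conn.execute(
        "SELECT t.name FROM tool_type t "
        "JOIN tool_type_stage ts ON ts.tool_type_id = t.id "
        "JOIN lifecycle_stage s ON s.id = ts.stage_id "
        "WHERE s.name = ? ORDER BY t.name",
        (stage,),
    )
    return [row[0] for row in rows]
```

A separate linking table lets one tool type appear under several lifecycle stages, which matches the expectation that many tools span more than one stage.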
The ultimate goal is to provide the research data community with a dynamic resource that remains up to date with newly emerging types of research tools and evolves with the ever-changing digital research data infrastructure landscape. This may include significant data and software-related developments, e.g., Artificial Intelligence (AI).
2. Value Proposition
To our knowledge, this RDA WG is the first initiative of its kind to categorise different types of research tools with a primary focus on their utility and interoperability within the research data lifecycle. Providing the global research data community with a high-level map of the digital research data tool landscape that can be navigated according to specific data management and sharing related tasks represents a novel approach to characterising the research data ecosystem. The outputs and recommendations produced by this WG aim to provide value and impact for the following adopters:
Adopter | Value/Impact
Researchers (e.g., data creators and users) | To understand, navigate, and select suitable research tools for managing and sharing data by providing information about their functionalities, relevance, and applicability to the various stages of the research data lifecycle.
Data support professionals (e.g., data managers) | To gain improved understanding of the digital research data infrastructure landscape, and become better equipped with essential knowledge of different types of research tools to provide relevant support, training, and education.
Open Science/Research/Data Commons professionals | To understand the features, functionalities, and interoperability of different types of research tools that can be used within diverse marketplaces or ‘commons’ for data and services.
Tool developers/providers | To: (i) understand the different research tools operating within the digital research data landscape; and, (ii) improve tool features, functionalities, harmonisation, and interoperability to enhance data management and sharing practice.
Research performing organisations | To make informed recommendations at the organisational policy level to staff regarding appropriate types of research tools for the management and sharing of research data.
Publishers | To make informed recommendations to authors and journal editors regarding appropriate types of research tools for the management, publication, and sharing of data associated with journal manuscripts.
Funders | To make informed recommendations to researchers and project managers based on data management plans for funded research.
3. Engagement with existing work in the area
This working group contributes to and builds on a number of preceding and existing initiatives (e.g., frameworks, registries, and directories) that signpost or aggregate tools within the digital research data infrastructure landscape. However, most initiatives to date focus on specific: (i) regions/nations; (ii) disciplines; or (iii) research tools (primarily databases and repositories), providing a high level of granularity.
This WG aims to build on and contribute to existing work in the area by creating a high-level map (a ‘bird’s-eye view’) of the digital research tool landscape. To achieve its proposed programme of work (outlined in Section 1), the WG will engage with the following organisations, projects, and initiatives:
Please note this is not an exhaustive list and the WG may find more examples of relevant existing work to include during the initial research and consultation phase.
For the creation of a research data lifecycle framework and crosswalk to existing models:
- NIST Research Data Framework (RDaF), specifically Version 1.5 - Provides a map of the research data space that uses a lifecycle approach with six high-level lifecycle stages, topics, and subtopics to organise key information concerning RDM and research data dissemination.
- DCC Curation Lifecycle Model - A data-centric model that defines research data management workflows and associated roles and responsibilities within an organisation.
- ARDC Research Data Management Framework for Institutions - Australian national framework that features 19 essential elements for research data management.
For the identification, categorisation, and mapping of different types of research tools:
As stated above, the WG will primarily extend the work of RDA groups working on global open research commons:
- Global Open Research Commons IG (GORC IG: Typology and Definitions) - This RDA group is: (i) developing a shared understanding of what a ‘commons’ is within the research data space; (ii) connecting relevant national, regional and international initiatives; and, (iii) coordinating the delivery of a global Open Research Commons and monitoring related RDA groups.
- GORC International Model WG (GORC WG Commons Attributes Model Version 0.5) - This RDA group is generating a set of pertinent attributes to identify common features across open research commons by reviewing and identifying attributes or features currently implemented by a target set of GORC organisations and, when possible, identifying how they measure user engagement with these features.
Other relevant Open Science/Research/Data commons initiatives:
- African Open Science Platform (AOSP) - A federated system that provides scientists and other societal actors with the means to find, deposit, manage, share and reuse data, software and metadata in pursuing their interests.
- China Science and Technology Cloud (CSTCloud) - A national platform to provide scientists with efficient and integrated cloud solutions in the retrieval, access, use, transaction, delivery and other aspects of sharing scientific information and relevant services.
- European Open Science Cloud (EOSC) - Contributes to the European Data Strategy by providing seamless access and reliable re-use of research data to European researchers, innovators, companies and citizens through a trusted and open distributed data environment and related services.
- Global Open Science Cloud - This initiative aims to encourage cooperation, and ultimately alignment and interoperability, between these and similar initiatives addressing the challenges of interoperability, technical infrastructure, policy and legal dimensions, and governance and sustainability.
- Malaysian Open Science Platform (MOSP) - A strategic transformative initiative to strengthen STI Collaborative Ecosystem for Malaysia that aims to make Malaysia’s research data a valuable national asset by developing a trusted platform that enables accessibility and sharing of research data aligned to national priorities and international best practices.
Other relevant RDA groups:
Aggregators of research data tools:
- RDM Training and Tools WG outcome by El-Gebali S, Öjefors Stark K, Kronander, et al. (SciLifeLab RDM Training and Tools Working Group) - A Miro board identifying tools and services for open and reproducible research in the Life Sciences.
- FAIRsharing - A curated, informative and educational resource on data and metadata standards, interrelated to databases and data policies.
- Re3data - A global registry of research data repositories that covers research data repositories from different academic disciplines.
- OpenDOAR - A quality-assured, global Directory of Open Access Repositories.
- COAR - An international association that brings together individual repositories and repository networks in order to build capacity, align policies and practices, and act as a global voice for the repository community.
- EOSC Marketplace and Portal - Federation of services and tools related to Open Science, including aggregators, repositories, and tools for the research lifecycle.
- OpenAIRE Graph - An open resource that aggregates a collection of research data properties (metadata and links) available within the OpenAIRE Open Science infrastructure using a semantic graph database approach.
4. UN Sustainable Development Goals (SDGs)
Understanding the features, functionality, and interoperability of research tools within the global digital research data infrastructure landscape will help to support data management, sharing, and reuse to tackle grand societal challenges and address the United Nations Sustainable Development Goals (SDGs). In particular, this work contributes directly to SDG 17 which aims to ‘Strengthen the means of implementation and revitalise the Global Partnership for Sustainable Development’.
5. Adoption Plan
This WG will undertake the necessary preliminary work for the creation of an online database of different types of research tools mapped to the stages of the research data lifecycle. This work aligns with the RDA’s mission to build the social and technical infrastructure to enable researchers and innovators to openly share and re-use data across technologies, disciplines, and countries.
For transparent and accessible collaboration, the WG will use a Google Folder for its documentation. Updates will be regularly posted to the WG wiki page summarising meetings and sharing important updates relating to WG progress and timelines. The WG will organise regular dissemination activities and solicit community feedback during specific phases of the project. Community consultation (e.g., calls to action, surveys) may be employed to identify different types of research tools used and required by community members throughout various stages of the research data life cycle. The WG will also collaborate with tool providers and Open Science/Research/Data Commons professionals to understand the fast-evolving digital research data landscape and ensure the WG deliverables meet the needs of adopters.
It will be important to validate WG deliverables (Section 1) with the global research data community (researchers, data support professionals, research tool developers/providers, research performing organisations, publishers, funders, and policymakers) at various stages of the WG’s lifecycle.
The preliminary database of different types of research tools (Deliverable 3) is intended to be further developed to become a dynamic and community-curated resource in the future. As described above, the WG will develop recommendations for the long-term maintenance, sustainability, and adoption of the database by different stakeholders (outlined in Section 2).
6. Work Plan
A work plan has been defined that facilitates an efficient and timely delivery of WG deliverables. Working Group members will meet virtually via Zoom (for max. 90 mins) monthly from the end of May 2023. Tasks will be divided and allocated to task groups within the WG, and work undertaken by task groups in between meetings as required. Meetings will involve lightning updates from task groups and may include presentations from external speakers (e.g. tool providers, RDA groups, Open Science/Research/Data Commons professionals).
Month/Year | Preliminary Working Group Activities
April 2023 |
May 2023 |
June 2023 |
July 2023 |
August 2023 |
September 2023 |
October 2023 |
November 2023 |
December 2023 |
January 2024 |
February 2024 |
March 2024 |
April 2024 |
7. Initial Membership and Leadership
The WG will represent international perspectives from a variety of stakeholders, including researchers, data support professionals, system/service providers, policymakers, publishers, and librarians. Following two brainstorming workshops held in April 2023, the WG comprises the following initial membership and leadership*:
No. | Name | Affiliation | Country | Participation
1 | Adam Leary | Oxford University Press | UK | Member
2 | Adam Vials Moore | JISC | UK | Co-chair
3 | Alex Moura | King Abdullah University of Science and Technology (KAUST) | Saudi Arabia | Member
4 | Allyson Lister | FAIRsharing, University of Oxford | UK | Member
5 | Christine Lemster | GEOMAR Helmholtz Centre for Ocean Research Kiel | Germany | Member
6 | Cristiana Bettella | University of Padua | Italy | Member
7 | Emmanuel Adamolekun | Helix Biogen Institute | Nigeria | Co-chair
8 | Francis P. Crawley | CODATA International Data Policy Committee & EOSC-Future | Belgium | Co-chair
9 | Hea Lim Rhee | Korea Institute of Science and Technology Information (KISTI) | Korea | Co-chair
10 | Kathryn Claypool | Arizona State University | USA | Member
11 | Lauren Maxwell | University of Heidelberg, World Health Organization | Germany | Member
12 | Lina Harper | Digital Research Alliance of Canada | Canada | Member
13 | Lisa Curtin | figshare | USA | Member
14 | Louise Bezuidenhout | DANS | Netherlands | Member
15 | Luc Betbeder-Matibet | UNSW | Australia | Member
16 | Maggie Hellström | ICOS Carbon Portal & Lund University | Sweden | Member
17 | Malgorzata Lagisz | University of New South Wales Sydney | Australia | Member
18 | Marcelo Garcia | King Abdullah University of Science and Technology (KAUST) | Saudi Arabia | Member
19 | Marina Razmadze | Institute for Scientific and Technical Information | Georgia | Member
20 | Meredith Goins | WDS-IPO | USA | Member
21 | Natalie Meyers | Lucy Family Institute for Data & Society, University of Notre Dame | USA | Member
22 | Nina Weisweiler | Helmholtz Association | Germany | Member
23 | Noel Chibhira | University of Pretoria | UAE | Member
24 | Paolo Manghi | CNR-ISTI & OpenAIRE AMKE | Italy | Member
25 | Rebecca Koskela | RDA-US | USA | Member
26 | Richard Pitts | Oracle for Research | UK | Member
27 | Rory Macneil | Research Space | UK | Co-chair
28 | Ross Maxwell | Centre for In Vivo Imaging, Newcastle University | UK | Member
29 | Sarah Stewart | University of Oxford | UK | Member
30 | Shawna Sadler | ORCID | Canada | Member
31 | Stefanie Kethers | ARDC | Australia | Member
32 | Susanna-Assunta Sansone | University of Oxford, UK | UK | Member
33 | Ville Tenhunen | EGI Foundation | Netherlands | Member
34 | Xin Chen | Chinese Academy of Sciences | China | Member
*Upon endorsement, the WG aims to recruit members from Asia-Pacific countries (East Asia, South Asia, Southeast Asia, and Oceania).
Author: Robert Hanisch
Date: 11 May, 2023
The following is the NIST Team response to a review of the new RDA-OfR Working Group Case Statement. We appreciate the opportunity to comment as the case proposal is reviewed and revised.
The case statement proposed by the RDA-OfR Working Group appears to overlap with the aims of the NIST Research Data Framework (RDaF). The RDaF, which was released in Version 1 in February 2021, has just been updated to Version 1.5 and already accomplishes much of what appears to be proposed by this version of the Case Statement, although the Case Statement would benefit from a tighter focus. For example, the Case Statement notes that the “data infrastructure landscape comprises thousands of different systems, tools, and platforms for managing and sharing research data” but does not address the data itself. In succeeding sections, the Case Statement discusses data management and sharing systems. What does “data systems” mean for the group? Is it a catalog of infrastructural components such as repositories, AI platforms, and analysis software or something else? Good definitions would be helpful. The RDaF has definitions for many relevant concepts and terms.
Clarifying the focus of the Case Statement is important to ensure there is extension and not duplication of the work done on the RDaF being developed under NIST in the US with international input and reach. It would be appropriate for the Case Statement to cite the RDaF and recognize how this initiative relates to that one. From there, it is important to promote coordination and cooperation with the RDaF. We’d like to suggest a few points where clarification and cooperation are needed to avoid duplication and achieve synergy.
“1. Conducting a literature review of existing work”
The RDaF cites over 600 references relevant to the research data management (RDM) ecosystem, including best practices, policies, and vocabularies. These references can be linked to the “topics and subtopics” in the RDaF Framework Core which may or may not relate to components in the RDA Case Statement. The RDaF also catalogs more than 100 organizations, national and international, participating in some aspect of research data. These organizations are part of the data management landscape included in the RDaF.
“2. Creating an ontology and conceptual map of data management and sharing systems (Output 1)”
The Case Statement indicates it will look at all stages of the lifecycle. The RDaF organizes the research data ecosystem into six major lifecycle stages, each of which has an extensive list of topics and subtopics. The relationships among these topics and subtopics are identified through 14 overarching themes and 8 professional “profiles” describing typical roles and responsibilities of people whose jobs influence or are influenced by RDM issues. The RDaF team explored both domain-specific and domain-agnostic approaches, finding that while there are unique issues in certain domains, the challenges of RDM largely transcend those specialties. It would be important to relate this work to what the Case Statement Team expects to accomplish and to use the RDaF work already done. Additionally, how will the proposed catalog of data systems be distinct from and extend what re3data and FAIRsharing.org curate?
“3. Designing a preliminary framework for an online open access reference resource detailing different data management and sharing systems (Output 2)”
Depending on the definition of “… systems” in the Case Statement, this could build on what the RDaF has already done. The RDaF does not make recommendations about what systems or services should be used nor does it list specific systems, tools or technologies, although some generic elements are noted. This is intentional, as NIST is strictly neutral when it comes to implementation technologies. Rather, it provides topics that research data organizations need to consider in making decisions concerning specific systems, services, and tools. If the Case Statement describes these and other infrastructure components that would be used in addressing RDaF topics, this might be a useful extension of the existing RDaF work.
Development of the RDaF has been a nearly four-year, $2M effort that was built on community engagement through three plenary workshops and 15 stakeholder workshops, involving more than 300 professionals from across the RDM spectrum. RDA-US contributed to this through in-kind support of a consultant. The RDaF Steering Committee is chaired by the former secretary general of CODATA and has three international members, including the secretary general of RDA and the president of CODATA. Given the familiarity of these individuals with the RDaF, we see opportunities to help revise the current draft Case Statement so that it leverages what has already been done and channels additional efforts to identify RDM infrastructure components that could be used in the implementation of the RDaF.
We would be pleased to see the RDA provide value-added contributions to the RDaF, such as implementation support to organizations desiring to assess and improve their RDM capacity, and hope the Case Statement can be revised to create that synergy. We also note that the Australian Research Data Commons recently released its “RDM Framework for Institutions.” While it appears to be primarily focused on Australian universities, there may also already be much information available therein that does not need to be repeated by the RDA. Similarly, FAIRsharing.org – which grew out of an RDA WG – already indexes numerous RDM service providers and is actively curating their metadata collection.
Author: Susanna-Assunta...
Date: 17 May, 2023
As a co-chair of the RDA FAIRsharing WG I also appreciate the opportunity to comment as this case proposal is reviewed and revised.
I agree with Robert's comments. It is essential to avoid duplication where considerable time and effort from organizations and stakeholders have gone into creating similar resources that are widely used and adopted.
There are many aggregators of data systems, and it is clear that users need help in finding resources throughout the various stages of the research data life cycle. However, the key challenge for such aggregators is that the broader (and more ambitious) their coverage, the shallower (and less accurate) their content, often failing to deliver reliable and trustworthy advice to users. Among other factors, the success of such aggregators depends on their ability to: (i) strike a balance between content depth and breadth, (ii) map and harmonize information extracted from different sources, (iii) have their content community-vetted, and (iv) keep it up to date. These require continuous community input and contributions, as well as considerable time and effort, which go well beyond the 18-month life span of a WG. Last but not least, a sustainability plan (and community support) is vital to maintain and grow the resource, keeping it relevant and, most importantly, open and freely available. It is not clear how these will be addressed by the proposed WG.
Therefore, I strongly encourage the proponents to also envisage a way to provide value-added contributions to existing primary resources, which already map (even if partially) the data infrastructure landscape. This is the case for the RDA-recommended FAIRsharing, which interlinks standards, databases and policies across all disciplines. Being more focused, FAIRsharing is able to enrich, harmonize and curate the description of its content, also with the input of the FAIRsharing Community Champion Programme, partly supported by an RDA grant.