Research Data Repository Interoperability
The idea of establishing this working group was first presented during P6 in Paris in the Repository Platforms for Research Data IG session. Shortly after P6, a telephone conference was held, which concluded with the decision to prepare a case statement and to finalize it during a BoF session at P7. The initial co-chairs are David Wilcox and Thomas Jejkal. Contact with potential co-chairs from Asia was made during P6, and their participation will be finalized during P7.
For more information please visit the web page of this BoF group:
https://rd-alliance.org/groups/research-data-repository-interoperability-wg-bof.html
Charter
The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms, focusing on machine-to-machine communication. These standards may include (but are not limited to) a generic API specification and import/export formats, summarized in a document serving as an implementation guide for adoption. The scope of this document and all the WG’s activities will be defined by the following list of initial use cases:
- Migration/Replication of a Digital Object between research data repository platforms
  - Platform, data model and/or version may differ between source and destination
- Retrieval of information related to the platform and/or its contents
  - E.g. to register the system in a (repository) registry or to harvest contents
This initial list might be extended in the first phase of the WG’s operational time.
In order to cover these use cases, existing standards and technologies will be identified and evaluated in the second phase. Evaluation results will be summarized in a separate deliverable and will form the basis of the final deliverable. During the evaluation phase, the preparatory work of other RDA WGs will be used as far as possible along with experiences gathered by the RDRI WG’s members during their work with and on existing research data repository platforms.
In the final phase the WG will strive for a consensus regarding a generic API specification and/or import/export formats needed for offering the listed functionalities. The final deliverable will then contain this consensus in a form such that it can be used as an implementation guide for later adoption.
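To make the migration use case concrete, the following is a minimal sketch of what a platform-neutral export/import round trip could look like. It is not a format proposed by the WG: the `ExportedObject` structure, its field names, and the `export_object`/`import_object` helpers are all hypothetical and chosen only for illustration. The sketch assumes that a digital object consists of a metadata dictionary plus named files, and that checksums are used to verify the files after transfer.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExportedObject:
    """Hypothetical platform-neutral package describing one digital object."""
    identifier: str
    source_platform: str
    metadata: dict
    files: dict = field(default_factory=dict)  # file name -> SHA-256 hex digest


def export_object(identifier, source_platform, metadata, payloads):
    """Serialize an object to JSON; payloads maps file names to raw bytes."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in payloads.items()}
    package = ExportedObject(identifier, source_platform, metadata, digests)
    return json.dumps(asdict(package), sort_keys=True)


def import_object(serialized, payloads):
    """Parse a package and verify each file against its recorded digest."""
    package = ExportedObject(**json.loads(serialized))
    for name, expected in package.files.items():
        actual = hashlib.sha256(payloads[name]).hexdigest()
        if actual != expected:
            raise ValueError(f"checksum mismatch for {name}")
    return package
```

Because the package carries only generic metadata and file digests, the destination platform can map it onto its own data model, which may differ from that of the source.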
Value Proposition
The Research Data Repository Interoperability working group will provide recommendations and implementation guidelines (e.g. for a generic API or import/export formats) for research data repository interoperability that can be integrated by platform developers and service providers. To this end, existing standards and technologies will be evaluated and integrated where possible. Once adopted widely, these outcomes will allow institutions and organizations with research data repositories to deposit, access and share their data in a common way and to disseminate repository resources and contents to clients and services easily. For adopters and their users this means:
Removing Barriers: Defining and implementing interoperability standards for the use cases mentioned above could help researchers identify and acquire datasets stored on other platforms that were previously unavailable to them, enriching their own research.
Easier Collaboration: Having a common way to exchange datasets stored in different research data repository platform instances from different institutions or even disciplines can help to identify new starting points for (inter-)disciplinary collaborations.
Creating Commonalities: Agreeing on and implementing common standards for realizing typical research data repository tasks might bring adopters closer together. For the future this could result in fruitful collaborations extending the basic set of functionalities that have been proposed by this WG.
As everything stands or falls with the adoption of the results, repository platform developers contributing to this group have agreed to implement the results as early adopters.
Engagement with Existing Work
A number of related standardization efforts have already taken place; for example, the OAI Protocol for Metadata Harvesting (OAI-PMH), the SWORD protocol for repository deposits, and the re3data.org schema for collecting information on research data repositories for registration. The Research Data Repository Interoperability WG will review these and other related standards to see how they might be adopted or extended to support our goals. This review period will ensure that we do not duplicate existing efforts.
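As an illustration of the kind of existing standard under review, the sketch below shows how a client might harvest record identifiers and titles from an OAI-PMH `ListRecords` response using only the Python standard library. The endpoint `https://repo.example.org/oai` and the sample response are made up for this example; the namespaces and the `verb`/`metadataPrefix` parameters are those defined by OAI-PMH 2.0. It is a simplified sketch that ignores resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = record.findtext(".//dc:title", namespaces=OAI_NS)
        results.append((identifier, title))
    return results


# Made-up response from a hypothetical repository, trimmed to one record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Calling `parse_records(SAMPLE_RESPONSE)` yields one pair containing the record identifier and its title; a real harvester would fetch the URL from `list_records_url` and follow resumption tokens until the list is exhausted.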
Related Work
Related RDA Groups
Work Plan
The work of the proposed group is organized in three phases framed by the RDA plenary meetings beginning with P8.
Timing | Action | Main Participants
September 2016 | Official start of RDRI WG at P8, working session at P8 for analyzing state of the art | Session participants in an open discussion
September – December 2016 | Identification and discussion of additional use cases and adoptable technologies. Mapping of technologies for potential adoption to single functionalities. | Registered members
January – April 2017 | Create a primer document describing all use cases and technologies for potential adoption. The document also points out gaps not covered by existing technologies. | Co-chairs
April 2017 | Session during P9 to present the primer document and to prepare next steps, e.g. identification of functionalities or exchange formats. | WG members
April – September 2017 | Discussion of functionalities, exchange formats and intended behavior. Create first draft of specification document. | Registered members
September 2017 | Presentation of the specification draft at P10 and identification of open points and potential improvements. | Session participants in an open discussion
September 2017 – March 2018 | Find consensus regarding final specification and write final deliverable serving as implementation/adoption guideline. | Registered members/co-chairs (writing)
March 2018 | Present final results at P11. | Co-chairs
Deliverables
D1. Research Data Repository Interoperability Primer (M6): This document describes targeted use cases, needed functionalities, as well as existing technologies and their feasibility for adoption. Gaps not covered by existing technologies are also described in this document.
D2. Interface Specification Draft (M12): A first draft document of the final specification. The document gives a basic overview of functionalities, exchange formats and intended behavior targeted by the WG to cover the defined use cases. This document will be the basis for finding a consensus between all WG members.
D3. Interface Specification (M18): This specification represents a consensus of all partners regarding an interoperable repository interface. It describes all functionalities provided by this interface, including exchange formats and the expected behavior of a repository platform implementing the interface. This document serves as a guideline for adopting the results of this working group.
Mode and Frequency of Operation
The Research Data Repository Interoperability WG will primarily communicate asynchronously online using the mailing list functionality provided by RDA. Online voice meetings will be scheduled as needed; likely once per month. When possible, in-person meetings will also be scheduled; these will take place at RDA plenaries and at other conferences where a sufficient number of group members are in attendance.
Addressing Consensus and Conflicts
Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.
The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.
Community Engagement
The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.
Adoption Plan
Representatives of several major repository platforms have already joined this working group, including:
These representatives have agreed to consider implementing the standards recommended by the Research Data Repository Interoperability WG in their respective repository platforms. We will continue to seek representatives from a variety of repository platforms and services to ensure that this working group’s deliverables are widely adopted.
Initial Membership
Co-Chairs
Thomas Jejkal
David Wilcox
Members
Stefan Funk
Ralph Mueller-Pfefferkorn
Robert Olendorf
Rick Johnson
Ulrich Schwardmann
Ajinkya Prabhune
Andrew Woods
Wolfram Horstmann
Cynthia Hudson Vitale
Adam Soroka
Jared Whiklo
Colleen Fallaw
Rainer Stotzka
Stephen Abrams
Eleni Castro
Amy Nurnberger
Andre Schaaff
Christopher Harrison
Holger Mickler
Jibo Xie
Juanle Wang
Muhammad Naveed Tahir
Niclas Jareborg
Shaun de Witt
Volker Hartmann
William Gunn
Wouter Haak
Author: Eva Méndez
Date: 19 May, 2016
I think this group is a very interesting idea. Congratulations, and count me in for P8. However, on reading the Related RDA Groups, I am really missing its relationship with the metadata WG. I think we should look for synergies.
Looking forward to hearing more...
Author: Thomas Jejkal
Date: 20 May, 2016
Of course, there is also overlap with other RDA groups not explicitly mentioned in the Case Statement, and there are definitely contact points with the metadata groups. Therefore, it would be great if we could stay in contact for information exchange.
Author: Elizabeth Griffin
Date: 20 May, 2016
What is outlined is obviously an essential stage in realizing All Data for All Researchers, but there is rather more to it than enabling repositories to contact and communicate seamlessly. That is only one small step, and it is a long way down the stream of eventual confluence of 'dissimilar' data-sets. In theory it sounds great, but how is it going to work in practice, and to what uses can merely transported data be put when there are so very many other variables in the works? Different sciences use different interpretations of the same word to describe features of their observations. It isn't as if all researchers use identical computers and identical reduction software or modelling tools. Data formats sound misleadingly alike, but can refuse to conform even within the same science. A simple example: different instruments will deliver either fluxes or intensities, or some machine-uncorrected version of either, and it can be crucial to sort that kind of trivial-sounding matter out before drawing erroneous conclusions. Of course, you will reply, all those things will be properly sorted out in due time. But when, and by whom, and will they all be? It only takes one publication to present wrong conclusions that resulted from not fully understanding the subtle differences between different types of data to place the whole effort in jeopardy. I therefore believe that, while the topic in question is worthy of deep consideration, it does also need to be placed very precisely within its rightful place along the whole chain of actions from inter-departmental agreements on format unification, language unification and metadata unification, via inter-university or country-wide or international agreements of the same kind, with ample trials and feedback at every stage and involving users at every stage, until it could be claimed that the data scientists have done their work thoroughly. 
It will then also take inordinate amounts of dedicated time for the other half of the population, the users, to come up with their own judgements at every step. All of that cannot be swept up in one RDA IG for 'interoperability', though trying to get a full perspective of the total procedure will help to place the intentions of this particular (would-be) IG more nearly into its correct context.
Author: Malcolm Wolski
Date: 24 May, 2016
I have the same concerns as Elizabeth. To achieve something within a short timeframe you will need to keep the scope narrow and focused. As Elizabeth points out it is a big issue. But we have to start somewhere. Perhaps there are some outputs around general principles and approaches rather than specific solutions for every situation.
Author: Thomas Jejkal
Date: 17 Jun, 2016
From your perspective, having the overall goal of "All Data for All Researchers" in mind, I totally agree with you. This is something a single WG cannot possibly achieve. Of course, the proposed WG contributes only a very small piece to the ultimate vision of sharing all data with everybody. However, we think that this small piece is worth tackling and may contribute (on a more technical level) to improving data sharing and exchange.
All other aspects like format, language and metadata unification are out of the scope of this WG, but if there are recommendations from other groups, inside or outside RDA, in these directions, they will definitely be taken into account as far as possible.
Author: Donald Pellegrino
Date: 20 May, 2016
It might be useful to reach out to a representative of the iRODS repository platform as well. More information on iRODS can be found at http://irods.org/.
Author: Thomas Jejkal
Date: 17 Jun, 2016
Of course, having an iRODS partner would be great. Do you have someone in mind?
Author: Stefan Kramer
Date: 03 Jun, 2016
I believe that this WG's proposed undertaking is a very worthwhile effort, having personally encountered the challenge of "Research Data Repository Interoperability" (or lack thereof) in investigating how to mirror data submitted to a data visualization platform, for interactive access to data (namely, opendata.american.edu) into a data archiving platform (namely, dra.american.edu).
Disclosure: I am one of the co-chairs of the Repository Platforms for Research Data IG (with David Wilcox & Ralph Müller-Pfefferkorn).
-- Stefan
P.S.: there seems to be a glitch in this platform - I posted this comment on June 3, but it was datestamped May 19, the same date that the case statement was posted for review. As are all the other previous comments.
Author: Tim Smith
Date: 20 Jun, 2016
Please could you add Invenio to the list of represented repository platforms (underlies services such as Zenodo, B2SHARE, INSPIRE, etc). I've joined the nascent WG and look forward to meeting at P8.
Author: Thomas Jejkal
Date: 06 Jul, 2016
Thank you for joining the group. I'll add Invenio to the list.