RDA Plenary 4 - WG Data Citation: Making Dynamic Data Citeable
The fourth plenary meeting in Amsterdam was the first RDA event I had the opportunity to attend. My participation at the plenary meeting was made possible by the Early Career European Researchers Programme. I am very grateful for this opportunity and explicitly want to thank the organizers for giving young scientists the chance to take part in such an event and exchange ideas with more than 500 experts on research data.
I had the chance to report from the breakout session of the Working Group on Data Citation, which is chaired by Andreas Rauber (Vienna University of Technology), Ari Asmi (University of Helsinki) and Dieter van Uytvanck (Max Planck Institute). This Working Group was officially endorsed in March 2014 before the third RDA plenary. Its scope is machine-actionable citation of dynamic datasets and their subsets. The Working Group does not address metadata for these citations or landing pages, as these topics are handled by other Working Groups and Interest Groups. The goal is to develop concepts and recommendations and to evaluate them, both conceptually and by implementing them in practice.
The Working Group's approach is to time-stamp and version the data in the database and, on that basis, to assign a PID to a time-stamped query or selection expression. The cited result can then be reproduced and compared with what the same query returns on the current state of the database. Because several participants in the breakout session were new to the Working Group, this approach was questioned during the discussion. The co-chairs explained that, unlike taking a snapshot of a database and citing it, assigning a DOI to a time-stamped query is a generic and scalable solution.
The main part of the session consisted of reports from pilots that have already started practical implementations of dynamic data citation, and of introductions to the challenges that projects face and hope to solve in collaboration with the Working Group. First, John Watkins (Centre for Ecology and Hydrology) reported on the results of a UK Natural Environment Research Council (NERC) workshop held at the British Library in June 2014. There, Working Group members discussed their approach to citing dynamic data with several NERC data centres and helped them shape their data citation developments. The second presentation, by Working Group co-chair Dieter van Uytvanck, showed the support for dynamic data citation in linguistics, using the Common Language Resources and Technology Infrastructure (CLARIN) as an example. His short talk was followed by Adrian Burton (Australian National Data Service), who outlined the challenges Australia’s National Supercomputer Facility (NCI) is facing when it comes to citing dynamic data. He emphasised their willingness to organise a workshop with Working Group members to discuss the topic in more detail and find a suitable solution for the NCI. Afterwards, Stefan Pröll (Secure Business Austria) presented two pilots he is working on. In response to the high demand voiced in previous Working Group meetings, he developed a prototype for dynamic data citation of CSV data. His modular approach still needs final integration of the modules, a user-friendly interface and testing with more applications. Following the same approach, he created a tool for citing dynamic XML data, which he presented briefly and which also needs work on the interface as well as testing with real-world use cases.
The final presentation was given by Andreas Rauber on behalf of Carlo Maria Zwölf (Virtual Atomic and Molecular Data Centre) and outlined a pilot using a worldwide e-infrastructure consisting of 41 federated, heterogeneous and interoperable atomic and molecular databases. More details on the presentations can be found in the slide deck.
The pilots showed great progress compared to what was presented at the third plenary. The Working Group is still looking for more examples to work with and to test its approach on, in particular negative examples where the approach does not work.
The discussion with the 30 to 40 attendees raised additional areas for research related to the dynamic data citation challenge, such as the development of a standardised way to describe the search query. The Working Group, however, will take one step at a time and for now concentrate on the pilot implementations and the recommendations arising from them.
Joining the fourth RDA plenary meeting was a great experience, and I was able to take part in many interesting discussions. There was almost too much going on: I could not join all the sessions I was interested in, and there was often more to talk about than time allowed. I learned a lot in the four days in Amsterdam and I am looking forward to updates and outcomes from all the ongoing RDA activities.