Building standards in responsibility
The beginning of April 2019 also marked the week of the 13th RDA Plenary, which took place in Philadelphia, USA. Once again, this major twice-yearly event for research data aficionados attracted 433 participants from 33 countries, who spent their time in more than 60 sessions, working meetings, BoFs and, of course, plenty of caffeine-driven discussions in the halls of the Loews Hotel.
The major theme of this Plenary was "With Data Comes Responsibility". Responsibility is indeed a key ingredient of good scientific research - unfortunately, it is really hard to quantify and assess, and as such it often ends up being overlooked and taken for granted. Speakers throughout the Plenary tackled this issue from multiple perspectives and attempted to provide some insights into the problem itself, as well as some suggestions for the way ahead. Prof. Julia Stoyanovich, as a keynote speaker, gave one of the most memorable talks of the event; the title, "Translating Fairness, Accountability and Transparency into Data Science Practice", clearly spoke to the issue of responsibility from the perspective of algorithmic bias introduced by unfair data harvesting approaches. More than that, though, it highlighted another aspect that ran as an undercurrent through most sessions I attended: how to translate "responsibility" into practical requirements for researchers.
When discussing "responsible" data management practices, one usually comes across keywords such as FAIRness and transparency. However, when attempting to assess these across a data life cycle, there are several misconceptions, which Julia pointed out. First and foremost, research software is closely linked to research data. Data by itself (especially in the Life Sciences) cannot be directly used for decision making, due to its enormous complexity and sheer size - enter research software. I will not go into detail about the current gap in training for Open Research Software - there is extensive literature on this, covering current recommendations, expected open science skills and practical instructions, among other things. "Transparency" in terms of Open Research Software tends to be equated with "open source". However, this is not always the case, especially when software runs to thousands of lines of code (a dataset by itself, really). In Julia's words, "algorithmic transparency is not just releasing source code, which can be unnecessary, sometimes even impossible, and often insufficient". Mirroring this observation onto data, the exact same concept applies. Paraphrasing Julia: data transparency is not just making data public; often it is more important to have the additional information relating to the data gathering process, such as selection criteria, sampling, methodology, provenance, etc. Ultimately, both these statements can serve as guiding principles for applying and assessing "transparency" in research data and software.
So what about "responsibility"? What would be the practical guidelines one should follow to ensure it? Obviously there is no single answer, but this is where the RDA community really shines. We are the technical trendsetters who facilitate social and technical bridging in research data, thus enabling open data sharing. Or, as Mark Parsons succinctly stated, "we are the bridge builders in the brave new world of data". Every group within RDA is essentially connecting different perspectives - from technical (such as the Data Discovery Paradigms IG and the Software Source Code IG) to domain-specific (such as the ELIXIR Bridging Force IG and the ESIP/RDA Earth, Space, and Environmental Sciences IG). The recognition is already there; for example, the European Commission has recognised RDA as a non-standards body issuing technical specifications, and RDA's first four outputs have been approved for public procurement in Europe. This by itself is an amazing achievement, and evidence of how the work of the RDA community can have a direct impact on the world.
I will close this meandering article by raising a point that I came across during a Twitter discussion near the close of RDA P13. The expectation of responsibility is (and has always been) an integral part of Science. However, it is significantly harder to ask for compliance with principles than for compliance with standards. And building global standards is exactly where the RDA community shines best.