RDA COVID-19 Guidelines and Recommendations (draft versions)

Please note: this is the landing page for the 5 draft versions of the RDA COVID-19 Working Group Recommendations and Guidelines on Data Sharing. These materials are made available following the RDA guiding principle of openness, to allow members and the wider community to view the progression of the document as well as comments received during the 5 review cycles.

The final version of the document can be found:

via the RDA COVID-19 working group's outputs page, which contains supporting details, context, and a link to the DOI, or
directly in Zenodo, DOI: https://doi.org/10.15497/rda00052

RDA COVID-19 Working Group

Group Co-chairs: Juan Bicarregui, Anne Cambon-Thomsen, Ingrid Dillo, Natalie Harrower, Sarah Jones, Mark Leggott, Priyanka Pillai

Subgroup Moderators:

Clinical: Sergio Bonini, Dawei Lin, Andrea Jackson-Dipina, Christian Ohmann

Community Participation: Timea Biro, Kheeran Dharmawardena, Eva Méndez, Daniel Mietchen, Susanna Sansone, Joanne Stocks

Epidemiology: Claire Austin, Gabriel Turinici

Indigenous Data: RDA International Indigenous Data Sovereignty Interest Group

Legal and Ethical: Alexander Bernier, John Brian Pickering

Omics: Natalie Meyers, Rob Hooft

Social Sciences: Iryna Kuchma, Amy Pienta

Software: Michelle Barker, Hugh Shanahan, Fotis Psomopoulos

Editorial team: Christoph Bahim, Alexandre Beaufays, Ingrid Dillo, Natalie Harrower, Mark Leggott, Nicolas Loozen, Robyn Nicholson, Priyanka Pillai, Mary Uhlmansiek, Meghan Underwood, Bridget Walker

Recommendation title: RDA COVID-19; recommendations and guidelines, 5th release (final draft) 28 May 2020

Authors: RDA COVID-19 Working Group and Subgroup Members

DOI: https://doi.org/10.15497/rda00046

Citation: RDA COVID-19 Working Group. recommendations and guidelines. Research Data Alliance, 2020. DOI: https://doi.org/10.15497/rda00046

Note: The overarching COVID-19 Working Group page, including the working group members, events, posts, wiki and documents, can be viewed here:

Disclaimer: The views and opinions expressed in this document are those of the individuals identified, and do not necessarily reflect the official policy or position of their respective employers, or of any government agency or organisation.

Context:

During a pandemic, data combined with the right context and meaning can be transformed into knowledge for informing public health responses. Timely and accurate collection, reporting and sharing of data with the research community, public health practitioners, clinicians and policy makers will inform assessment of the likely impact of a pandemic to implement efficient and effective response strategies.

Public health emergencies clearly demonstrate the challenges associated with rapid collection, sharing and dissemination of data and research findings to inform response. There is global capacity to implement systems to share data during a pandemic, yet the timeliness of accessing data and harmonisation across information systems are currently major roadblocks. The World Health Organisation’s (WHO) statement on data sharing during public health emergencies clearly summarises the need for timely sharing of preliminary results and research data. On 28 May 2020, the G7 Science and Technology Ministers’ Declaration on COVID-19 was issued, which calls for government-sponsored COVID-19 epidemiological and related research results, data, and information to be accessible to the public to the greatest extent possible. There is also strong support for recognising open research data as a key component of pandemic preparedness and response, evidenced by the 117 cross-sectoral signatories to the Wellcome Trust statement on 31st January 2020, and the further agreement by 30 leading publishers on immediate open access to COVID-19 publications and underlying data.

Objectives:

The objectives of the RDA COVID-19 Working Group (CWG), focusing on the essential areas of Clinical, Community Participation, Epidemiology, Indigenous Data, Legal & Ethical, Omics, Social Sciences, and Software, are:

to clearly define detailed guidelines on data sharing under the present COVID-19 circumstances to help stakeholders follow best practices to maximize the efficiency of their work, and to act as a blueprint for future emergencies;
to develop guidelines for policymakers to maximise timely, quality data sharing and appropriate responses in such health emergencies;
to address the interests of researchers, policy makers, funders, publishers, and providers of data sharing infrastructures.

5th Release - Final Draft for public comment:

The RDA COVID-19 Working Group (CWG) members bring various expertise to develop a body of work that comprises how data from multiple disciplines inform response to a pandemic combined with guidelines and recommendations on data sharing under the present COVID-19 circumstances. The work has been divided into four research areas with four cross-cutting themes, as a way to focus the conversations, and provide an initial set of guidelines in a tight timeframe. The detailed guidelines in this body of work are aimed at helping stakeholders follow best practices to maximise the efficiency of their work, and at acting as a blueprint for future emergencies. The recommendations in the document are aimed at helping policymakers and funders to maximise timely, quality data sharing and appropriate responses in such health emergencies.

The CWG is addressing the development of such detailed guidelines on the deposit of different data sources in any common data hub or platform. The guidelines aim at developing a system for data sharing in public health emergencies that supports scientific research and policy making, including an overarching framework, common tools and processes, and principles that can be embedded in research practice. The guidelines contained herein address general aspects that data should adhere to, for example the FAIR principles (that research outputs should be Findable, Accessible, Interoperable, and Reusable), or the adoption of research domain community standards.

These detailed guidelines are supplemented with higher level recommendations aimed at the other stakeholder groups who need to work together with the researchers and data stewards to realise the timely and open sharing of research data as a key component of pandemic preparedness and response.

The RDA COVID-19 WG was initiated after a conversation between the RDA and the European Commission. The first meeting of the CWG to determine the work was held in March. As of May, the CWG counted over 440 members, evenly spread across the different sub-groups. This effort also reflects the work of a host of other RDA Working Groups, as well as external stakeholder organisations, that has developed over a number of years.

The CWG and the sub-groups operate according to the RDA guiding principles of Openness, Consensus, Balance, Harmonisation, Community-driven, Non-profit and technology-neutral and are open to all.

The 5th release (final draft) starts in Section 2 with an overview of foundational, overarching elements that emerged across the different research areas. These recommendations touch upon a number of well-known topics in research data sharing and align to the statements made by many organisations and governance bodies, including but not limited to, the WHO, the G7 Science and Technology Ministers, the publishing industry and Wellcome Trust. In Sections 3 to 6 the focus is on the COVID-19 related research areas. Each section starts with a description of the area and the focus and scope of the work done, followed by the actual recommendations and guidelines. In Sections 7 to 10 this same structure is used for the four cross-cutting themes. The document contains an extended glossary of terms to support the reader (Section 11), an overview of useful additional resources (Section 12) and a list of references (Section 13). Section 14 lists the contributors to this work.

Timing and Future Releases:

This is the fifth and final draft of the Recommendations and Guidelines from the RDA COVID-19 working group, and is open for public comment until 8th of June 2020. Following the open period, feedback will be considered and then the WG will seek endorsement of the document from the RDA governance bodies prior to final publication.

More information and insights on the plans for those releases, as well as highlights from the sub-groups are given during the informative webinars – see COVID-19 WG Events page for details and to access previous recordings and presentations.

Request for Comments:

In the spirit of the RDA community and its open process, your feedback on the content, scope and direction of the releases is vital for all involved to shape and focus the document and its sections into a useful and meaningful tool.

A request from the editorial team: please indicate the section of the document or the indicator that your comment is about, and, if possible, also include a suggestion for improvement. Many thanks!

Output Status:

Other Outputs (Not official)

Review period start:

Friday, 24 April, 2020 to Monday, 8 June, 2020

Group content visibility:

Public - accessible to all site users

Primary Domain/Field of Expertise:

Social Sciences, Medical and Health Sciences, Humanities

Primary WG Focus / Output focus:

Data Management

Data Collection

Data Description

Identity, Store, and Preserve

Disseminate, Link, and Find

Policy, Legal Compliance, and Capacity

Domain Agnostic:

File:

Attachment	Size
RDA COVID-19; recommendations and guidelines, 1st release 24 April 2020.pdf	558.21 KB
RDA COVID-19; recommendations and guidelines, 2nd release 1 May 2020.pdf	818.23 KB
RDA COVID-19; recommendations and guidelines, 3rd release 8 May 2020.pdf	1.9 MB
RDA COVID-19; recommendations and guidelines, 4th release 15 May 2020.pdf	2.13 MB
RDA COVID-19; recommendations and guidelines, 5th release (final draft) 28 May 2020.pdf	1.77 MB

Author: Patrick Dunn

Date: 28 Apr, 2020

What are the thoughts on the use of ISARIC-WHO Case Report Forms (CRFs) to collect data on individuals presenting with suspected or confirmed COVID-19 as a way of encouraging a core set of clinical data for COVID-19 patients?

https://isaric.tghn.org/covid-19-clinical-research-resources/

Apologies if this is redundant.

Cheers,

Patrick Dunn

Author: Abdelkrim Boujraf

Date: 30 Apr, 2020
To whom it may concern,

I am a bit knowledgeable about software manufacturing;

The recommendations are a source of useful information for IT Solutions Architects who want to implement the IT solutions supporting the five subgroups.

Here are some challenges to ensure a smooth implementation of such IT Solutions:

the explosion of software developed by individuals annihilates the support that public administration may provide in the mid-to-long term. Software engineers develop hundreds of dashboards, offering a minimal amount of added value. See a short list of dashboards (see 1)

the lack of awareness of the RD-Alliance activities by the Software industry, an industry that should profit from RDA's best practices

Each sub-group should provide a link to documentation describing their needs (for dummies). For example, the OMICS group did a great job describing the data models researchers should use to store their data from cell and molecular biology; is it possible to explain in which context they use those data? (or is it out of scope?)

Do the sub-groups plan to build a tool or website that the community may use daily as a checklist? I started mine :-) (See 2)

GREAT JOB!

Abdelkrim Boujraf

Links

(1) https://data-visualization.readthedocs.io/en/latest/dataviz/dashboards.html

(2) https://data-visualization.readthedocs.io/en/latest/dataviz/rda_covid-19...

Author: Hugh Shanahan

Date: 03 Jun, 2020

Hi - apologies for not getting to this.

I agree that Industry could benefit more from the outputs of the RDA (though there is some contact through specific industries such as Pharma).

The scope of the report is for researchers, policy makers and publishers and hence to keep in scope we've not provided an introductory guide to the research topics themselves (as it is the document is 120 pages long).

There will be a decision tree to aid the above which I think is similar to what you're suggesting.

Thanks again

Hugh

Author: Mogens Thomsen

Date: 05 May, 2020

As a member of the COVID-19 clinical subgroup I have followed the webinars, which were very informative, and I have had a look at the 2nd release of Recommendations and guidelines. The document looks promising.

I have a few suggestions:

4.2.4 first line

it is critical to spend limited time and resources on reliable data sources … I propose :

it is important to concentrate the efforts on scrutinizing reliable data sources …

4.3

The list of additional working documents & links might be divided into references to articles and static web pages (item 4,6,7,8) and web sites with resources that are updated continuously.

For the other sections of the document I would also prefer that additional working documents and links and also the references were separated as mentioned above.

It is important that ISO standards are respected, and in that context tables and figures should avoid dates that do not follow the rules (e.g. 4/5 is 5 April in US English and 4 May in UK English).
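To illustrate the ambiguity, here is a minimal sketch in Python. The sample string is a hypothetical value for illustration, and ISO 8601 (YYYY-MM-DD) is assumed to be the ISO standard meant above.

```python
from datetime import date, datetime

# Hypothetical date string as it might appear in a table or figure.
ambiguous = "4/5/2020"

# The same string yields two different dates depending on the convention assumed.
us_reading = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # month first (US)
uk_reading = datetime.strptime(ambiguous, "%d/%m/%Y").date()  # day first (UK)

# Writing the date in ISO 8601 form (YYYY-MM-DD) removes the ambiguity.
print(us_reading.isoformat())  # 2020-04-05
print(uk_reading.isoformat())  # 2020-05-04
```

Storing and displaying dates in the ISO form means a reader never has to guess which convention the author used.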

Finally, it is a good idea to have two new overarching subgroups. However, for the legal and ethical guidelines there is some redundancy with some parts of the chapter 8.2.4 in the Social Sciences subgroup. But I am sure that this will be worked out.

I look forward to seeing the 3rd release of the Guidelines.

Best regards

Mogens Thomsen

Author: Nora Dörrenbächer

Date: 06 May, 2020
Dear all,

the German Data Forum (RatSWD) is an advisory council to the German federal government for the research data infrastructure of the social, behavioural and economic sciences, connecting a network of currently 34 accredited Research Data Centres. We have been an organisational member of RDA since 2015. Currently, we are also working on ways to provide researchers with guidelines on how to manage their data when researching the impact of Covid 19 on social issues (see here), with a focus on research in German speaking countries. Therefore, we are excited about, and warmly welcome, the initiative of the RDA Covid 19 working group to establish standards for data sharing. We would like to add the following suggestion to the 2nd release document:

8.2.2 Documentation, Standards, and Data Quality:

(addition) Ensuring data quality

In case of data collection using smartphone apps and other sensors:

note that most sensors are created as consumer products and their data quality does not per se meet scientific demands for reliability and validity. Therefore, critically assess and document the data collection process, measurement accuracy, and the validity of the inferences

think about replicability of research: thorough research data management should include archiving raw data, software and hardware details with their versions, and the code used.

think about contractual restrictions, data protection and research ethics: researchers should make sure they have the rights to use sensor-based data or log files, and the format of informed consent used should be confirmed, if applicable, by the actors or councils in charge. The consent forms should cover the archiving and reuse of data;

For specific recommendations see German Data Forum (RatSWD) (2020): Data collection using new information technology: Recommendations on data quality, -management, research ethics, and data protection Output 6 (6). https://doi.org/10.17620/02671.47. Berlin (soon available in English)

8.2.3. Storage and Backup:

Highlight that much social science data is sensitive personal data. Before sharing and storing the data, anonymisation and/or pseudonymisation must be implemented.
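As one possible illustration of pseudonymisation, the sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256). The function name, the sample identifier and the key are assumptions made for this example only. This is not a recommended or complete procedure: a keyed hash does not anonymise a dataset, since the remaining variables may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The secret key must be stored separately from the shared data;
    anyone holding the key can recompute pseudonyms for known identifiers.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and participant identifier, for illustration only.
key = b"replace-with-a-randomly-generated-secret"
pseudonym = pseudonymise("participant-0001", key)

# The same identifier and key always give the same pseudonym,
# so records can still be linked within the dataset.
print(pseudonym == pseudonymise("participant-0001", key))
```

Using a keyed hash rather than a plain hash means the pseudonyms cannot be reproduced by someone who only guesses the identifiers but does not hold the key.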

8.2.4 Legal and Ethical Requirements:

Specify point. 2

"Researchers have a responsibility for ensuring research participants understand that there may be a risk of re-identification when data are shared. Generally, researchers should take all possible measures to reduce the risk of re-identification. These attempts should be documented."

Add a section

8.2.6. Ensuring access to data in times of the Corona pandemic

Formally anonymized microdata from official statistics are a central basis of numerous empirical research projects. Often, this data can only be used by researchers in a fixed location at guest workstations or via remote execution (i.e. generally without viewing and browsing data or results on the screen). In the current Corona pandemic, guest workstations are hardly usable and researchers have to work in their home office. Consequently, they no longer have access to important data.

Recommendation:

Establishing remote access to individual data from the researcher's workplace makes research processes more flexible. Expensive and time-consuming business trips are no longer necessary, while high levels of data protection are still secured. Role models already exist in a number of European statistical offices and German research institutions (for more information see: https://www.ratswd.de/en/publication/output-series/2855).

Author: Dawei Lin

Date: 06 May, 2020

7.2.1 Recommendations for virus genomics data Repositories

For assembled and annotated genomes we suggest deposition in one or more of these archives: NCBI GenBank, DDBJ Annotated/Assembled sequences, European Nucleotide Archive (ENA) Assembled/Annotated sequences, and/or NCBI Virus.

The 'and/or NCBI Virus' should be removed. NCBI Virus is a custom view of GenBank and RefSeq data, plus added analysis tools.

Under 7.2.2 - What is listed for 'Cell lines/Animals' is surprising as they are all gene expression resources not sample metadata resources. I would consider listing INSDC BioSample resources there instead.

Kim D. Pruitt, Ph.D

Chief, Information Engineering Branch

NCBI/NLM/NIH

Author: Rob Hooft

Date: 14 May, 2020

Dear Kim,

Thank you very much for your review of our draft recommendation. We have taken your suggestions to heart and now refer to NCBI Virus as an NCBI GenBank subresource. We have also made it clearer that submission to the genome-phenome and genome expression archives leads to the creation of several data sets in multiple databases that are linked to each other.

Rob Hooft, representing The Omics team

Author: Peter Cornwell

Date: 21 May, 2020

There is duplication between statements made under the Community Participation and Data Sharing Section 4., and Data Sharing in Social Sciences 7. While restrictions about data sharing in SS are more specific and Section 7. might warrant independent clarification, 7.4.5 is currently fairly generic and also more comprehensive than the text at 4.4.3. The Epidemiology and Omics sections don't seem to have separate preservation discussions. The analysis preservation statements currently at the bottom of page 66 are valuable, if ambitious. As a quick fix maybe you would consider replacing 4.4.3 with the text at 7.4.5?

The preservation advice is basically use repository technology if possible; preferably a "trustworthy digital repository committed to preservation"; even better a "disciplinary repository .. for maximum visibility". It is worth recalling that in 2019 DataCite r3d100011136 which represented a major output of the Gates Foundation project on malaria at https://www.vecnet.org/ (these links still probably don't go anywhere) became unavailable. It had to be redelivered from an unsupported Fedora version using donated funds because this was too unfashionable for mainstream funders. Hopefully we can protect COVID research data more effectively. Maybe you would also consider building a repository of COVID publications similar to the vecnet model, with the benefit of current outputs rather than having to gather material from previous decades. Automated treatment techniques pioneered by Plazi in biodiversity could then be applied to significantly increase accessibility.

Author: Natalie Harrower

Date: 03 Jun, 2020

Dear Peter, thank you for your comment. We have made substantial revisions in the latest release (5th/28th May) in an attempt to remove duplication. Please let us know if you find any remaining issues on your points. Many thanks.

Author: Laurence DELHAES

Date: 30 May, 2020

Hi everyone

Great job !

No specific comment for me.

Cheers

Laurence

Author: Natalie Harrower

Date: 03 Jun, 2020

On behalf of all the contributors, thank you, Laurence!

Author: Manuela Teresa ...

Date: 03 Jun, 2020

Dear RDA COVID-19 Working Group 2020,

thanks for the huge work undertaken. May I suggest a further point, i.e. the promotion of "pre-clinical" data sharing as a cross-cutting theme transversal to many others including Clinical, Omics and Community.

The decision on which therapeutic agents or vaccinal strategies to bring into clinics, should be dictated by standard-shared and data-shared scientific evidence. This evidence should be based on integrated results of existing in silico, in vitro and in vivo preclinical studies. To speed-up the development of drugs, the experimental studies, in particular, should be based on the use of cutting-edge research tools instead of conventional tools that have a limited predictive capability. I have reviewed the updated set of research tools (in silico, in vitro and in vivo) already used in virology and vaccinology in the context of SARS-CoV-2: https://www.thno.org/v10p7034.htm. Sharing of pre-clinical data would speed-up the validation of these new tools, leading to an update of the current preclinical testing standards used by the pharmaceutical industry. This would improve the efficiency and effectiveness of the whole drug development process in virology and vaccinology, and many other pharmaceutical sectors as well.

My best regards, Manuela T. Raimondi (www.nichoid.polimi.it)

ATTACHMENT:

Attachment Size

v10p7034.pdf 2.22 MB

Author: Natalie Harrower

Date: 03 Jun, 2020

Dear Manuela, thank you for your comments. I have raised your point with the WG for discussion.

Author: Romain DAVID

Date: 07 Jun, 2020
Congratulations!

This RDA recommendation work on covid-19 data is amazing. It is very likely that we will become adopters of the recommendations of standards contained in this work in the network of laboratories “BSL4” (ERINHA is a European Research Infrastructure of biocontainment laboratories which specialize in infectious disease research: www.erinha.eu). The problem today remains the implementation, especially for small structures. Where to start? How to obtain means? It would be useful to prioritize some recommendations over others (perhaps data stewardship?).

Concerning the ethical and privacy considerations and reducing the risk of data misuse, another important aspect that could enrich this document is how to ensure compliance with the FAIR principles while taking into account the dangers and pitfalls concerning dual use type data (because they are often given as a good reason not to share).

Dual Use Research of Concern (DURC) is defined by the United States Government Policy for Oversight of Life Sciences Dual Use Research of Concern as “Life sciences research that, based on current understanding, can be reasonably anticipated to provide knowledge, information, products, or technologies that could be directly misapplied to pose a significant threat, with broad potential consequences, to public health and safety, agricultural crops and other plants, animals, the environment, materiel, or national security.”

This definition may relate to some research data on covid-19. Many countries or regions have adopted laws to regulate exchanges concerning DURC. As an example, the European text exists for dual use items in general [https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32009R0428]. How can we take these aspects into account so that the respect of the FAIR principles is done while respecting these legislations? How to increase the possibilities of sharing dual use data, to reduce overlap, encourage collaboration on DURC while securing their uses?

For example, what should be prohibited using shared data?

Demonstrate how to render a vaccine ineffective

Enhance the harmful consequences of a pathogen or toxin or render a non-pathogen virulent

Increase the transmissibility of a pathogen

Alter the host range of a pathogen or toxin

Enable evasion of diagnostic or detection modalities

Enhance the susceptibility of a host population to a pathogen or toxin

Generate or reconstitute certain eradicated or extinct pathogens or toxins

Enable weaponization of a biological agent or toxin.

In this regard, pre-approved data sharing agreements, together with highly secured repositories, are probably one major possible recommendation. Consideration of DURC in your suggested Data Access Board would be prudent. Conversely, another pitfall would be to be too careful, classifying all data as dual use and preventing re-use despite public health issues. Indeed, reducing duplication of effort and improving trial design in dual use research is a challenge, at least inside a country or a community of countries (e.g. Europe). For this dual use data sharing, information should be available centrally on an intergovernmental web page with explicit authority, and/or sharing should follow official sharing conventions.

I imagine that an appropriate answer to all these questions will be difficult to provide in the context of these recommendations, but in my opinion, the subject (DURC) and the related issues deserve to be mentioned (perhaps as priority issues).

In the social science section, it is suggested “Data should be stored in at least one non-proprietary format that is well-documented”. I think this proposal should be taken up by all the groups and / or generalized.

Finally, I did not see in the reference document file naming conventions, which improve the readability and management of data files when these data are retrieved following access to a repository. A homogenization of file name schemes seems to me as necessary as the use of other standards for research objects (all the more in interdisciplinary sciences).

For the glossary, I would suggest adding a few definitions: DURC (defined above) and sensitive data (not only Sensitive Personal Data eg p 45), (possible definition https://www.openaire.eu/sensitive-data-guide) and “research data” because future readers of these recommendations do not always understand the outline. (see https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/...)

My remarks and suggestions are to be discussed, take into account what seems relevant to you! I thank the editorial team and the active collaborators for this remarkable work of synthesis.

Author: Claudia Bauzer ...

Date: 07 Jun, 2020

The following sum up suggestions from 3 members of the Brazilian Academy of Sciences. Many others wrote congratulating RDA on this document. - OMICS, EPIDEMIOLOGY, and SOFTWARE:

1) Adalberto Val, biologist, works in the Amazon - other kinds of major sources of health emergencies

Zoonosis is missing in the long document. I would suggest including this major current challenge in the document, both direct zoonosis (from animals to human) and reverse zoonosis (from human to animals), if possible and pertinent

Dr. Val also missed any reference to the Amazon region. With deforestation the number of zoonoses is increasing dramatically, some of which are spreading at a continental scale (South America). Data sharing helped determine that the reappearance of yellow fever all over Brazil was due to a major environmental disaster in central Brazil. (These remarks added by Claudia)

2) Daniel Martins, works in -omics = suggestion for the "omics" group

I would like to suggest something very simple about proteomics data sharing.

The document could suggest that researchers submit to the data repositories not only the mass spectrometry raw files, but also the files regarding the analyses performed (peptide and protein search and quantification). Because of the different instruments and software that may be used to this end, it may be interesting to access the exact settings and parameters used by researchers in their publications.

3) Sandoval Carneiro, works in Engineering - he contacted colleagues who have had lots of experience with health data (vaccines, infectologists). Forwarded suggestion from Dr Gulherme Oliveira, who worked in -omics in the Amazon area and is now scientific director of a research institution:

Epidemiology - should make code/script of models available

Software - use GitHub for distribution, but maintenance is time consuming and hard for academics. There are thousands of programs being generated, and it is extremely hard to decide which to adopt, and any decision depends on years of use. Best to hire a firm that has proven experience in its use, otherwise only time will advise. A possibility would be to distribute software as a service, and not a product. During pandemics, code availability is essential.

Author: Carme Plasencia

Date: 08 Jun, 2020

One of the major issues found by those working on developing novel therapeutic or diagnostic strategies is linked not only to Omic data, but also to modelling the disease, including appropriate models of infection for screening and evaluation of the relevant information achieved from OMICs.

Author: Michele Loi

Date: 08 Jun, 2020

"

Ideally, consent should be sought for collecting, processing, sharing and publishing data. However, there are other legal bases for processing personal data. Some specific examples from the European General Data Protection Regulation (GDPR, 2016) are described below. Our recommendation would therefore be as follows:

1. Where possible, use data where the data subject has provided a valid consent that includes or is compatible with intended use of the data and complies with the requirements on consent in the specific country or region."

I am not aware of any general, across-the-board, context-independent justification for preferring informed consent as a legal basis of data processing.

For example: in some EU countries and Switzerland, the use of (voluntary) COVID-19 contact tracing apps is regulated by law, and the legal basis for all uses of data (for contact tracing purposes and, in an anonymized fashion, for research) is the law. The national law specifies much more rigorous and strict criteria of data protection than terms and conditions accepted through (allegedly) "informed" consent to data use by the app users would ever achieve. The law also demands rigorous guarantees against de-anonymization, something that most users are not even aware of. I cannot imagine a better protection for the use of such data (both for public health purposes and for scientific purposes) through the mechanisms of using informed consent (as opposed to the law of the state) as a legal basis for processing all the data in question.

Hence, I do not support such claims by the ethics work package.

Michele Loi

Institute of Biomedical Ethics and the History of Medicine

University of Zurich

Author: Giorgio Rossi

Date: 08 Jun, 2020

From Italian EOSC GB and COVID-19 platform contact group

It is recommended that a more compact document is produced, with less of the redundant text that reduces the effectiveness of the current draft.

All general recommendations should be condensed in Chapter 2 “Foundational Elements”

Only specific recommendations should be contained by the sub-group sections. E.g. under 3.3.1 about “trustworthy” the general concepts are repeated with no advantage for the reader. Here only specific recommendations for clinical data should be given.

Table 1 should contain stronger and compact statements

Not all issues described as “challenges” are clear. E.g. in the clinical sub-group the effective challenges are:

“Timely sharing of clinical data, protecting privacy Guidelines for Researchers: Standardized clinical terminologies Recommendations …: Organize the data sharing in suitable, trustworthy, secure data repository.”

Other statements like: “Promotion of clinical data sharing is important due to many studies and trials being performed under enormous time pressure” do not define a challenge and reduce clarity.

2.2.1 recommends that the evaluation of public research grants should give value to open-data practices. It would make sense not to restrict this to grants, but to state that all research data based on public research funding should be made available and exploitable in a timely manner, in particular data of critical interest during an emergency situation.

2.2.4 Data Management Plans are mandatory and useful, but should be as standardized as possible in their key features, in order not to create bureaucratic barriers to data exchange.

2.2.8 Timely vs. Reliable is a key issue. When there is an urgent need for research on data that could lead to translational actions (e.g. medicine), it is necessary to adopt criteria of reliability. One possibility is to prioritize access to FAIR data sets from high-quality sources (institutions, research groups), as these are expected to be more reliable than contributions from uncontrolled sources, which generate more noise than useful information.

Open Peer Review could be a way forward, but the document must make more explicit what it means and how it works, and a formal recommendation should then be made.

The guidelines do mention alternative solutions (standards, formats, platforms, databases) that are exploited by the most advanced “omics” community, but do not suggest a pragmatic way ahead for the less advanced domain of clinical data.

SOFTWARE

Again, all software produced with public support should be open source and accessible, not only what is produced under grants.

Proposals could adopt rules (like ISO 25000 and 27000) for the use of software products (interoperable and serviced) to make them really transferable. E.g. software should be made available through consolidated practices like GitHub and GitLab in order to favor research efforts vs. management plans.

The document lacks guidelines for developing code that will eventually be made open.

Collaborative software should be mentioned and encouraged.

No recommendation or encouragement is made to use open source libraries or frameworks to develop research data analysis. There is no mention of the Open Source Initiative or the Free Software Foundation, which actually support the production of free software and provide good practices.

The GDPR issue is also central. An open debate on its interpretation is needed, leading to guidelines for its correct use and related good practices, to avoid that: a) data are not shared for fear of violating GDPR, b) data are retained by the source claiming GDPR restrictions even when these do not actually apply.

Author: Hugh Shanahan

Date: 11 Jun, 2020

Many thanks for these comments.

With respect to Research Software, in terms of scope, we are focussed on Research Software rather than the wider issue of publicly funded software. Likewise, given the constraints on length, we provide references for good practices in software development rather than trying to summarise those points. We made a deliberate decision to use a broad definition for software, and hence even though open source libraries are not explicitly mentioned they are part of that definition. One of our guidelines for researchers is to make use of tools such as GitHub and GitLab, which encourage collaborative development. Since GDPR is focussed on data rather than software, we have deferred to the other parts of the report where it is discussed, namely Data Sharing in Clinical Medicine and Legal and Ethical Considerations.

Author: Philippe Després

Date: 08 Jun, 2020

The Imaging data section should encourage users to adopt good practices to report findings (e.g., outcomes, clinical variables, radiomic features), i.e. embedding these elements within DICOM Structured Reports along with the context of this information: who, when, how. This is well explained in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2666949/. This would allow for FAIRer, more robust imaging data (as opposed to recording outcomes and other clinical data in a separate, non-DICOM container).

The sentence "A list of imaging standards and repositories is available in the RDA-endorsed FAIRsharing" could be misleading here as DICOM should be the only standard for any image-related data.

Author: Claire Austin

Date: 08 Jun, 2020
I have received comments from several researchers, which I have collated together into this single posting.

A. General

The document is too long

The document is very repetitive, hampering its readability.

Suggest defining data as per CODATA's Beijing Declaration on Research Data, and IRiDiuM glossary.

Recommendations and guidelines are aimed at a broader audience than researchers and data stewards (e.g., public health officials). (p 10).

May want to distinguish between developers, vendors and private-sector data-holders. These roles often overlap, but are distinct in terms of interests and pressures.

Replace, "It's good enough," with "data of known quality." (p 11)

Remove ambiguous language, such as "where possible."

Use publishing standard in-text citation practices.

Should touch upon the need for data sharing agreements amongst all levels (government, P/Ts, healthcare system, and international authorities).

B. Epidemiology, and its supporting output.

Provide comparative information about population data sources to help guide researchers in which ones to select (Table 2, p 38).

Preparation, Detection, and Response: Add the notion of resilience; how to build resilience into the system. When the degree of complexity and interdependency increases in human-made systems, there is always the risk of collapse if not enough balance is built into the system (the IT infrastructure, data governance, and the "people" part). This feeds into Integrity.

More information needed about Survey Initiatives.

More information needed about privacy in epidemiology.

C. Community

Not sure that encouraging community involvement throughout the data lifecycle would work in practice with, for example, a contact tracing app that is based on already-developed open source code.

For contact tracing apps, this is still a very generous guideline, as even some highly sensitive info (e.g. what floor of a high-rise you live on) could be seen as helping to answer a health question.

Provide more detail on methods aimed at protecting personal data. Mention centralized vs. decentralized servers.

What type of data should not be preserved (e.g. personally identifying info such as phone numbers).

LEGAL/ETHICAL

The notion of broad consent may not be supported by law and may only be allowed by some research ethics policy guidelines under strict conditions. In Canada, the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans-2, does not use the term broad consent. Rather, research ethics boards are able to relax rules for fully informed and ongoing consent under certain conditions. This is particularly important for the secondary use of research data and is critically important when establishing biobanks of human biological material and associated data.

Put as the first obligation that all research projects using human data must be approved by an independent research ethics board (or research ethics committee, or institutional research board) prior to the recruitment of participants and the collection of data. This is a universal requirement for all research involving humans and so should be the very first item in the list.

In Canada, IRBs/RECs are called Research Ethics Boards (REBs), and they do not only provide guidance. They have authority beyond just a review of a project. They approve, reject, require modification, and stop research projects.

The term ‘expedited’ is not used in Canadian research ethics policy statements because it can imply less ethically stringent REB review. Rather, in Canada we support efficient ethics review while maintaining high ethical standards. So, we do not agree with this part of the recommendation as stated. Also, we don’t see how it would be possible for a REB to both review projects expeditiously and seek public approval. In fact, REBs never seek public approval for their decisions. So, we don’t understand the second part of this recommendation.

Suggest also considering the OECD Recommendations on Health Data Governance, which set out best practices related to consent frameworks, review and approval procedures, safeguards to protect personal info, etc. https://www.oecd.org/health/health-systems/Recommendation-of-OECD-Counci...

D. APPS

Are you precluding automatic exposure notifications?

Consider how notifications may be worded (e.g., you MAY have been exposed so monitor your symptoms vs. you HAVE been exposed so go get tested).

Consider personal telephone calls instead of automated alerts.

Author: Natalie Harrower

Date: 11 Jun, 2020

Thanks Claire, I will send this to the WG listserv now.

Author: Kathryn Cassidy

Date: 10 Jun, 2020

This is a really useful resource, thank you!

I had a look through and compared with the COAR guidelines (https://www.coar-repositories.org/news-updates/covid19-recommendations/). Yours are obviously more comprehensive, but one idea from COAR that I thought was good was the recommendation to include the keyword "COVID-19" in the metadata.

At a panel on COVID research at the Open Repositories conference last week, it was also noted that repositories are having to do complex searches to retrieve COVID-related content. One participant offered this search string that they have been using:

year > 2018 AND (“COVID-19” OR “SARS-CoV-2” OR “2019-nCoV” OR “HCoV-19”) OR [(“Coronavirus” OR “Severe Acute Respiratory Syndrome”) AND “Wuhan”]

So it's clear that researchers and repositories are tagging COVID-related research outputs in many different ways. It might be a useful addition to these recommendations to propose standardised keyword / subject terms to tag COVID-related content in order to enhance discoverability.

I'd note that the Library of Congress Subject Headings include an entry for COVID-19 (Disease): http://id.loc.gov/authorities/subjects/sh2020000570
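
As an illustration of the tagging idea above, here is a minimal Python sketch of how a repository might apply the quoted search string to its metadata and then add one standard keyword. It is only a sketch under stated assumptions: the record layout (a dict with "year", "title", "abstract" and "keywords"), the function names and the "subject_uri" field are all hypothetical, and the search string is read as year > 2018 AND (direct term OR (generic term AND "Wuhan")), which is one possible interpretation of its operator precedence.

```python
# Hypothetical record layout: a dict with "year", "title", "abstract", "keywords".
# Terms are taken from the search string quoted in the comment above.
DIRECT_TERMS = ("covid-19", "sars-cov-2", "2019-ncov", "hcov-19")
GENERIC_TERMS = ("coronavirus", "severe acute respiratory syndrome")
STANDARD_KEYWORD = "COVID-19"
LCSH_URI = "http://id.loc.gov/authorities/subjects/sh2020000570"


def searchable_text(record):
    """Join the free-text fields of a record into one lowercase string."""
    parts = [record.get("title", ""), record.get("abstract", "")]
    parts.extend(record.get("keywords", []))
    return " ".join(parts).lower()


def is_covid_related(record):
    """Assumed reading of the search string:
    year > 2018 AND (direct term OR (generic term AND "Wuhan"))."""
    if record.get("year", 0) <= 2018:
        return False
    text = searchable_text(record)
    if any(term in text for term in DIRECT_TERMS):
        return True
    return "wuhan" in text and any(term in text for term in GENERIC_TERMS)


def tag_record(record):
    """Return a copy of the record; matching records get the standard
    keyword (added once) and the subject heading URI."""
    tagged = dict(record)
    keywords = list(record.get("keywords", []))
    if is_covid_related(record):
        if STANDARD_KEYWORD not in keywords:
            keywords.append(STANDARD_KEYWORD)
        tagged["subject_uri"] = LCSH_URI
    tagged["keywords"] = keywords
    return tagged
```

With a single agreed keyword in place, later searches could filter on that keyword alone instead of repeating the full boolean expression.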

Author: Natalie Harrower

Date: 11 Jun, 2020

Thank you Kathryn, this is a very important point.


RDA COVID-19

Status: Recognised & Endorsed
