RDA COVID-19 Guidelines and Recommendations (draft versions)
Please note: this is the landing page for the 5 draft versions of the RDA COVID-19 Working Group Recommendations and Guidelines on Data Sharing. These materials are made available following the RDA guiding principle of openness, to allow members and the wider community to view the progression of the document as well as comments received during the 5 review cycles.
The final version of the document can be found:
- via the RDA COVID-19 working group's outputs page, which contains supporting details, context, and a link to the DOI, or
- directly in Zenodo, DOI: https://doi.org/10.15497/rda00052
RDA COVID-19 Working Group
Group Co-chairs: Juan Bicarregui, Anne Cambon-Thomsen, Ingrid Dillo, Natalie Harrower, Sarah Jones, Mark Leggott, Priyanka Pillai
Subgroup Moderators: Clinical: Sergio Bonini, Dawei Lin, Andrea Jackson-Dipina, Christian Ohmann; Community Participation: Timea Biro, Kheeran Dharmawardena, Eva Méndez, Daniel Mietchen, Susanna Sansone, Joanne Stocks; Epidemiology: Claire Austin, Gabriel Turinici; Indigenous Data: RDA International Indigenous Data Sovereignty Interest Group; Legal and Ethical: Alexander Bernier, John Brian Pickering; Omics: Natalie Meyers, Rob Hooft; Social Sciences: Iryna Kuchma, Amy Pienta; Software: Michelle Barker, Hugh Shanahan, Fotis Psomopoulos
Editorial team: Christoph Bahim, Alexandre Beaufays, Ingrid Dillo, Natalie Harrower, Mark Leggott, Nicolas Loozen, Robyn Nicholson, Priyanka Pillai, Mary Uhlmansiek, Meghan Underwood, Bridget Walker
Recommendation title: RDA COVID-19; recommendations and guidelines, 5th release (final draft) 28 May 2020
Authors: RDA COVID-19 Working Group and Subgroup Members
Citation: RDA COVID-19 Working Group. recommendations and guidelines. Research Data Alliance, 2020. DOI: https://doi.org/10.15497/rda00046
Note: The overarching COVID-19 Working Group page, which includes the specific working group members, events, posts, wiki and documents, can be viewed here:
Disclaimer: The views and opinions expressed in this document are those of the individuals identified, and do not necessarily reflect the official policy or position of their respective employers, or of any government agency or organisation.
Context:
During a pandemic, data combined with the right context and meaning can be transformed into knowledge for informing public health responses. Timely and accurate collection, reporting and sharing of data with the research community, public health practitioners, clinicians and policy makers will inform assessment of the likely impact of a pandemic to implement efficient and effective response strategies.
Public health emergencies clearly demonstrate the challenges associated with rapid collection, sharing and dissemination of data and research findings to inform response. There is global capacity to implement systems to share data during a pandemic, yet the timeliness of accessing data and harmonisation across information systems are currently major roadblocks. The World Health Organisation’s (WHO) statement on data sharing during public health emergencies clearly summarises the need for timely sharing of preliminary results and research data. On 28 May 2020, the G7 Science and Technology Ministers’ Declaration on COVID-19 was issued, which calls for government-sponsored COVID-19 epidemiological and related research results, data, and information to be accessible to the public to the greatest extent possible. There is also strong support for recognising open research data as a key component of pandemic preparedness and response, evidenced by the 117 cross-sectoral signatories to the Wellcome Trust statement on 31st January 2020, and the further agreement by 30 leading publishers on immediate open access to COVID-19 publications and underlying data.
Objectives:
The objectives of the RDA COVID-19 Working Group (CWG), which focuses on the essential areas of Clinical, Community Participation, Epidemiology, Indigenous Data, Legal & Ethical, Omics, Social Sciences, and Software, are:
- to clearly define detailed guidelines on data sharing under the present COVID-19 circumstances to help stakeholders follow best practices to maximise the efficiency of their work, and to act as a blueprint for future emergencies;
- to develop guidelines for policymakers to maximise timely, quality data sharing and appropriate responses in such health emergencies;
- to address the interests of researchers, policy makers, funders, publishers, and providers of data sharing infrastructures.
5th Release - Final Draft for public comment:
The RDA COVID-19 Working Group (CWG) members bring various expertise to develop a body of work that comprises how data from multiple disciplines inform response to a pandemic combined with guidelines and recommendations on data sharing under the present COVID-19 circumstances. The work has been divided into four research areas with four cross-cutting themes, as a way to focus the conversations, and provide an initial set of guidelines in a tight timeframe. The detailed guidelines in this body of work are aimed at helping stakeholders follow best practices to maximise the efficiency of their work, and to act as a blueprint for future emergencies. The recommendations in the document are aimed at helping policymakers and funders to maximise timely, quality data sharing and appropriate responses in such health emergencies.
The CWG is addressing the development of such detailed guidelines on the deposit of different data sources in any common data hub or platform. The guidelines aim at developing a system for data sharing in public health emergencies that supports scientific research and policy making, including an overarching framework, common tools and processes, and principles that can be embedded in research practice. The guidelines contained herein address general aspects that data should adhere to, for example the FAIR principles (that research outputs should be Findable, Accessible, Interoperable, and Reusable), or the adoption of research domain community standards.
These detailed guidelines are supplemented with higher level recommendations aimed at the other stakeholder groups who need to work together with the researchers and data stewards to realise the timely and open sharing of research data as a key component of pandemic preparedness and response.
The RDA COVID-19 WG was initiated after a conversation between the RDA and the European Commission. The first meeting of the CWG to determine the work was held in March. As of May, the CWG counted over 440 members, evenly spread across the different sub-groups. This effort also reflects the work of a host of other RDA Working Groups, as well as external stakeholder organisations, that has developed over a number of years.
The CWG and the sub-groups operate according to the RDA guiding principles of Openness, Consensus, Balance, Harmonisation, Community-driven, Non-profit and technology-neutral and are open to all.
The 5th release (final draft) starts in Section 2 with an overview of foundational, overarching elements that emerged across the different research areas. These recommendations touch upon a number of well-known topics in research data sharing and align to the statements made by many organisations and governance bodies, including but not limited to, the WHO, the G7 Science and Technology Ministers, the publishing industry and Wellcome Trust. In Sections 3 to 6 the focus is on the COVID-19 related research areas. Each section starts with a description of the area and the focus and scope of the work done, followed by the actual recommendations and guidelines. In Sections 7 to 10 this same structure is used for the four cross-cutting themes. The document contains an extended glossary of terms to support the reader (Section 11), an overview of useful additional resources (Section 12) and a list of references (Section 13). Section 14 lists the contributors to this work.
Timing and Future Releases:
This is the fifth and final draft of the Recommendations and Guidelines from the RDA COVID-19 working group, and is open for public comment until 8th of June 2020. Following the open period, feedback will be considered and then the WG will seek endorsement of the document from the RDA governance bodies prior to final publication.
More information and insights on the plans for those releases, as well as highlights from the sub-groups are given during the informative webinars – see COVID-19 WG Events page for details and to access previous recordings and presentations.
Request for Comments:
In the spirit of the RDA community and its open process, your feedback on the content of the releases, the scope and the direction is vital for all involved to shape and focus the document and its sections into a useful and meaningful tool.
A request from the editorial team: please indicate the section of the document or the indicator that your comment is about, and, if possible, also include a suggestion for improvement. Many thanks!
Author: Patrick Dunn
Date: 28 Apr, 2020
What are the thoughts on the use of ISARIC-WHO Case Report Forms (CRFs) to collect data on individuals presenting with suspected or confirmed COVID-19 as a way of encouraging a core set of clinical data for COVID-19 patients?
https://isaric.tghn.org/covid-19-clinical-research-resources/
Apologies if this is redundant.
Cheers,
Patrick Dunn
Author: Abdelkrim Boujraf
Date: 30 Apr, 2020
To whom it may concern,
I am a bit knowledgeable about software manufacturing;
The recommendations are a source of useful information for IT Solutions Architects who want to implement the IT solutions supporting the five subgroups.
Here are some challenges to ensure a smooth implementation of such IT Solutions:
GREAT JOB!
Abdelkrim Boujraf
Links
(1) https://data-visualization.readthedocs.io/en/latest/dataviz/dashboards.html
(2) https://data-visualization.readthedocs.io/en/latest/dataviz/rda_covid-19...
Author: Hugh Shanahan
Date: 03 Jun, 2020
Hi - apologies for not getting to this.
I agree that Industry could benefit more from the outputs of the RDA (though there is some contact through specific industries such as Pharma).
The scope of the report is for researchers, policy makers and publishers and hence to keep in scope we've not provided an introductory guide to the research topics themselves (as it is the document is 120 pages long).
There will be a decision tree to aid the above which I think is similar to what you're suggesting.
Thanks again
Hugh
Author: Mogens Thomsen
Date: 05 May, 2020
As a member of the COVID-19 clinical subgroup I have followed the webinars that were very informative and I have had a look at the 2nd release of Recommendations and guidelines. The document looks promising.
I have a few suggestions :
4.2.4 first line
it is critical to spend limited time and resources on reliable data sources … I propose :
it is important to concentrate the efforts on scrutinizing reliable data sources …
4.3
The list of additional working documents & links might be divided into references to articles and static web pages (item 4,6,7,8) and web sites with resources that are updated continuously.
For the other sections of the document I would also prefer that additional working documents and links and also the references were separated as mentioned above.
It is important that ISO standards are respected and in that context it should be avoided to have tables and figures with dates that do not follow the rules (e.g. 4/5 is the 5 April in US English and 4 May in UK English)
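The ambiguity described above can be illustrated with a minimal sketch. It assumes Python and its standard `datetime` module, neither of which is prescribed by the guidelines, and it assumes the ISO standard meant here is ISO 8601 (YYYY-MM-DD). The year 2020 is added only to make the example parseable.

```python
from datetime import datetime

# The same written date, read under two regional conventions.
ambiguous = "4/5/2020"

# US convention: month first, then day.
us_reading = datetime.strptime(ambiguous, "%m/%d/%Y").date()

# UK convention: day first, then month.
uk_reading = datetime.strptime(ambiguous, "%d/%m/%Y").date()

# ISO 8601 output is unambiguous: year, month, day.
print(us_reading.isoformat())  # 2020-04-05
print(uk_reading.isoformat())  # 2020-05-04
```

Writing dates in tables and figures in the `isoformat()` form removes the need for the reader to know which regional convention the author used.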
Finally, it is a good idea to have two new overarching subgroups. However, for the legal and ethical guidelines there is some redundancy with some parts of the chapter 8.2.4 in the Social Sciences subgroup. But I am sure that this will be worked out.
I look forward to seeing the 3rd release of the Guidelines.
Best regards
Mogens Thomsen
Author: Nora Dörrenbächer
Date: 06 May, 2020
Dear all,
the German Data Forum (RatSWD) is an advisory council to the German federal government for the research data infrastructure of the social, behavioural and economic sciences, connecting a network of currently 34 accredited Research Data Centres. We are an organisational member of RDA since 2015. Currently, we are also working on ways to provide researchers with guidelines on how to manage their data when researching the impact of Covid 19 on social issues (see here) – with a focus on research in German speaking countries. Therefore, we are highly excited about the initiative of the RDA Covid 19 working group to establish standards for data sharing and highly welcome the initiative. We would like to add the following suggestion to the 2nd release document:
8.2.2 Documentation, Standards, and Data Quality:
(addition) Ensuring data quality
In case of data collection using smartphone apps and other sensors:
8.2.3. Storage and Backup:
Highlight that much social science data is sensitive personal data. Before sharing and storing the data anonymisation and/or pseudonymisation must be implemented.
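As an illustration only, one common pseudonymisation technique is to replace direct identifiers with a keyed hash before data are stored or shared. The sketch below assumes Python's standard `hmac` and `hashlib` modules; the field name `participant_id` and the helper names are hypothetical and do not come from the guidelines. Keyed hashing is pseudonymisation, not anonymisation: whoever holds the key can re-link records, and other fields may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymise_records(records, secret_key, id_field="participant_id"):
    """Return copies of the records with the identifier field pseudonymised.

    The input records are left unchanged. `id_field` is a hypothetical
    field name chosen for this example.
    """
    result = []
    for record in records:
        cleaned = dict(record)
        cleaned[id_field] = pseudonymise(str(record[id_field]), secret_key)
        result.append(cleaned)
    return result


# Hypothetical survey rows; the key must be stored separately from the data.
key = b"example-key-kept-outside-the-dataset"
rows = [
    {"participant_id": "P-001", "answer": "yes"},
    {"participant_id": "P-002", "answer": "no"},
]
shared_rows = pseudonymise_records(rows, key)
```

The same identifier always maps to the same pseudonym under one key, so records from one participant can still be linked across files without exposing the original identifier.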
8.2.4 Legal and Ethical Requirements:
Specify point 2
“Researchers have a responsibility for ensuring research participants understand that there may be a risk of re-identification when data are shared. Generally, researchers should take all possible measures to reduce the risk of re-identification. These attempts should be documented.”
Add a section
8.2.6. Ensuring access to data in times of the Corona pandemic
Formally anonymized microdata from official statistics are a central basis of numerous empirical research projects. Often, this data can only be used by researchers in a fixed location at guest workstations or via remote execution (i.e. generally without viewing and browsing data or results on the screen). In the current Corona pandemic, guest workstations are hardly usable and researchers have to work in their home office. Consequently, they no longer have access to important data.
Recommendation:
Author: Dawei Lin
Date: 06 May, 2020
7.2.1 Recommendations for virus genomics data Repositories
For assembled and annotated genomes we suggest deposition in one or more of these archives: NCBI GenBank, DDBJ Annotated/Assembled sequences, European Nucleotide Archive (ENA) Assembled/Annotated sequences, and/or NCBI Virus.
The 'and/or NCBI Virus' should be removed. NCBI Virus is a custom view of GenBank and RefSeq data, plus added analysis tools.
Under 7.2.2 - What is listed for 'Cell lines/Animals' is surprising as they are all gene expression resources not sample metadata resources. I would consider listing INSDC BioSample resources there instead.
Kim D. Pruitt, Ph.D
Chief, Information Engineering Branch
NCBI/NLM/NIH
Author: Rob Hooft
Date: 14 May, 2020
Dear Kim,
Thank you very much for your review of our draft recommendation. We have taken your suggestions to heart and now refer to NCBI Virus as an NCBI GenBank subresource. We have also made it clearer that submission to the genome-phenome and genome expression archives leads to the creation of several data sets in multiple databases that are linked to each other.
Rob Hooft, representing The Omics team
Author: Peter Cornwell
Date: 21 May, 2020
There is duplication between statements made under the Community Participation and Data Sharing Section 4., and Data Sharing in Social Sciences 7. While restrictions about data sharing in SS are more specific and Section 7. might warrant independent clarification, 7.4.5 is currently fairly generic and also more comprehensive than the text at 4.4.3. The Epidemiology and Omics sections don't seem to have separate preservation discussions. The analysis preservation statements currently at the bottom of page 66 are valuable, if ambitious. As a quick fix maybe you would consider replacing 4.4.3 with the text at 7.4.5 ?
The preservation advice is basically use repository technology if possible; preferably a "trustworthy digital repository committed to preservation"; even better a "disciplinary repository .. for maximum visibility". It is worth recalling that in 2019 DataCite r3d100011136 which represented a major output of the Gates Foundation project on malaria at https://www.vecnet.org/ (these links still probably don't go anywhere) became unavailable. It had to be redelivered from an unsupported Fedora version using donated funds because this was too unfashionable for mainstream funders. Hopefully we can protect COVID research data more effectively. Maybe you would also consider building a repository of COVID publications similar to the vecnet model, with the benefit of current outputs rather than having to gather material from previous decades. Automated treatment techniques pioneered by Plazi in biodiversity could then be applied to significantly increase accessibility.
Author: Natalie Harrower
Date: 03 Jun, 2020
Dear Peter, thank you for your comment. We have made substantial revisions in the latest release (5th/28th May) with an attempt to remove duplication. Please let us know if you find any remaining issues on your points. Many thanks.
Author: Laurence DELHAES
Date: 30 May, 2020
Hi everyone
Great job !
No specific comment for me.
Cheers
Laurence
Author: Natalie Harrower
Date: 03 Jun, 2020
On behalf of all the contributors, thank you, Laurence!
Author: Manuela Teresa ...
Date: 03 Jun, 2020
Dear RDA COVID-19 Working Group 2020,
thanks for the huge work undertaken. May I suggest a further point, i.e. the promotion of "pre-clinical" data sharing as a cross-cutting theme transversal to many others including Clinical, Omics and Community.
The decision on which therapeutic agents or vaccinal strategies to bring into clinics, should be dictated by standard-shared and data-shared scientific evidence. This evidence should be based on integrated results of existing in silico, in vitro and in vivo preclinical studies. To speed-up the development of drugs, the experimental studies, in particular, should be based on the use of cutting-edge research tools instead of conventional tools that have a limited predictive capability. I have reviewed the updated set of research tools (in silico, in vitro and in vivo) already used in virology and vaccinology in the context of SARS-CoV-2: https://www.thno.org/v10p7034.htm. Sharing of pre-clinical data would speed-up the validation of these new tools, leading to an update of the current preclinical testing standards used by the pharmaceutical industry. This would improve the efficiency and effectiveness of the whole drug development process in virology and vaccinology, and many other pharmaceutical sectors as well.
My best regards, Manuela T. Raimondi (www.nichoid.polimi.it)
Author: Natalie Harrower
Date: 03 Jun, 2020
Dear Manuela, thank you for your comments. I have raised your point with the WG for discussion.
Author: Romain DAVID
Date: 07 Jun, 2020
Congratulations!
This RDA recommendation work on covid-19 data is amazing. It is very likely that we will become adopters of the recommendations of standards contained in this work in the network of laboratories “BSL4” (ERINHA is a European Research Infrastructure of biocontainment laboratories which specialize in infectious disease research: www.erinha.eu). The problem today remains the implementation, especially for small structures. Where to start? How to obtain means? It would be useful to prioritize some recommendations over others (perhaps data stewardship?).
Concerning the ethical and privacy considerations and reducing the risk of data misuse, another important aspect that could enrich this document is how to ensure compliance with the FAIR principles while taking into account the dangers and pitfalls concerning dual use type data (because they are often given as a good reason not to share).
Dual Use Research of Concern (DURC) is defined by the United States Government Policy for Oversight of Life Sciences Dual Use Research of Concern as “Life sciences research that, based on current understanding, can be reasonably anticipated to provide knowledge, information, products, or technologies that could be directly misapplied to pose a significant threat, with broad potential consequences, to public health and safety, agricultural crops and other plants, animals, the environment, materiel, or national security.”
This definition may relate to some research data on covid-19. Many countries or regions have adopted laws to regulate exchanges concerning DURC. As an example, the European text exists for dual use items in general [https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32009R0428]. How can we take these aspects into account so that the respect of the FAIR principles is done while respecting these legislations? How to increase the possibilities of sharing dual use data, to reduce overlap, encourage collaboration on DURC while securing their uses?
For example, what should be prohibited when using shared data?
- Demonstrate how to render a vaccine ineffective
- Enhance the harmful consequences of a pathogen or toxin or render a non-pathogen virulent
- Increase the transmissibility of a pathogen
- Alter the host range of a pathogen or toxin
- Enable evasion of diagnostic or detection modalities
- Enhance the susceptibility of a host population to a pathogen or toxin
- Generate or reconstitute certain eradicated or extinct pathogens or toxins
- Enable weaponization of a biological agent or toxin.
In this regard, pre-approved data sharing agreements, combined with highly secured repositories, are probably one major possible recommendation. DURC could be given careful consideration in your suggested Data Access Board. Conversely, another pitfall would be to be too careful, classifying all data as dual use and preventing re-use despite public health issues. Indeed, reducing duplication of effort and improving trial design in dual use research is a challenge, at least inside a country or a community of countries (e.g. Europe). For this dual use data sharing, information should be available centrally on an intergovernmental web page with explicit authority and/or be applied in accordance with official sharing conventions.
I imagine that an appropriate answer to all these questions will be difficult to provide in the context of these recommendations, but in my opinion, the subject (DURC) and the related issues deserve to be mentioned (perhaps as priority issues).
In the social science section, it is suggested “Data should be stored in at least one non-proprietary format that is well-documented”. I think this proposal should be taken up by all the groups and / or generalized.
Finally, I did not see file naming conventions in the reference document. Such conventions improve the readability and management of data files once they have been retrieved from a repository. A homogenization of file name schemes seems to me as necessary as the use of other standards for research objects (all the more so in interdisciplinary sciences).
For the glossary, I would suggest adding a few definitions: DURC (defined above) and sensitive data (not only Sensitive Personal Data eg p 45), (possible definition https://www.openaire.eu/sensitive-data-guide) and “research data” because future readers of these recommendations do not always understand the outline. (see https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/...)
My remarks and suggestions are to be discussed, take into account what seems relevant to you! I thank the editorial team and the active collaborators for this remarkable work of synthesis.
Author: Claudia Bauzer ...
Date: 07 Jun, 2020
The following sums up suggestions from 3 members of the Brazilian Academy of Sciences. Many others wrote congratulating RDA on this document. - OMICS, EPIDEMIOLOGY, and SOFTWARE:
1) Adalberto Val, biologist, works in the Amazon - other kinds of major sources of health emergencies
Zoonosis is missing in the long document. I would suggest including this major current challenge in the document, both direct zoonosis (from animals to human) and reverse zoonosis (from human to animals), if possible and pertinent
Dr. Val also missed any reference to the Amazon region. With deforestation the number of zoonoses is increasing dramatically, some of which are spreading at a continental scale (South America). Data sharing helped determine that the reappearance of yellow fever all over Brazil was due to a major environmental disaster in central Brazil. (These remarks added by Claudia)
2) Daniel Martins, works in -omics = suggestion for the "omics" group
I would like to suggest something very simple about proteomics data sharing.
The document could suggest that researchers submit to the data repositories not only the mass spectrometry raw files, but also the files regarding the analyses performed (peptide and protein search and quantification). Because of the different instruments and software that may be used to this end, it may be interesting to access the exact settings and parameters used by researchers in their publications.
3) Sandoval Carneiro, works in Engineering - he contacted colleagues who have had lots of experience with health data (vaccines, infectologists). Forwarded suggestion from Dr Gulherme Oliveira, who worked in -omics in the Amazon area and is now scientific director of a research institution:
Epidemiology - should make code/script of models available
Software - use GitHub for distribution, but maintenance is time consuming and hard for academics. There are thousands of programs being generated, and it is extremely hard to decide which to adopt, and any decision depends on years of use. Best to hire a firm that has proven experience in its use, otherwise only time will advise. A possibility would be to distribute software as a service, and not as a product. During pandemics, code availability is essential.
Author: Carme Plasencia
Date: 08 Jun, 2020
One of the major issues for those working on developing novel therapeutic or diagnostic strategies is linked not only to Omic data, but also to modelling the disease, including appropriate models of infection for screening and evaluation of the relevant information achieved from OMICs.
Author: Michele Loi
Date: 08 Jun, 2020
"
Ideally, consent should be sought for collecting, processing, sharing and publishing data. However, there are other legal bases for processing personal data. Some specific examples from the European General Data Protection Regulation (GDPR, 2016) are described below. Our recommendation would therefore be as follows:
1. Where possible, use data where the data subject has provided a valid consent that includes or is compatible with intended use of the data and complies with the requirements on consent in the specific country or region."
I am not aware of any general, across-the board, context-independent justification for preferring informed consent as a preferable legal basis of data processing.
For example: in some EU countries and Switzerland, the use of COVID 19 contact tracing (voluntary) app is regulated by law, the legal basis for all uses of data (for contact tracing purposes and, in an anonymized fashion, for research) is the law. The national law specifies much more rigorous and strict criteria of data protection than terms and conditions subjected by (allegedly) "informed" consent of data use by the app users would ever achieve. The law also demands rigorous guarantees of de-anonymization, something that most users even ignore. I cannot imagine a better protection for the use of such data (both for public health purposes and for scientific purposes) through the mechanisms of using informed consent (as opposed to the law of the state) as a legal basis for processing all the data in question.
Hence, I do not support such claims by the ethics work package.
Michele Loi
Institute of Biomedical Ethics and the History of Medicine
University of Zurich
Author: Giorgio Rossi
Date: 08 Jun, 2020
From Italian EOSC GB and COVID-19 platform contact group
It is recommended that a more compact document is produced, with less of the redundant text that reduces the effectiveness of the current draft.
All general recommendations should be condensed in Chapter 2 “Foundational Elements”
Only specific recommendations should be contained by the sub-group sections. E.g. under 3.3.1 about “trustworthy” the general concepts are repeated with no advantage for the reader. Here only specific recommendations for clinical data should be given.
Table 1 should contain stronger and compact statements
Not all issues described as “challenges” are clear. E.g. in the clinical sub-group
the effective challenges are:
“Timely sharing of clinical data, protecting privacy Guidelines for Researchers: Standardized clinical terminologies Recommendations …: Organize the data sharing in suitable, trustworthy, secure data repository.
Other statements like: “Promotion of clinical data sharing is important due to many studies and trials being performed under enormous time pressure” do not define a challenge and reduce clarity.
2.2.1 recommends that the evaluation of public research grants should give value to open-data practices. It would make sense not to restrict to grants but stating that all research data based on public research funding should be made available and exploitable in a timely manner, in particular for those of critical interest during an emergency situation.
2.2.4 Data Management Plans are mandatory and useful, but should be as standardized as possible in the key features, in order not to create bureaucratic barriers for data exchange
2.2.8 Timely vs. Reliable is a key issue. In urgent need of research on data that could lead to translational actions (e.g. medicine) it is necessary to adopt criteria of reliability. One possibility is to recognize as priority for access those FAIR data sets from high-quality sources (institutions, research groups) as those are expected to be more reliable than contributions from uncontrolled sources that generate more noise than useful information.
Open Peer Review could be a way-ahead, but it must be made more explicit in the document what it means and how it works and finally a formal recommendation should be made.
The guidelines do mention alternative solutions (standards, formats, platforms, databases) that are exploited by the most advanced “omics” community, but do not suggest a pragmatic way ahead for the less advanced domain of clinical data
SOFTWARE
Again all software produced with public support should be open source and accessible, not only what is produced under grants
Proposals could adopt rules (like ISO 25000 and 27000) to use the software products (interoperable and serviced) to make them really transferable. E.g. software should be made available through consolidated practices like GitHub and GitLab in order to favor research efforts vs. management plans.
The document lacks guidelines for code development of software to be made eventually open.
Collaborative software should be mentioned and encouraged.
Neither recommendation nor encouragement is made to use open source libraries or frameworks to develop research data analysis. No mention is made of the Open Source Initiative or the Free Software Foundation, which actually support the production of free software and provide good practices.
The GDPR issue is also central. An open debate on its interpretation is needed, leading to guidelines for its correct use and related good practices, to avoid situations in which: a) data are not shared for fear of violating GDPR, b) data are retained by the source claiming GDPR restrictions even when these do not actually apply.
Author: Hugh Shanahan
Date: 11 Jun, 2020
Many thanks for these comments.
With respect to Research Software, in terms of scope, we are focussed on Research Software rather than the wider issue of publicly funded software. Likewise, given the constraints on length we provide references for good practices in software development rather than trying to summarise those points. We made a deliberate decision to use a broad definition for software and hence even though open source libraries are not explicitly mentioned it is part of that definition. One of our guidelines for researchers is to make use of tools such as GitHub and GitLab which encourages collaborative development. Since GDPR is focussed on data rather than software we have deferred to other parts of the report where it is discussed in the Data Sharing in Clinical Medicine and Legal and Ethical Considerations.
Author: Philippe Després
Date: 08 Jun, 2020
The Imaging data section should encourage users to adopt good practices to report findings (e.g., outcomes, clinical variables, radiomic features), i.e. embedding these elements within DICOM Structured Reports along with the context of this information: who, when, how. This is well explained in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2666949/ This would allow for FAIRer, more robust imaging data (as opposed to recording outcomes and other clinical data in a separate, non-DICOM container).
The sentence "A list of imaging standards and repositories is available in the RDA-endorsed FAIRsharing" could be misleading here as DICOM should be the only standard for any image-related data.
Author: Claire Austin
Date: 08 Jun, 2020
I have received comments from several researchers, which I have collated together into this single posting.
A. General
B. Epidemiology, and its supporting output.
C. Community
LEGAL/ETHICAL
D. APPS
Author: Natalie Harrower
Date: 11 Jun, 2020
Thanks Claire, I will send this to the WG listserv now.
Author: Kathryn Cassidy
Date: 10 Jun, 2020
This is a really useful resource, thank you!
I had a look through and compared with the COAR guidelines (https://www.coar-repositories.org/news-updates/covid19-recommendations/). Yours are obviously more comprehensive, but one idea from COAR that I thought was good was the recommendation to include the keyword "COVID-19" in the metadata.
At a panel on COVID research at the Open Repositories conference last week it was also noted that repositories are having to do complex searches to retrieve COVID-related content. One participant offered this search string that they have been using
year > 2018 AND (“COVID-19” OR “SARS-CoV-2” OR “2019-nCoV” OR “HCoV-19”) OR [(“Coronavirus” OR “Severe Acute Respiratory Syndrome”) AND “Wuhan”]
So it's clear that researchers and repositories are tagging COVID-related research outputs in many different ways. It might be a useful addition to these recommendations to propose standardised keyword / subject terms to tag COVID-related content in order to enhance discoverability.
I'd note that Library of Congress Subject Headings includes an entry for COVID-19 (Disease) http://id.loc.gov/authorities/subjects/sh2020000570
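The idea of a standardised keyword can be sketched as follows. This is a hypothetical illustration in Python, not a method from the guidelines or from COAR: the record layout (`title`, `year`, `keywords`) and the function names are assumptions, and the search string quoted above is read here as "published after 2018 AND (a direct term OR (a broad term AND Wuhan))", which is one possible interpretation of its mixed brackets.

```python
STANDARD_KEYWORD = "COVID-19"

# Terms taken from the search string quoted in the comment above.
DIRECT_TERMS = ("COVID-19", "SARS-CoV-2", "2019-nCoV", "HCoV-19")
BROAD_TERMS = ("Coronavirus", "Severe Acute Respiratory Syndrome")


def _contains(text: str, term: str) -> bool:
    """Case-insensitive substring match."""
    return term.lower() in text.lower()


def is_covid_related(record: dict) -> bool:
    """Apply the (assumed) reading of the search string to one record."""
    if record.get("year", 0) <= 2018:
        return False
    text = " ".join([record.get("title", "")] + list(record.get("keywords", [])))
    direct = any(_contains(text, term) for term in DIRECT_TERMS)
    broad = any(_contains(text, term) for term in BROAD_TERMS) and _contains(text, "Wuhan")
    return direct or broad


def tag_record(record: dict) -> dict:
    """Return a copy of the record with the standard keyword added if it matches."""
    tagged = dict(record)
    keywords = list(record.get("keywords", []))
    if is_covid_related(record) and STANDARD_KEYWORD not in keywords:
        keywords.append(STANDARD_KEYWORD)
    tagged["keywords"] = keywords
    return tagged


# Hypothetical metadata record, for illustration only.
example = {"title": "Novel coronavirus outbreak in Wuhan", "year": 2020, "keywords": ["virology"]}
tagged_example = tag_record(example)
```

Once every matching record carries the same keyword, a repository search reduces to a single exact-match filter on that term instead of the compound query.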
Author: Natalie Harrower
Date: 11 Jun, 2020
Thank you Kathryn this is a very important point.