Composed by: Marek Cebecauer (RDA/EOSC Future Ambassador for Materials Sciences and Engineering)
Downloadable disciplinary info sheet: Materials Sciences and Engineering
Data sharing in Materials Sciences and Engineering
The Materials Sciences and Engineering (MSE) domain covers a highly heterogeneous community of scientists, including both experimentalists and computational scientists. It also includes commercial entities and manufacturers. Open sharing of data is still in its infancy in this domain, but several resources and tools have been created to facilitate the transition. Good examples are community repositories such as NOMAD, DICE, and Materials Commons, to name just a few. These sites enable data sharing regardless of whether the data are fully open or subject to a reasonable access restriction.
Materials scientists are also very active in the development of electronic laboratory notebooks (ELNs) and laboratory information management systems (LIMS) for automatic data and metadata acquisition. NOMAD Oasis, Kadi4Mat and NexusLIMS were developed by MSE research groups, although their application is not always limited to this domain (e.g., Kadi4Mat).
Several ontologies are being developed to improve the interoperability of shared data. Among these, BFO, EMMO, and Metadata4Ing are upper- or mid-level ontologies covering a broader spectrum of disciplines and administrative/interoperability metadata. Others are more specialized, lower-level ontologies for a more detailed, discipline-specific description of data (e.g., the Materials Design Ontology, MDO, or OM2 for units of measure). Moreover, selected repositories and data resources prefer project vocabularies (DICE) or knowledge graphs (OpenAIRE Graph) to make it easier for researchers to engage in data sharing and re-use.
All these activities help to implement the FAIR principles and Open Science at an increasing number of MSE-oriented universities and other research-performing organizations. They also lead to the digitalization of several inherently non-digital objects, making them machine-readable. The outputs can thus be studied further by the global scientific community, including with the help of AI/ML approaches.
DMP challenges and solutions presented by DESY, a photon and neutron facility
Precise characterisation of materials and other samples inherently relies on diverse experimental and computational approaches. This increases the importance of data interoperability and of automated collection of data and metadata into merged datasets. The integration of instruments and computational tools into interoperable ecosystems with minimal operator involvement is being tested at NIMS (Japan), NIST and JHU (USA), KIT (Germany) and CEITEC Nano (Czech Republic), to name just a few. This development raises new questions about experimental workflows and cookbooks, instrument booking calendars, data sensitivity and other aspects of data management. Integrating these aspects seamlessly is essential for better adoption of FAIR data in the field.
Materials Sciences and Engineering in RDA
RDA facilitated the first activities towards standardization of data management in the MSE domain. The International Materials Resource Registries RDA-WG delivered one of the first metadata schemas and vocabularies [1]. Like every discipline, the MSE domain requires specific approaches to research data management. At the same time, it is a highly heterogeneous domain and needs a high level of interoperability from the early phase of data sharing onwards. Therefore, new RDA IGs are planned to support cross-disciplinary research and to find ways to accelerate it. Among the related activities, the BoF Data representation in materials and chemicals based on harmonised domain ontologies focuses on the aggregation of MSE-relevant ontologies and other metadata standardization tools.
Complex instruments provide essential and valuable data in the MSE domain. To facilitate a permanent connection between instruments and FAIR data, the FAIR Instrument Data IG was established in 2023. It will continue the discussion on the identification of instruments and the verification of their setup, which was started by the Persistent Identification of Instruments WG and resulted in the adoption of PIDINST.
At RDA Plenary 20, the RDA/CODATA Materials Data, Infrastructure & Interoperability IG and the Research Data Management in Engineering IG held a joint meeting on the implementation of ELNs and LIMS to automate data harvesting and metadata collection.
Researchers in the MSE domain frequently handle commercially or militarily sensitive data. It is therefore important that recommended data repositories belong to trusted research environments (TREs). The BoF Trusted Research Environments for Sensitive Data: FAIRness for "Closed" Data and Processes aims to establish best practices for TREs, share experience and define rules for TREs with diverse focus. This should enable access to, and research on, data that would otherwise not be available.
Unfortunately, there is currently no active RDA IG or WG focused specifically on data management issues in the MSE domain. Moreover, better integration with commercial entities is essential to avoid fragmentation and poor interoperability of global MSE data. The activities of the European Materials Modelling Council are a good step forward.
RDA Working Groups and Interest Groups
Discipline-specific groups are instrumental in providing a platform for domain-specific discussion on data management. Historically, two RDA IGs have focused on MSE:
- RDA/CODATA Materials Data, Infrastructure & Interoperability IG: promotes awareness and development of a global materials data infrastructure based on open standards.
- Research Data Management in Engineering IG: a platform for developing consensus on RDM best practices for engineering.
Groups of related disciplines:
- Disciplinary Collaboration Framework IG: bringing together different research disciplines to discuss disciplinary-specific use cases of RDA outputs.
- Chemistry Research Data IG: data management for chemists.
- FAIR for Machine Learning (FAIR4ML) IG: automatisation, machine learning and FAIR data management.
Other relevant groups:
- FAIR Instrument Data IG: connection of data to persistently identified instruments (see above).
- Persistent Identification of Instruments WG: persistent identification of instruments.
- Software Source Code IG: software code as a necessary component of FAIR data.
- Sensitive Data IG: how to handle sensitive data, appropriate licensing.
- FAIRsharing Registry WG: connecting data and metadata standards, repositories and policies.
- Education and Training on Handling of Research Data IG: RDM education for everyone.
- Vocabulary Services IG: development of community-based recommendations on approaches to publication of controlled vocabularies on the web.
- Open Science Graphs for FAIR Data IG: next-generation standards for metadata (knowledge graphs).
- Data Versioning IG: adoption of data versioning principles.
- Active Data Management Plans IG: expanding the functionality of machine-actionable Data Management Plans (maDMPs).
Useful links & resources:
- Repositories: https://data-collections.nfdi4ing.de/
- Ontologies: https://terminology.nfdi4ing.de/ts/ontologies?page=1
- ELN/LIMS Finder: https://eln-finder.ulb.tu-darmstadt.de/home
- Data management for beginners, with a focus on materials scientists and engineers: datamaterialized.eu
- About EOSC: https://eosc.eu/eosc-about
- FAIR principles: https://www.go-fair.org/fair-principles/
- CARE principles: https://www.gida-global.org/care