A Collection of Crosswalks from Fifteen Research Data Schemas to Schema.org
By Mingfang Wu
Research Metadata Schemas WG
Group co-chairs: Mingfang Wu, Sarala Wimalaratne, Adam Shepherd, Leyla Garcia
Supporting Output title: A collection of crosswalks from fifteen research data schemas to Schema.org
Authors: Mingfang Wu, Penelope Hagan, Baptiste Cecconi, Stephen M. Richard, Chantelle Verhey, RDA Research Metadata Schemas WG
DOI: 10.15497/RDA00069
Citation: Wu, M., Hagan, P., Cecconi, B., Richard, S. M., Verhey, C., & RDA Research Metadata Schemas WG. (2021). A Collection of Crosswalks from Fifteen Research Data Schemas to Schema.org. Research Data Alliance. https://doi.org/10.15497/RDA00069
Abstract:
The RDA Research Metadata Schemas Working Group has collected and aligned crosswalks from 15 source research metadata schemas to Schema.org. The source schemas include the discipline-agnostic schemas Dublin Core, Data Catalogue Vocabulary (DCAT), Data Catalogue Vocabulary - Application Profile (DCAT-AP), Registry Interchange Format - Collections and Services (RIF-CS), DataCite Schema and Dataverse; the discipline schemas ISO19115-1, EOSC/EDMI, Data Tag Suite (DATS), Bioschemas, B2FIND, Data Documentation Initiative (DDI), European Clinical Research Infrastructure Network (ECRIN) and Space Physics Archive Search and Extract (SPASE); and CodeMeta for software.
The collection can serve as a reference for data repositories when they develop their crosswalks, as well as an indication of semantic interoperability among the schemas. The visualisation tool (developed by the World Data System - International Technology Office) provides a user-friendly interface to inspect the crosswalks, by querying either an individual property or a schema name.
The dataset:
The collection of crosswalks is available in the following formats:
- The xlsx file contains the revision history and the same crosswalks in two sheets, each with a different classification. The first classification (labelled “crosswalks”) uses these categories: properties recommended by the Google dataset search guidelines (A), DCAT properties that can be mapped to Schema.org (B), and DCAT properties that cannot be mapped to Schema.org. The second classification (labelled “NISO classification”) organises the mapped Schema.org properties by the NISO metadata types: Descriptive Metadata, Technical Metadata, Preservation Metadata, Rights Metadata and Structural Metadata.
- The “crosswalks” sheet is also available as a csv file and a pdf file. The CSV file is accessible via the group's git repository, here.
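As a rough illustration of how a repository might consult the CSV form of the crosswalks, the sketch below loads a small inline table and looks mappings up either by Schema.org property or by source schema name, the same two lookups the visualisation tool offers. The column names (`schema_org_property`, `source_schema`, `source_property`), the sample rows and the function names are assumptions made for this example; they are not copied from the published file, whose actual layout should be checked first.

```python
import csv
import io

# Hypothetical excerpt: the column names and rows below are illustrative
# assumptions, not copied from the published CSV file.
SAMPLE_CSV = """schema_org_property,source_schema,source_property
name,Dublin Core,title
name,DataCite,titles
description,Dublin Core,description
description,DataCite,descriptions
identifier,DataCite,identifier
"""


def load_crosswalk(text):
    """Parse crosswalk rows into a list of dicts, one per mapping."""
    return list(csv.DictReader(io.StringIO(text)))


def by_property(rows, schema_org_property):
    """Return {source schema: source property} for one Schema.org property."""
    return {
        row["source_schema"]: row["source_property"]
        for row in rows
        if row["schema_org_property"] == schema_org_property
    }


def by_schema(rows, source_schema):
    """Return {source property: Schema.org property} for one source schema."""
    return {
        row["source_property"]: row["schema_org_property"]
        for row in rows
        if row["source_schema"] == source_schema
    }


rows = load_crosswalk(SAMPLE_CSV)
print(by_property(rows, "name"))
print(by_schema(rows, "DataCite"))
```

To try this against the real file, replace `SAMPLE_CSV` with the file's contents and adjust the three column names to match its header row.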
Attachment | Size |
---|---|
RDA Crosswalks from schemas to schema.org (V1.0, 2021-12-16).pdf | 161.25 KB |
RDA Crosswalks from schemas to schema.org (V1.0, 2021-12-16) (1).xlsx | 137.95 KB |
Attachment | Size |
---|---|
A Collection of Crosswalks from Fifteen Research Data Schemas to Schemaorg.pdf | 1.02 MB |
Author: Tobias Schweizer
Date: 04 Feb, 2022
Dear Members of the Research Metadata Schemas Working Group,
First of all, thanks for the effort that you put into this. We are working on a data model based on schema.org to represent metadata of research projects (docs are here) and I think your efforts will help us a lot.
Below please find some questions and comments.
- Terminology: Are the terms "mapping" and "crosswalk" interchangeable?
- Cardinalities (required, recommended): As far as I know, schema.org does not define any cardinalities (at least their validator does not care). What exactly are the Google dataset search recommendations that you mention in the document? Can they be influenced in some way (community-based)?
- Do you plan on making the crosswalks available in a machine-actionable form? I assume the jsons are meant for the UI only?
- Do you have any preferred data models (XML, RDF, etc.)? I am currently designing a mapping process for RDF data from different data models/ontologies to schema.org using SPARQL CONSTRUCT queries. Would that be of interest to the group? If yes, I'd be happy to share my ideas.
- How do you plan on approaching datatypes? Dates, for instance, can be extremely tricky as they might not be normalised (periods, imprecision). A source model could allow for all sorts of string representations of a date, whereas in your target model you would rather want something standardised to allow for cross-project searches.
- Do you recommend restricting schema.org's flexibility in terms of allowed datatypes in some way? Some properties' ranges are very broad, which is impractical for application development.
- What is the main motivation for the crosswalks? Global searching and indexing or actual migration of data?
- How do you handle loss of information? Sometimes the source model is more specific than the target model and various source properties can be mapped to the same property in the target model. (Sometimes, the opposite is true ;-))
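The datatype and information-loss questions can be made concrete with a minimal sketch. It assumes a hypothetical three-entry crosswalk in which two source properties map to the same Schema.org property, and a source that writes dates in one of two known layouts; none of these names or formats come from the published crosswalks.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical crosswalk: source property -> Schema.org property.
# Two source properties deliberately share one target to show a collision.
CROSSWALK = {
    "title": "name",
    "alternativeTitle": "name",
    "issued": "datePublished",
}

# Date layouts this sketch accepts; a real source would likely need more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(value):
    """Return an ISO 8601 date string, or None if the value is not parsable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def apply_crosswalk(record):
    """Map a source record; return (target record, unparsed date values)."""
    target = defaultdict(list)
    unparsed = []
    for source_property, value in record.items():
        target_property = CROSSWALK.get(source_property)
        if target_property is None:
            continue  # unmapped source properties are dropped in this sketch
        if target_property == "datePublished":
            iso = normalise_date(value)
            if iso is None:
                # Keep the raw value aside instead of guessing a date.
                unparsed.append((source_property, value))
                continue
            value = iso
        target[target_property].append(value)
    return dict(target), unparsed


record = {"title": "Ocean survey", "alternativeTitle": "OS-1", "issued": "04/02/2022"}
mapped, unparsed = apply_crosswalk(record)
print(mapped)
print(unparsed)
```

After mapping, both titles sit under `name` and nothing records which one was the alternative title, which is the loss described above. Values such as "circa 1900" are returned in the unparsed list so that imprecise dates can be reviewed by hand.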
I hope that my comments are helpful in some way. Please do not hesitate to contact me in case something is unclear or if I misunderstood something.
Best,
Tobias
Author: Mingfang Wu
Date: 15 Feb, 2022
Dear Tobias,
Thank you very much for your constructive comments, and I am sorry for this late reply.
First, your ideas about making the crosswalks machine-actionable and your work on designing a mapping process using SPARQL CONSTRUCT queries would be of great interest to the group. I would like to invite you to participate in the group meetings, if you are available, to introduce and discuss the ideas. The next one is: 20:00 UTC, Wednesday 16 February. You can check your local time here. The zoom ID is available from this post.
Below is the reply to your questions:
Please feel free to add a comment if you need further clarification.
Best,
Ming
Author: Chris Mungall
Date: 04 Mar, 2022
Hello, I put together an example of what a machine-actionable form of the crosswalks would look like using the LinkML schemasheets framework.
You can see it here:
https://linkml.io/rda-crosswalk/
Please note that this is an experiment in the spirit of leading a discussion and not an officially sanctioned RDA product!
The basic methodology is to allow curators to maintain the crosswalk in Google Sheets (with some minor modifications), with a translation to LinkML, which can then be translated to SHACL and JSON-Schema, and used to build a searchable website.
Author: Mingfang Wu
Date: 01 Apr, 2022
Hi Chris,
That is great!
Would you be available to introduce your work and the tool in our group meeting?
Thanks,
Ming