A Collection of Crosswalks from Fifteen Research Data Schemas to Schema.org

22 Dec 2021

By Mingfang Wu


Research Metadata Schemas WG

Group co-chairs: Mingfang Wu, Sarala Wimalaratne, Adam Shepherd, Leyla Garcia

Supporting Output title: A collection of crosswalks from fifteen research data schemas to Schema.org 

Authors: Mingfang Wu, Penelope Hagan, Baptiste Cecconi, Stephen M. Richard, Chantelle Verhey, RDA Research Metadata Schemas WG

DOI: 10.15497/RDA00069

Citation:  Wu, M., Hagan, P., Cecconi, B., Richard, S. M., Verhey, C., & RDA Research Metadata Schemas WG. (2021). A Collection of Crosswalks from Fifteen Research Data Schemas to Schema.org. Research Data Alliance. https://doi.org/10.15497/RDA00069

 

Abstract:

The RDA Research Metadata Schemas Working Group has collected and aligned crosswalks from 15 source research metadata schemas to Schema.org. The source schemas include the discipline-agnostic schemas Dublin Core, Data Catalogue Vocabulary (DCAT), Data Catalogue Vocabulary - Application Profile (DCAT-AP), Registry Interchange Format - Collections and Services (RIF-CS), DataCite Schema, and Dataverse; the discipline schemas ISO 19115-1, EOSC/EDMI, Data Tag Suite (DATS), Bioschemas, B2FIND, Data Documentation Initiative (DDI), European Clinical Research Infrastructure Network (ECRIN), and Space Physics Archive Search and Extract (SPASE); and CodeMeta for software.

The collection can serve as a reference for data repositories when they develop their crosswalks, as well as an indication of semantic interoperability among the schemas. The visualisation tool (developed by the World Data System - International Technology Office) provides a user-friendly interface to inspect the crosswalks, by querying either an individual property or a schema name. 

The dataset: 

The collection of crosswalks is available in the following formats:

  • The xlsx file contains the revision history and the same crosswalks in two sheets with different classifications. The first classification (labelled “crosswalks”) has three categories: properties recommended by the Google dataset search guidelines (A), DCAT properties that can be mapped to Schema.org (B), and DCAT properties that cannot be mapped to Schema.org. The second classification (labelled “NISO classification”) organises the mapped Schema.org properties by the NISO metadata types: Descriptive Metadata, Technical Metadata, Preservation Metadata, Rights Metadata and Structural Metadata.

  • The “crosswalks” sheet is also available as a CSV file and a PDF file. The CSV file is accessible via the group's git repository.
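
As a rough illustration of how a repository might use the CSV form of the crosswalks, the sketch below indexes a few rows so they can be queried either by Schema.org property or by source schema, similar in spirit to the visualisation tool. The column names (schema_org_property, source_schema, source_element) and the sample rows are assumptions made for this example; the published file may be laid out differently, so adjust the names to match it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: the column names and rows below are illustrative
# assumptions, not copied from the published crosswalk file.
SAMPLE_CSV = """schema_org_property,source_schema,source_element
name,Dublin Core,title
name,DataCite,titles/title
description,Dublin Core,description
description,DataCite,descriptions/description
"""


def load_crosswalk(text):
    """Index crosswalk rows by Schema.org property and by source schema."""
    by_property = defaultdict(dict)  # Schema.org property -> {schema: element}
    by_schema = defaultdict(dict)    # source schema -> {element: property}
    for row in csv.DictReader(io.StringIO(text)):
        prop = row["schema_org_property"]
        schema = row["source_schema"]
        element = row["source_element"]
        by_property[prop][schema] = element
        by_schema[schema][element] = prop
    return by_property, by_schema


by_property, by_schema = load_crosswalk(SAMPLE_CSV)

# Query by Schema.org property: which source elements map to "name"?
print(by_property["name"])

# Query by source schema: what does each Dublin Core element map to?
print(by_schema["Dublin Core"])
```

Reading the real file would only require replacing io.StringIO(text) with an open file handle and using the actual column headers.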

Output Status: RDA Supporting Outputs
Review period: Wednesday, 12 January 2022 to Saturday, 12 February 2022
Primary WG Focus / Output focus: Domain Agnostic
  • Author: Tobias Schweizer

    Date: 04 Feb, 2022

    Dear Members of the Research Metadata Schemas Working Group,

    First of all, thanks for the effort that you put into this. We are working on a data model based on schema.org to represent metadata of research projects (docs are here) and I think your efforts will help us a lot.

    Below please find some questions and comments.

    -  Terminology: Are the terms "mapping" and "crosswalk" interchangeable?

    - Cardinalities (required, recommended): As far as I know, schema.org does not define any cardinalities (at least their validator does not care). What exactly are the Google dataset search recommendations that you mention in the document? Can they be influenced in some way (community-based)?

    - Do you plan on making the crosswalks available in a machine-actionable form? I assume the jsons are meant for the UI only?

    - Do you have any preferred data models (XML, RDF, etc.)? I am currently designing a mapping process for RDF data from different data models/ontologies to schema.org using SPARQL CONSTRUCT queries. Would that be of interest to the group? If yes, I'd be happy to share my ideas.

    - How do you plan on approaching datatypes? Dates, for instance, can be extremely tricky as they might not be normalised (periods, imprecision). A source model could allow for all sorts of string representations of a date whereas in your target model you rather want something standardised to allow for cross project searches.

    - Do you recommend restricting schema.org's flexibility in terms of allowed datatypes in some way? Some properties' ranges are very broad, which is impractical for application development.

    - What is the main motivation for the crosswalks? Global searching and indexing or actual migration of data? 

    - How do you handle loss of information? Sometimes the source model is more specific than the target model and various source properties can be mapped to the same property in the target model. (Sometimes, the opposite is true ;-))

     

    I hope that my comments are helpful in some way. Please do not hesitate to contact me in case something is unclear or if I misunderstood something.

    Best,

    Tobias 

     

  • Author: Mingfang Wu

    Date: 15 Feb, 2022

    Dear Tobias,

    Thank you very much for your constructive comments, and I am sorry for this late reply.

    First, your ideas about making the crosswalks machine-actionable and your work on designing a mapping process using SPARQL CONSTRUCT queries would be of great interest to the group. I would like to invite you to join one of the group meetings, if you are available, to introduce and discuss the ideas. The next one is: 20:00 UTC, Wednesday 16 February. You can check your local time here. The zoom ID is available from this post.

    Below is the reply to your questions:

    • We use the terms “mapping” and “crosswalk” interchangeably here, meaning translating elements from one schema to equivalent elements in another schema.
    • No, we don’t specify which elements in Schema.org should be mandatory, recommended, or optional. We discuss in the guidelines that data providers should decide how many and which elements of their source schema to map, based on their purpose and on what their anticipated downstream consumers can use. If their sole purpose is to be indexed by Google Dataset Search, then they should follow the recommendations in the Google dataset search guide.
    • It is a great idea to make the crosswalks available in a machine-actionable form. We haven’t discussed this yet.
    • We haven’t recommended any data models, although our guidelines provide examples in JSON-LD.
    • The collection of crosswalks is at the level of semantic mapping, and the guidelines are about the overall publishing process. We refer to the guide and examples from science-on-schema.org for the detailed implementation of datatypes and patterns for Schema.org properties.
    • We hope the crosswalks can serve as a reference for others when they plan to adopt a crosswalk (if they have the same source schema) or want to check which elements are mapped to/from in other crosswalks when they develop their own. We include the recommendation “Adopt or develop a crosswalk from your repository schema to Schema.org” in the guidelines.
    • It is inevitable that some information may get lost when mapping from one schema to another, especially from a discipline-specific schema to a general one. How much the loss matters varies case by case, and we recommend linking back to the source metadata record whenever possible. We discussed this issue in the guidelines and in this paper (submitted to a journal and currently under review).

    Please feel free to add a comment if you need further clarification.

    Best,

    Ming

     

     

  • Author: Chris Mungall

    Date: 04 Mar, 2022

    Hello, I put together an example of what a machine-actionable form of the crosswalks would look like using the LinkML schemasheets framework.

    You can see it here:

    https://linkml.io/rda-crosswalk/

    Please note that this is an experiment in the spirit of leading a discussion and not an officially sanctioned RDA product!

     

    The basic methodology is to allow curators to maintain the crosswalk in Google Sheets (with some minor modifications), with a translation to LinkML, which can then be translated to SHACL and JSON-Schema, as well as used to make a searchable website.

     

  • Author: Mingfang Wu

    Date: 01 Apr, 2022

    Hi Chris,

    That is great!

    Would you be available to introduce your work and the tool in our group meeting?

     

    Thanks,

    Ming
