Open Science Graphs interoperability: data models and data exchange protocols

23 Feb 2022
Group(s) submitting the application: 
Meeting objectives: 

An Open Science Graph (OSG) is an information space describing through metadata one or more entities and actors involved in the research lifecycle and knowledge production (projects, publications, data, software, services, researchers, organizations, facilities, etc.). 

The work of the Interest Group (IG) focuses on open challenges in Open Science Graphs for FAIR Data. As motivated in the case statement of the IG, the topic is of high interest and priority globally, with this specific RDA IG featuring in several European Commission calls for projects in 2022.

During the last IG session at RDA, it became clear that the most urgent challenge is the definition of an interoperability framework enabling a seamless exchange of data across diverse OSGs. The audience, composed of 40% researchers, 30% service providers, and 30% graph providers, developers, and others, identified the following high-priority aspects:

  • Identification of data sources (e.g., registries, Open Access archives, as well as other OSGs) integrated by the OSG at hand;

  • Implementation towards Openness and FAIRness;

  • Common export format and protocols: identification of a set of research entities, including publications, datasets, software, organizations, funders, and authors (as well as services, patents, research concepts/topics, and annotations, which are less relevant here but still of interest);

  • Specification of added-value specific to the OSG at hand.

The main objective of the session is to bootstrap the discussion about establishing a specific Working Group (WG) on “data models and exchange protocols for Open Science Graphs interoperability” that would tackle such challenges. In particular, we foresee a discussion centred around:

  • Data model

    • How to identify objects in the graph: graph IDs and PIDs together with a vocabulary of PID providers (Crossref DOI, DataCite DOI, ORCID, ISNI, ROR, ArXiv, PubMedID, PDBs, etc.);

    • Types of entities: vocabulary of entity types;

    • Entity description: minimal (and optional) set of properties for each entity;

    • Relationships: as defined by Scholix (scholix.org);

    • Provenance of the object, provenance of the object properties;

    • Trust of the object, trust of the object properties; trust should be a value (e.g., from 0 to 1) indicating the quality/accuracy of the object or property (e.g., 0 = randomly generated data; 1 = data manually verified by an expert agent in the domain);

    • Added-value: set of entity properties characterizing the added-value brought by the Graph (e.g., indicators, subjects, references, etc.).

  • FAIRness and Openness: definition of a “graph profile”, an exchangeable metadata description of an OSG describing in a machine-readable format all the properties above, enriched with:

    • Integrated data sources: refer to a registry of data source/service PIDs;

    • Licensing information;

    • Documentation: metadata schemas of the entities and of their added-value properties, access protocols, list of dumps (and related PIDs).
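
To make the discussion points above more concrete, the following is a minimal Python sketch of how an entity record and a graph profile could be represented and serialized. It is only an illustration under stated assumptions, not a proposed standard: all class names, field names, vocabulary values, and the sample identifiers are hypothetical, and the actual vocabularies and minimal property sets are precisely what the WG would need to agree on.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical controlled vocabularies; the real ones would be defined by the WG.
PID_SCHEMES = {"doi", "orcid", "isni", "ror", "arxiv", "pmid"}
ENTITY_TYPES = {"publication", "dataset", "software",
                "organization", "funder", "author"}


def check_trust(trust: float) -> None:
    """Trust is a value from 0 (e.g., random data) to 1 (expert-verified)."""
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")


@dataclass
class Pid:
    scheme: str  # one of PID_SCHEMES
    value: str

    def __post_init__(self):
        if self.scheme not in PID_SCHEMES:
            raise ValueError(f"unknown PID scheme: {self.scheme}")


@dataclass
class Assertion:
    """A property value together with its own provenance and trust."""
    value: str
    provenance: str  # identifier of the data source providing the value
    trust: float

    def __post_init__(self):
        check_trust(self.trust)


@dataclass
class Entity:
    graph_id: str     # identifier local to the graph
    entity_type: str  # one of ENTITY_TYPES
    pids: list
    provenance: str   # provenance of the object as a whole
    trust: float      # trust of the object as a whole
    properties: dict = field(default_factory=dict)  # name -> Assertion

    def __post_init__(self):
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")
        check_trust(self.trust)


@dataclass
class GraphProfile:
    """Machine-readable description of an OSG (all fields are assumptions)."""
    name: str
    license: str
    data_sources: list  # PIDs of the integrated data sources
    entity_types: list
    dumps: list         # PIDs of the available dumps


def to_json(record) -> str:
    """Serialize any of the dataclasses above to a JSON string."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    dataset = Entity(
        graph_id="osg:0001",
        entity_type="dataset",
        pids=[Pid("doi", "10.1234/example")],  # placeholder DOI
        provenance="hypothetical-repository",
        trust=0.8,
    )
    dataset.properties["title"] = Assertion(
        "Example dataset", "hypothetical-repository", 1.0)
    print(to_json(dataset))
```

Keeping provenance and trust on both the object and each of its properties mirrors the two levels listed above, so a consumer could, for instance, accept an entity while discarding individual low-trust properties.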

 

The first part of the session will be dedicated to a discussion of this approach and of possible alternatives, with the intention of identifying broad agreements and disagreements on the WG topics, so as to build a solid starting point for submitting a request for an RDA WG.

 

The second part of the session will focus on the definition of a roadmap for the WG, identifying a list of properties, contributors, and meetings during 2022. Ideally, the group would adopt an incremental approach: first defining a core Interoperability Framework that OSG providers can quickly adopt as a starting point to facilitate and track data exchange, then following up with richer capabilities to be adopted at a later stage to further enhance the resulting ecosystem.

Meeting agenda: 

Collaborative session notes: https://docs.google.com/document/d/1em4bMa8UmSUHMV8wBAt4pQK2ZnuJfwTwxTYF...

  • Welcome (co-chairs) [5mins]
  • Objective and WG topics (Andrea Mannocci) [25mins]
  • Invited:
    • Kristian Garza (DataCite) - Title: PID Graph - an Open Science Graph [10mins]
    • Matthias Liffers (ARDC) - Topic: Persistent identifiers and interoperability [10mins]
    • Enrico Daga (Open University) - Title: Streamlining semantic data publishing with SPARQL Anything [10mins]
  • Open discussion (co-chairs as facilitators): identification of WG structure and activities, contributors beyond co-chairs
Target Audience: 
  • Publishers who want to better understand how to leverage and contribute to OSGs;

  • Data repositories/centres who want to better understand how to leverage and contribute to OSGs;

  • Consumers (e.g., SMEs, scholarly services) of (bulk) OSGs who want to build services (e.g., impact assessment, discovery, publishing);

  • OSG aggregators (e.g., Crossref, DataCite, OpenAIRE, ORCID, ORKG, ResearchGraph, OpenAlex, etc.) who want to exchange information and benefit from the added-value brought by third-party graphs.

Group chair serving as contact person: 
Brief introduction describing the activities and scope of the group: 

The goal of the Open Science Graphs for FAIR data Interest Group is to build on the outcomes of other RDA WGs/IGs to investigate the open issues and identify solutions towards achieving interoperability between Open Science Graph initiatives. The aim is to improve FAIRness of research data, and more generally FAIR*-ness of science, by enabling the smooth exchange of the interlinked metadata overlay required to access research data at the meta-level of the discovery-for-citation/monitoring and at the thematic level of the discovery-for-reuse. Such “FAIR-ness” and “interlinked-ness” provide strong support for research integrity and research innovation which in turn underpin significant social, environmental and economic benefits.

Short Group Status: 

The Interest Group was established in 2019: it held a BoF at P13 in Philadelphia and was first presented at P14 in Helsinki. Progress slowed throughout P15, P16, P17, and P18, which took place in the middle of the pandemic, but the IG still attracted considerable interest within the RDA community, as shown by its extensive list of subscribers, as well as externally, as suggested by explicit mentions in official EC documents.

Since its establishment, the co-chairs have met five times, defined a concrete roadmap for the IG, and co-authored a position paper on the topic. The meeting will therefore be the occasion to bootstrap the activity of the aforementioned WG on a larger scale, involving the sizeable list of IG members, on the high-priority topics of data modelling and exchange formats for OSGs.

Type of Meeting: 
Working meeting
Avoid conflict with the following group (1): 
Avoid conflict with the following group (2): 
Meeting presenters: 
Paolo Manghi, Andrea Mannocci