Mapping the road ahead for the Data Discovery Paradigms IG

16 Jul 2021
Group(s) submitting the application: 
Meeting objectives: 
  • To update the group progress

  • To discuss new task forces

Meeting agenda: 

Collaborative session notes: https://docs.google.com/document/d/1IalqtclDzVRkYB63HQkTAofxymT_3S_2TSZi7y7E1gg/edit?usp=sharing

 

  1. Introduction of the group (10 minutes)

  2. Discussion on short-term activities on the open topics (60 minutes)

    1. Metadata enrichment for discovery, for example, using upper-level ontologies

      • Possible introductory talk: Mingfang Wu

      • Suggested scope:

        1. Identify existing resources, their purposes, the problems they are designed to address, the domains where they are used, and examples of successful application.

        2. Challenges or difficulties, e.g. in applying specific controlled vocabularies or upper-level ontologies

      • Potential tasks:

        1. Identify existing surveys that could be used as input

        2. Might require the design of a new survey, tailored to the particular questions

      • Expected outcomes of the session discussion:

        1. Agreement on Scope

        2. Discuss possible tasks

        3. Potential leads

    2. Machine learning for data discovery, for example, topic modelling

      • Possible introductory talk: 

      • Suggested scope:

        1. Map the landscape of ML solutions, through either a survey of repositories or a document/literature review, covering the perspectives (taking into consideration all stakeholders, i.e. researchers, librarians, citizen scientists, etc.), activities and efforts that assist or facilitate data discovery. Based on the landscape, identify best practices across multiple domains.

        2. Examination of how ML models trained for data discovery can be described and shared.

        3. How to ensure that repositories can facilitate ML-based workflows while avoiding potential biases (resulting from only certain subsets of relevant data being discovered).

        4. Facilitate (semi-)automated processes (e.g. recommendation) in repositories for identifying items of interest from the user's perspective.

      • Potential tasks:

        1. Landscape performed as a literature review, and/or a survey of repositories

        2. A best-practices document that can subsequently be transformed into a recommendations document.

      • Expected outcomes of the session discussion:

        1. Agreement on Scope

        2. Discuss possible tasks

        3. Potential leads

    3. User study / meta-research/analysis of data discovery interviews 

      • Possible introductory talk: Kathleen 

      • Suggested scope:

        1. Revisit the prior efforts on user study interviews/surveys 

        2. Meta-analysis / meta-research of these efforts.

      • Potential tasks:

        1. Organize a series of presentations of existing efforts.

        2. Create a list capturing all the existing/past/ongoing efforts

      • Expected outcomes of the session discussion:

        1. Agreement on Scope

        2. Discuss possible tasks

        3. Potential leads

  3. Next steps and wrap up (20 minutes)

Target Audience: 
  • Researchers who conduct user studies to better understand users' data discovery processes

  • Data managers/providers who are responsible for describing data and making data findable

  • Data managers who want to investigate whether any user studies have been performed at their site

  • Attendees who have prepared some insights into their institutional or personal data discovery approaches will benefit more from the group discussion parts of the session.

Group chair serving as contact person: 
Brief introduction describing the activities and scope of the group: 

The objective of this IG is to provide a forum where representatives from across the spectrum of stakeholders and roles pertaining to data discovery can work together to identify, study and make recommendations concerning issues related to improving data discovery. The goal is to produce concrete deliverables that will be recognised and valued by the research and data communities.

This group was officially endorsed at RDA P9. The group has worked on the following task forces:

  1. User study in data discovery (ongoing)

  2. Data/Metadata granularity (ongoing, a BoF has been submitted)

  3. Using schema.org for research dataset discovery (this task force has been spun off into the Research Metadata Schemas Working Group, which was endorsed in Sept. 2019)
     

  4. Initial four task forces from the group:

    1. Relevancy ranking (completed)
       

    2. Use cases, prototyping tools and test collections (completed)
       

    3. Best practice for making data findable (completed)
       

    4. Metadata enrichment (closed)

Short Group Status: 

The DDPIG was established and endorsed as an IG during P9. The group started with four task forces around target data discovery topics soon after P9. All task forces actively explored their topics and reported progress and outputs at subsequent plenaries. At P11, the first three task forces were officially closed, and a discussion on new task forces took place, which during P12 focused primarily on Schema.org and Data Granularity. After P13, a case statement for a Research Schemas WG was submitted; it was endorsed in Sept. 2019, just before P14.

Type of Meeting: 
Working meeting
Avoid conflict with the following group (1): 
Avoid conflict with the following group (2): 
Meeting presenters: 
DDPIG Co-chairs + Volunteers for leading the discussion of potential task forces