The RDA Data Discovery Interest Group: Looking back and looking ahead

01 Dec 2017

Greetings, members of the Data Discovery Paradigms Interest Group!
We want to make sure we get this message out to you as the end of 2017 comes into view, to let you know of the developments we’ve been working on, as well as new opportunities for 2018.
In particular, we are interested in your thoughts on forming and joining a new set of Task Forces (see point 3), before the end of the year. But first, here’s what we’ve been up to!
1. Meetings.
1. RDA P10:
* We had a very successful meeting in Montreal, which many of you attended: the final notes are posted here https://docs.google.com/document/d/1Gwq3s06b4GZ_QxedHxUry_4iRkjS9582IJwP.... The main outcomes were twofold:
* We had a number of ideas for potential new Task Forces (more on that below)
* It was proposed to start a discussion on how to make data more discoverable by web search engines. There is an understanding of how entities are indexed by Google, but an analysis of requirements would help repositories assess whether search engine optimization is worthwhile (see next point)
2. Presentation by Natasha Noy of Google Research:
* Natasha presented on ‘Making Data Discoverable by Web Search Engines’ in a webinar on 2 November. A recording of her (very well-attended) presentation can be found here: https://www.rd-alliance.org/making-data-discoverable-web-search-engines
2. Task Force Update.
Three of the current Task Forces have now achieved their goals, and have been working on summarizing and submitting their findings:
1. The ‘Best Practices’ Task Force (https://www.rd-alliance.org/group/data-discovery-paradigms-ig/wiki/best-...) have summarized their findings into two documents: the first one focuses on ‘Best Practices for Data Seekers’, and has been submitted to PLoS Bio for their ‘Ten Simple Rules’ series of papers.
2. As another publishable outcome, the TF on Best Practices (https://www.rd-alliance.org/group/data-discovery-paradigms-ig/wiki/use-c...) combined their findings with the TF on User Requirements, and have summarized these joint findings into a document, which is being submitted to an appropriate journal.
3. The Relevancy Ranking Task Force (https://www.rd-alliance.org/group/data-discovery-paradigms-ig/wiki/relev...) has completed the survey on ranking practices among data repositories, and is in the process of analysing this data and writing up a final report.
3. New Task Forces.
Since the close of 2017 sees the close of three of our four Task Forces (only the Metadata Enrichment Task Force will continue: https://www.rd-alliance.org/group/data-discovery-paradigms-ig/wiki/metad... ), we are looking for new Task Force proposals, and for volunteers to lead these efforts. A few suggestions came up at RDA P10:
* Cataloging and Analysing Common Data Discovery APIs;
* Data Discovery for Institutional Repositories, in order to test recommendations as well as explore new insights when using data discovery technologies;
* Analysis of Search logs, which could be a follow-up activity for the existing relevancy ranking task force;
* Collection and Analysis of Data Needs, to identify what people usually want to find;
* Making research data more discoverable by search engines (as a follow-up to Natasha Noy’s talk).
Our previous brainstorm also produced a further set of highly ranked ideas for potential Task Forces:
* Granularity, domain-specific cross-domain issues
* De-duplication of search results
* Using upper-level ontologies
* Search personalisation
For further ideas and a historical perspective on how we came up with these task forces, please see the Task Forces page on the RDA website: https://www.rd-alliance.org/group/data-discovery-paradigms-ig/wiki/ddpig....
A request!
We would like to ask all of you to consider:
1. Whether any of these topics appeal to you;
2. If so, whether you are interested in leading or joining the corresponding Task Force (leading a task force takes about 8 hours per month; joining, about 4 hours a month);
3. Whether you have other ideas for Task Forces to join or lead.
We are collecting suggestions for topics and people over the coming month (due December 22!), and would like to announce the start of a new set of activities at P11 in Berlin in March.
Please feel free to contact any of us with any further questions or suggestions.
Looking forward to hearing your views!
With kind regards, the IG Chairs:
SiriJodha Singh Khalsa, ***@***.***
Fotis E. Psomopoulos, ***@***.***
MingFeng Wu, ***@***.***
Anita de Waard, ***@***.***

    Author: Kerstin Lehnert

    Date: 02 Dec, 2017

    Hi Anita,
    Looks like fantastic outcomes of work so far.
    I would be really interested in the topic of granularity. As you know my goal is to get down to the level of sampling features, specifically samples, in searches.
    Let me know how to engage.
    Kerstin

    Author: Siri Jodha Khalsa

    Date: 18 Dec, 2017

    Howdy Everyone,
    In regard to this potential future topic:

    On 12/1/17 7:07 PM, Anita de Waard wrote:

    > * Making research data more discoverable by search engines.

    An NSF EarthCube project has done a lot of work on guidelines for
    producing quality schema.org markup, with additional extensions to
    schema.org classes, that should help repositories produce markup
    that will pass the Google Structured Data Testing Tool with 0
    errors:
    https://github.com/earthcubearchitecture-project418/p418Vocabulary

    Cheers,
    SiriJodha
    --
    Siri-Jodha Singh KHALSA, Ph.D., SMIEEE
    National Snow and Ice Data Center
    University of Colorado
    Boulder, CO 80309-0449 Phone: 1-303-492-1445 GV: 1-303-736-9976
    http://cires.colorado.edu/~khalsa
    http://orcid.org/0000-0001-9217-5550
