Increasing amounts of open data are being made available through a growing number of data repositories. Consequently, it is becoming more challenging for researchers to find relevant data, especially when the data they need is dispersed across several repositories. At the same time, web search engines have become almost the de facto tool that researchers use to discover information and data. This workshop will address how to make data discoverable through aggregated search via the existing web architecture.
To enable broad discovery of and access to research data, some data repositories have begun leveraging the data discovery services of commercial search engines such as Google, Yahoo and Bing. They do this by embedding structured data markup in their metadata landing pages, using vocabularies such as schema.org, DCAT and domain-specific extensions. As more data repositories adopt this approach, new challenges arise:
- First, implementation is inconsistent across data repositories, and there are no guidelines for those who would like to pursue this path.
- Second, the vocabularies adopted or indexed by major web search engines are intentionally minimalistic, and there has been no evaluation of whether this minimal vocabulary meets the needs of research data. If it does not, how can we extend it and get the extensions accepted?
- Third, and most importantly, how can the research data community take advantage of this strategy? For instance, the community could explore new methods for metadata syndication and data discovery via the web architecture, using a common vocabulary. It can discuss and agree on what should be indexed and linked, and build knowledge graphs for new data discovery applications. These include aggregated search across resources in a specific domain, or in related domains relevant to a research need, and applications that support a spectrum of data search needs, from free-text search and JSON APIs to SPARQL queries.
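To make the publishing and syndication idea concrete, here is a minimal sketch using only the Python standard library. It assumes two hypothetical repositories (the `example.org` URLs, dataset titles and keywords are invented for illustration) and hypothetical helper names (`dataset_jsonld`, `landing_page`, `JsonLdExtractor`, `harvest`). Each repository embeds a schema.org `Dataset` description as JSON-LD in a `<script type="application/ld+json">` element of its landing page. A harvester then extracts those records and builds one aggregated keyword index. This is not the method of any particular repository, search engine or tool presented at the workshop, and the chosen properties are only a small illustrative subset of schema.org.

```python
import json
from html import escape
from html.parser import HTMLParser


def dataset_jsonld(name, description, url, keywords, license_url):
    """Build a minimal schema.org Dataset record (illustrative subset of properties)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        "license": license_url,
    }


def landing_page(record):
    """Render a bare-bones landing page with the JSON-LD embedded in <head>."""
    # Escape "</" so the JSON can never prematurely close the <script> element;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f"<title>{escape(record['name'])}</title>\n"
        '<script type="application/ld+json">\n'
        f"{payload}\n"
        "</script>\n</head><body><p>Human-readable metadata here.</p></body></html>"
    )


class JsonLdExtractor(HTMLParser):
    """Collect and parse every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def harvest(pages):
    """Aggregate Dataset records from many landing pages into a keyword -> URLs index."""
    index = {}
    for page in pages:
        parser = JsonLdExtractor()
        parser.feed(page)
        parser.close()
        for rec in parser.records:
            if rec.get("@type") == "Dataset":
                for kw in rec.get("keywords", []):
                    index.setdefault(kw.lower(), []).append(rec["url"])
    return index


# Usage: two hypothetical repositories publishing with the same vocabulary.
CC_BY = "https://creativecommons.org/licenses/by/4.0/"
pages = [
    landing_page(dataset_jsonld(
        "Example ocean temperature profiles", "Hypothetical record.",
        "https://repo-a.example.org/dataset/1", ["ocean", "temperature"], CC_BY)),
    landing_page(dataset_jsonld(
        "Example coastal glycan survey", "Hypothetical record.",
        "https://repo-b.example.org/dataset/42", ["glycoscience", "ocean"], CC_BY)),
]
index = harvest(pages)
print(index["ocean"])
```

The aggregated index works across both repositories only because both describe their datasets with the same schema.org properties. That is the point of agreeing on a common vocabulary. Production harvesters and search engines also handle sitemap discovery, validation and deduplication, which this sketch omits.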
This workshop will bring together data repository managers, data managers, developers and data technologists to exchange experiences and lessons learnt from publishing metadata through structured data markup, and to determine how the research data infrastructure community can benefit from this data publishing trend.
- Overview of the publishing process (Mingfang Wu, Australian Research Data Commons, Australia)
- The SeaDataNet’s European Directory of Marine Environmental Data and British Oceanographic Data Centre data library catalogue (Gwenaelle Moncoiffé, National Oceanography Centre - British Oceanographic Data Centre, UK)
- How Ontologies are Improving Data Discovery and Interoperability in Glycoscience (Matthew Campbell, Griffith University)
- Research Data Australia (Joel Benn, Australian Research Data Commons)
- Practice from Bioschemas.org (Leyla Garcia, ZB MED Information Centre for Life Sciences, Germany)
- Practice from Science-on-schema.org (Adam Shepherd, Woods Hole Oceanographic Institution, USA)
- Visualising Schema.org crosswalks (Karen Payne, World Data System, Canada)
- Discussion
- Gleaner (https://gleaner.io): a review of tools developed as part of the US National Science Foundation's EarthCube program (Doug Fils, Consortium for Ocean Leadership, USA)
- An experiment of research data global (Joel Benn, ARDC)
- Knowledge network and its use in a virtual laboratory (Jonathan Yu, CSIRO, Australia)
- Discussion
- Describe and markup data from your own data repository (Leyla Garcia)
- Best practice guidelines for publishing structured data markup (Mingfang Wu)
- Discussion: tabled topics for further discussion, and wrap-up.
Workshop organisers:
Adrian Burton (Australian Research Data Commons, Australia)
Doug Fils (Consortium for Ocean Leadership, USA)
Leyla Garcia (ZB MED Information Centre for Life Sciences, Germany)
Nick Juty (The University of Manchester, UK)
Karen Payne (International Technology Office, World Data System, Canada)
Adam Shepherd (Woods Hole Oceanographic Institution, USA)
Mingfang Wu (Australian Research Data Commons, Australia)