Useful Documents
- Document listing the captured use-cases: Google Spreadsheet Link (obsolete, do not use: Google Doc Link)
- Document with the Ranked Requirements: Google Spreadsheet Link (new)
- ANDS Falling Water Project - User Interview Responses
- JISC Metadata focus group use cases dataset
Next Teleconference & Connection Details
Date: Tuesday August 22nd 2017
Time
- Albuquerque (USA - New Mexico), at 5:00:00 am MDT (UTC-6 hours)
- New York (USA - New York), at 7:00:00 am EDT (UTC-4 hours)
- Prague (Czech Republic), at 1:00:00 pm CEST (UTC+2 hours)
- Athens (Greece), at 2:00:00 pm EEST (UTC+3 hours)
- Melbourne (Australia - Victoria), at 9:00:00 pm AEST (UTC+10 hours)
How
Connection details will be sent closer to the date.
Minutes
1st Teleconference, December 20th, 2016
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
- Dom Fripp, Jisc, UK (DF)
- Anita de Waard, Elsevier (AdW)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
Notes
FP: Should we start with a specific discipline and do a survey?
DF: Run a research discovery service, that could be a starting point for gathering user requirements. Need to capture the long tail of research and the user behaviour around smaller, less coherent sets of data (i.e. not so discipline specific)
AdW: there are a few existing RDA Groups that are domain specific, so we shouldn’t duplicate this effort
MFW: If this is the case, is it possible that these other groups have already identified this information?
AdW: A preliminary survey before the start of this IG hasn’t revealed any such information.
FP: We should start with a list of questions we’d like the answers to, and then use these to connect with (a) RDA groups and (b) other user communities.
MFW: We can start by talking to researchers to understand what data they look for and how they discover it, and also look at search logs from some data portals, if available.
DF: Do we create user-profiles by generic questions on use cases, or do we gather the requirements from use-cases and then identify the profiles?
MFW: We can do both: if we already know some profiles, we can start with those, but also gather new use-cases.
DF: We can pool resources and start with those. We are working with Agile methods and we are essentially driven by use-cases. We generate some user-stories so we can start sharing those with the community.
MFW: Last year Australia did a similar survey so I can try and get the semi-structured data from them.
DF: Through those user stories we can identify the generic requirements and then branch off to discipline-specific ones.
SJK: NSIDC has both “personas” and related use cases that were developed to guide recent improvements to our data search. They are, however, specific to the cryosphere. I will check to see if they can be shared. (with apologies for joining the telecon 1 hr late)
SJK: Also, I developed a google form for entering information about “science scenarios” that were used in an NSF EarthCube project. The url is https://goo.gl/forms/loJpTwchUd9I6QLI2. The entries automatically go into a spreadsheet, making the content easily searchable. The form could be modified to the purposes of this group.
DF: Will also check if UK Research Data Discovery Service use cases can be shared amongst the group.
DF: Metadata focus group use cases dataset https://zenodo.org/record/193011#.WFklCVOLS00
FP: Can we use a specific format for collecting use-cases?
DF: Although we are using a specific, open format that allows for long statements, we can use something as simple as possible. Mentioned user stories as used in the agile methodology; some background and examples can be found at: https://www.scrumalliance.org/community/articles/2013/september/agile-us...
DF: This rich source of information/use-cases can then provide a quicker bridge to user profiles as well as a guide to surveys/questions.
MFW: A good starting point is to start looking at these use-cases. Other tasks within this IG (such as best practices, relevance etc), will be informed through the use-cases.
AdW: We have a session in the RDA Plenary in Barcelona, so we should have some information by then.
Follow-up Actions
Action #0: Create a single GDoc where we can c&p our use-cases [see Useful documents, top of this page].
Action #1: Collect existing use-cases from within our groups/institutions.
Action #2: Identify generic questions on Data Discovery that we can use for surveys.
Action #3: Identify the user-profiles from the surveys/use-cases.
2nd Teleconference, January 24th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
- Beth Huffer, Lingua Logica LLC/ US (BH)
Notes
MFW: What is the purpose of the use-cases?
FP: The ultimate goal is to go from Use-cases to User Profiles, generate the data discovery requirements from the profiles and finally condense this into best practices for discovery services
MFW: There are several similar use-cases. We can collate/merge use cases with the same requirements.
MFW: Falling Water User Interview Responses to be used as additional use cases. There are also some use cases from JISC Metadata focus group use cases dataset: https://zenodo.org/record/193011#.WFklCVOLS00
FP: We have so far 5 disciplines:
- Life Science,
- Economics,
- Cultures and linguistics
- Beamlines, x-ray absorption spectroscopy
- Computer Science
Should we investigate for additional disciplines?
MFW: Some use cases are relevant to other Task Forces, such as best practices.
SJK: Some entries in the use-cases have an empty “goal” (so-that) column. This might prove problematic.
MFW: That’s because we didn’t use this use case format when we interviewed people, it’s hard to go back and ask for the reason; some reasons can be reasonably inferred from the requested feature (and comments).
MFW: There are several occasions where multiple use-cases have similar requirements; we could collate them for easier processing.
Possible Categories/Clusters of use cases:
- Data Availability
- Data Preview / Assessment
- Metadata enrichment
- Search interface
- ...
Follow-up Actions
Action #1: Continue gathering use-cases
Action #2: Collate use-cases based on requirements
3rd Teleconference, February 7th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
- Antica Culina (NIOO-KNAW/NL) (AC)
- Jens Klump (CSIRO) (JK)
Notes
FP: Groups of categories based on the corresponding column values
FP: Should we merge related categories (such as Data Accessibility and Metadata Accessibility)?
MFW: The number of metadata categories makes sense given the question actually asked of the participants.
MFW: We should keep all the categories, as it may be relevant to the other Task Forces
SJK: What is the purpose of the “data tool” subcategory?
MFW: It’s mostly to visualize data, as well as statistical evaluation/preview of the data
AC: What is the difference between Portal and Search functionality?
AC: Add also an “export” functionality
FP: Portal and Search functionality could be different
MFW: Search is part of a Portal, so a subcategory
FP: What is the difference between Data Accessibility and Metadata Accessibility?
MFW / AC: We need to filter first on Metadata accessibility (i.e. data accessible or not) and then get data (data accessibility)
SJK: Data Aggregators may have only links to the actual data (i.e. metadata) but not necessarily the data itself.
MFW: Interoperability corresponds to the ability to transfer data between resources.
SJK: Interoperability corresponds to connection between different datasets (i.e. data interoperability).
MFW: The current use-cases are more oriented to RDA Australia. We should extend this with additional use-cases.
AC: Add information on how well services/search functionality is described by the provider / portal
SJK: What is the source/authority of the data
AC: When you enter a keyword in search, where is the search space that the machine is looking into (e.g. abstract, database, full text?)
SJK: Add another layer of organization: the syndicate of queries (i.e. relevance of queries).
Follow-up Actions
Action #1: Transform the current use case document into a spreadsheet, in order to further analyse this (e.g. use pivot table to show use case numbers for each category)
Action #2: Add a categorization based on Intended Audience
Action #3: Add other activities from BioCADDIE and JISC to avoid bias towards RDA
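Action #1 above (counting use cases per category via a pivot table) can be sketched with a minimal pure-Python equivalent. The records and category names below are illustrative placeholders, not the actual spreadsheet contents:

```python
from collections import Counter

# Hypothetical use-case records; in practice these rows would come from
# the shared use-case spreadsheet (one category assignment per use case).
use_cases = [
    {"id": 1, "category": "Data Availability"},
    {"id": 2, "category": "Search interface"},
    {"id": 3, "category": "Data Availability"},
    {"id": 4, "category": "Metadata enrichment"},
    {"id": 5, "category": "Data Preview / Assessment"},
]

# Equivalent of a one-dimensional pivot table:
# the number of use cases falling under each category.
counts = Counter(uc["category"] for uc in use_cases)

for category, n in counts.most_common():
    print(f"{category}: {n}")
```

In a spreadsheet this is a pivot table with the category column as rows and COUNT as the aggregation; the code above is only the conceptual equivalent.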
4th Teleconference, February 21st, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
- Anita de Waard, Elsevier (AdW)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
- Beth Huffer, Lingua Logica LLC/ US (BH)
- Antica Culina (NIOO-KNAW/NL) (AC)
Notes
General Discussion on the Use Cases Google Sheet. Need to continue adding new use-cases, especially from other perspectives. We should also start thinking of how we aim to use this information:
- Through communication with the other Task Force groups
- Through communication with Repository Managers
The ultimate goal is to create coherent user requirement profiles that can subsequently be used to identify the missing components in Data Discoverability (especially with regard to existing/missing functionality in Data Repositories).
Discussion
AC: In the use cases, add also the intended / used repository.
MFW: Add some more use-cases for data repository managers
MFW: Some of our existing use-cases were received from JISC/DataONE. We should ask about the context.
MFW: Check the “Intended Audience” info.
FP: Maintain also Categories / Clusters
SJK: It should also be possible to add/remove categories.
AdW: We should distinguish between repositories: primary repositories or aggregators
AC: That is exactly the point of adding the Repository column: add specific names of “data repositories” and then identify which categories they belong to.
SJK: Maybe we should be generic (so that it would be independent of the intended repositories).
MFW: Use cases could be used by “Best Practice for Data Providers” and “Best Practice for Data Repository”
Follow-up Actions
Action #1: Refine / Structure GSheet
Action #2: Go through the use cases for categories
Action #3: Agree on the definitions on the Intended Audience (see General Info tab on the Use Case spreadsheet)
5th Teleconference, March 7th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
- Anita de Waard, Elsevier (AdW)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
- Jennie Larkin, NIH (JL)
Notes
Data aggregator/data repository/metadata catalogue:
is responsible for running a data discovery service for humans and machines, such as DataCite, DataONE, RDA (Research Data Australia), and data.gov. Generally, a data aggregator isn’t responsible for describing data (i.e. creating metadata), but may communicate to data providers what attributes metadata should have to make the data more discoverable.
Data provider:
is responsible for describing and curating data, and ensures access to authentic, up-to-date, credible, fit-for-use, citable data; data providers also feed metadata to data catalogues.
Data collector:
is responsible for collecting data, and may or may not be responsible for describing it.
Data funder:
funds research projects, and mandates and/or recommends that researchers make the data resulting from their funded grants as openly available as possible.
Data user/consumer:
seeks data and uses it in his/her research; may cite data in his/her publications or reports.
General Discussion on the Use Cases Google Sheet. Need to continue adding new use-cases, especially from other perspectives.
SJK: Keep the format used in https://www.w3.org/TR/sdw-ucr for presenting the use-cases to a wider audience, i.e. having collapsible requirements etc.; e.g. 5.8 Crawlability and 5.13 Discoverability are two related requirements.
AdW: At the end of an IG there are very specific and short output formats, published on the IG’s RDA page. Should we aim for a hybrid form?
MFW: Present each category as a top layer. Under each category we can have the related use-cases.
FP: We may have duplicates of each use-case, but since presenting the use-cases is not our primary goal, it makes sense to highlight and put more focus on the categories/sub-categories structure.
MFW: See “Best Practice for Data Repository” and connect there?
AdW: Best Practices is a different output, but we shouldn’t connect them at this stage at least.
JL: I agree with that; it would create more confusion than it provides information.
SJK: We should analyse the use-cases first.
JL: There are 63 use-cases, but have they covered all the aspects? And how can we go from this information-rich array to specific guides?
AdW: The target audience could be mostly repository managers. So essentially we want to show how data discovery is done.
AdW: Why are there all these different users? What is the difference between the user types? Maybe we should consolidate on major types: Funders, Researchers, Librarians?
AdW: Then we should generalize more. As these are interviews, we should abstract them and create the use-cases.
SJK: We should not throw out any of the data.
MFW: We agree we do not link to best practices, but we derive the requirements from the “interviews”/use-cases.
JL: We keep the current labels (Students, Researcher, Librarian, Funder), we accept that we have a bias towards Researchers, and then focus on underrepresented groups.
JL: Follow up on the “interviews” in order to understand the overlap between the different roles.
JL: What are the outreach/products of the Task Force that we could aim for?
Follow-up Actions
Action #1: Create a first version of requirements
Action #2: Add extra use-cases from the W3C document to our “Interviews”
6th Teleconference, March 7th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, University of Thessaloniki, Greece (FP)
Notes
MFW: There are a lot of commonalities with the BioCADDIE Working Group 4 (Use Cases and Testing Benchmarks).
MFW: Regarding the plenary, we should have a slide with the 9 requirements we have come up with, and possibly another slide with the categories (maybe a pivot table).
FP: Question: what are the prototyping tools and the test collection aspects?
MFW: It could be something similar to this.
MFW: The test collection could be a set of very specific queries relevant to data discoverability.
FP: This makes sense. And the prototyping tool could be a first implementation of the gathered requirements.
MFW: Prototyping tool of what? For data relevance we had an approach similar to this (page 5)
Follow-up Actions
Action #1: Create ~6 slides for the P9 presentation of the Task Force, as follows:
- Goals / Aims /etc
- Gathering use cases (sources)
- Categories (chart?)
- Requirements Part 1
- Requirements Part 2
- Future steps
7th Teleconference, May 30th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, Institute of Applied Biosciences|CERTH, GR (FP)
- Kathleen Gregory, DANS NL (KG)
- Anita de Waard, Elsevier (AdW)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
Notes
FP: Quick overview of the ranking survey. Only ~12 responses so far, which are not sufficient to capture significant differences between the 8 requirements.
FP: Next steps regarding the survey:
- Resend link aiming for more replies
- Start analyzing the data (create a pivot table, connect with the number of use-cases per requirement, connect with the user group)
MFW: Output of the ranking survey can be connected then with the Best Practices document
MFW: The Metadata TF used the use-cases Google Sheet. An action was for people to have a look at the use cases and identify the metadata that best fits (Metadata Extensions User Profiles). Essentially the question is: “What type of metadata might a user want in order to facilitate discovery?”
FP: Regarding the survey for gathering new use-cases: get feedback before sending it out.
KG: Add a more informative context on top (set-up a narrative, describe the situation)
KG: Phrasing: “feel free to submit as many answers as you want.”
Follow-up Actions
Action #1: Resend the ranking survey; send out the gathering-use-cases survey
Action #2: Send an email template to Kathleen, for sending out to people outside RDA
Action #3: Start a draft “final output” document for the Use Cases TF
8th Teleconference, June 20th, 2017
Attendees:
- Mingfang Wu, ANDS (MFW)
- Fotis Psomopoulos, Institute of Applied Biosciences|CERTH, GR (FP)
- Jennie Larkin, NIH/NIDDK (JL)
- Siri Jodha Khalsa, NSIDC/Czech Republic (SJK)
Notes
FP: Connected with the BioSharing WG and the ELIXIR Bridging Force IG. Response from Jennie (within BioSharing but not officially representing the group per se), none from ELIXIR. Will wait for an official response from either of the chairs before submitting a joint session (but the deadline is this Friday, June 23rd).
MFW: BioSharing is probably in the wrap-up stage (it already has its recommendation output). It would be good to have some feedback from BioSharing in the Best Practices.
JL: Kathy Fontaine might have a better understanding of their interest and how it might fit into DDPIG work.
FP: Report on the analysis of the Requirements’ Ranking. Survey feedback was captured in this Google Sheet (approximately 30 responses were recorded). Other than the raw data, we did some processing in order to rank the requirements (list available here). The second tab has a simple analysis, providing both the average rank per requirement, as well as a distribution of the ranking for further understanding. Finally, the last tab contains the actual ranking of the 9 requirements, together with some information on the number of the original user interviews that produced the requirement and a distribution across the different user groups (Researcher, Student, Librarian, Funder). This particular document will also be considered the outcome of this TF.
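The average-rank analysis described above can be sketched in a few lines of Python. The requirement names and responses below are hypothetical placeholders, not the actual survey data from the Google Sheet:

```python
# Hypothetical survey responses: each respondent assigns every requirement
# a rank from 1 (most important) upward. Requirement names are illustrative
# only, not the actual nine requirements from the spreadsheet.
responses = [
    {"Crawlability": 1, "Discoverability": 2, "Data Preview": 3},
    {"Crawlability": 2, "Discoverability": 1, "Data Preview": 3},
    {"Crawlability": 1, "Discoverability": 3, "Data Preview": 2},
]

def average_ranks(responses):
    """Return the mean rank per requirement across all responses."""
    totals, n = {}, len(responses)
    for resp in responses:
        for req, rank in resp.items():
            totals[req] = totals.get(req, 0) + rank
    # A lower average rank means a higher overall priority.
    return {req: total / n for req, total in totals.items()}

avg = average_ranks(responses)
for req, score in sorted(avg.items(), key=lambda kv: kv[1]):
    print(f"{req}: {score:.2f}")
```

Sorting by ascending average rank reproduces the overall priority ordering; the rank *distribution* per requirement (how often each rank was assigned) is a separate tally kept for further understanding, as noted above.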
FP: We have also captured ~15 new use cases (mostly thanks to the outreach efforts of Kathleen Gregory from DANS to the Librarian community). The vast majority of the use cases fall squarely within the existing requirements (which is a very nice validation).
FP: Next steps are a) to incorporate the new use cases into the requirements’ ranking analysis, b) for each of the 9 requirements to find an example (or more) of existing repositories that provide this functionality, and c) start producing the final output of the TF in anticipation of P10.
JL: So far there is a focus on the user perspective (which is the goal as well). As a future/next step, what about connecting with funders and policies?
FP: This is a very interesting approach (but at this point it feels beyond the scope of this TF). We should definitely try to either create a new TF to explore this, or incorporate it into one of the existing TFs.
JL: There are several existing RDA groups that focus on the repositories and their best practices/recommendations. Maybe connect with them for connecting requirements to repositories (instead of individual effort)?
FP: This is absolutely true, so it would make sense to connect with the other RDA groups on this. For this particular case however, the purpose of this TF is more to have a working example of a given requirement rather than providing guidelines to the other Groups.
SJK: Recipients of recommendations?:
https://www.rd-alliance.org/groups/rdawds-certification-digital-reposito...
https://www.rd-alliance.org/groups/domain-repositories-interest-group.html
Follow-up Actions
Action #1: Integrate the new use-cases into the requirements’ ranking google sheet
Action #2: Identify examples of repositories for each of the requirements
9th Teleconference, July 11th, 2017
Attendees:
- Kathleen Gregory, DANS, NL (KG)
- Fotis Psomopoulos, Institute of Applied Biosciences|CERTH, GR (FP)
- Uwe Schindler, PANGAEA / Apache Software Foundation, DE (US)
- Anusuriya Devaraju, CSIRO, AU (AD)
- Jens Klump, CSIRO, AU (JK)
Notes
Quick round of introductions / background
FP: Overview of the work done in the TF so far. Focus on the ranking requirements spreadsheet. The next goal is to produce the final outcome document for the TF, aiming for after P10, as well as to introduce repository examples for each of the captured requirements.
AD: How is the dataset identified? Can we define software as a dataset?
FP: In our current approach, our definition of a dataset is generic and domain-independent. In this regard, if a repository contains software, a dataset can be a single software entry.
Follow-up Actions
Action #1: Integrate the new use-cases into the requirements’ ranking google sheet
Action #2: Identify examples of repositories for each of the requirements
10th Teleconference, August 1st, 2017
Attendees:
- Fotis Psomopoulos, Institute of Applied Biosciences|CERTH, GR (FP)
- Mingfang Wu, ANDS, AU (MW)
- Anusuriya Devaraju, CSIRO, AU (AD)
- Elli Papadopoulou, ATHENA, GR (EP)
Notes
- Quick round of introductions / background
- Discussion on the additional Use-Cases and their relationship to the existing 9 requirements. Final document with the ranked requirements is now ready for output.
- Reviewed some of the example repositories that exhibit the identified requirements. List needs to be extended with additional use cases.
- Next steps should primarily focus on joining the Best Practices documents to the Use-Case requirements.
Follow-up Actions
Action #1: Put all three best practices documents and the ranked requirements into a single document / article.
- Best Practice for Data Repository on Data Discovery
- Best Practice for Data Providers on Data Discovery
- Best Practice for Data Seekers on Data Discovery
- Requirements document