Dear all,
I have a question for all the metadata experts out there:
If I understand DataCite right, you'll always have one metadata record
for one single resource and as a consequence one separate metadata
file for each resource. So if you have, for instance, a collection of
n datasets and you want to describe the collection as a whole and also
every single dataset, you will need n+1 metadata files. These
metadata records should refer to each other via the RelatedIdentifier
property with relationType IsPartOf and HasPart respectively.
Is that understanding correct? Or is there any way with DataCite to
put the metadata for the whole collection and for the individual
datasets in one single metadata file?
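To make the n+1 cross-referencing concrete, here is a minimal Python sketch of the RelatedIdentifier links such records would carry. It is only an illustration: the DOIs are made up, the helper name related_identifiers is hypothetical, and the DataCite XML namespace and all other mandatory properties are left out.

```python
import xml.etree.ElementTree as ET


def related_identifiers(links):
    """Build a DataCite-style <relatedIdentifiers> element.

    links: iterable of (relation_type, identifier) pairs.
    Namespace handling is omitted to keep the sketch short.
    """
    root = ET.Element("relatedIdentifiers")
    for relation_type, identifier in links:
        element = ET.SubElement(
            root,
            "relatedIdentifier",
            relatedIdentifierType="DOI",
            relationType=relation_type,
        )
        element.text = identifier
    return root


# Made-up identifiers, for illustration only.
collection_doi = "10.5072/example-collection"
dataset_dois = [f"10.5072/example-ds{i:03d}" for i in range(1, 5)]

# The collection record points to every dataset with HasPart ...
collection_links = related_identifiers(("HasPart", d) for d in dataset_dois)
# ... and every dataset record points back with IsPartOf.
dataset_links = {
    d: related_identifiers([("IsPartOf", collection_doi)]) for d in dataset_dois
}

# n datasets plus one collection give n + 1 records.
records = [collection_links, *dataset_links.values()]
print(len(records))
print(ET.tostring(dataset_links[dataset_dois[0]], encoding="unicode"))
```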
If the above is correct, we might need to amend our recommendations
document to allow for multiple metadata files using one single schema.
I'd assume that a collection of datasets is a rather common use case
that we need to take into account. Maybe allow something like:
datacite.xml
datacite-ds001.xml
datacite-ds002.xml
datacite-ds003.xml
datacite-ds004.xml
in the META-INF folder, where datacite.xml describes the collection
and each datacite-dsNNN.xml describes the corresponding dataset dsNNN.
Best regards,
Rolf
--
Rolf Krahl <***@***.***-berlin.de>
Helmholtz-Zentrum Berlin für Materialien und Energie (HZB)
Albert-Einstein-Str. 15, 12489 Berlin
Tel.: +49 30 8062 12122
Author: Stefan E. Funk
Date: 13 Sep, 2017
Dear Rolf, dear all,
that is indeed a good question. If we define our bags to be
"single-object" bags, as they will be used in the DARIAH-DE Repository,
we have no problem: we have one content file and a bunch of
metadata files (and one DataCite metadata file).
How are all the other repositories using BagIt bags? Are they all
single-object, too? If yes, we would only need one more metadata file,
the DataCite one.
All the best,
Stefan
--
Stefan E. Funk
Abteilung Forschung & Entwicklung
Georg-August-Universität Göttingen
Niedersächsische Staats- und Universitätsbibliothek Göttingen
D-37070 Göttingen
Papendiek 14 (Historisches Gebäude, Raum 2.409)
+49 551 39-7700 (Tel)
+49 551 39-3856 (Fax)
***@***.***-goettingen.de
http://www.sub.uni-goettingen.de
http://www.rdd.sub.uni-goettingen.de
Author: Thomas Jejkal
Date: 13 Sep, 2017
Dear all,
that’s really a good point and I see the same ‘limitations’ when using DataCite. In our case statement and in the primer document we are talking about ‘Migration/Replication of a Digital Object[…]’, which (for me) implies a single resource. Of course, this depends on the definition of ‘Digital Object’. If we define a Collection to be a Digital Object/resource, too, we are fine as long as we describe only the collection itself as bag content within datacite.xml, which should be possible.
If we also want to describe the individual resources of the collection, we’ll need some hierarchical approach for providing generic metadata, e.g. in the form suggested by Rolf. The problem here is that the individual part IDs (ds001 – ds004) must be mapped to the individual collection resources as well as to the payload located in the ‘data’ folder. Thus, adopters of our recommendations may have to change the structure or naming of the payload, which could break existing implementations. However, we should continue this discussion in a couple of minutes.
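The mapping problem can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the file names follow the datacite-dsNNN.xml pattern from Rolf's mail, and the payload of a dataset is assumed (hypothetically) to sit either in data/dsNNN/ or in a single file data/dsNNN.ext. Neither convention is part of our recommendations, and the function names are made up.

```python
import re
from pathlib import PurePosixPath

# Assumed naming scheme: "datacite.xml" for the collection,
# "datacite-<part id>.xml" for each individual dataset.
PART_FILE = re.compile(r"^datacite-(?P<part_id>[A-Za-z0-9_]+)\.xml$")


def payload_owner(path):
    """Return the part id a payload path belongs to, or None.

    Hypothetical convention: data/<part id>/... or data/<part id>.<ext>
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "data":
        return None
    return parts[1].split(".")[0]


def map_parts_to_payload(meta_inf_names, payload_paths):
    """Map each part id found in META-INF to its payload paths."""
    mapping = {}
    for name in meta_inf_names:
        match = PART_FILE.match(name)
        if match:  # datacite.xml itself does not match and is skipped
            mapping[match.group("part_id")] = []
    for path in payload_paths:
        owner = payload_owner(path)
        if owner in mapping:
            mapping[owner].append(path)
    # Part ids that have a metadata file but no payload.
    unmatched = sorted(pid for pid, paths in mapping.items() if not paths)
    return mapping, unmatched


meta_inf = ["datacite.xml", "datacite-ds001.xml",
            "datacite-ds002.xml", "datacite-ds003.xml"]
payload = ["data/ds001.zip", "data/ds002/run1.dat", "data/other.zip"]

mapping, unmatched = map_parts_to_payload(meta_inf, payload)
print(mapping)
print(unmatched)
```

A bag whose payload is named differently would end up with every part id in the unmatched list, which is exactly the situation where an adopter would have to rename or restructure the payload.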
Regards,
Thomas.
—
Karlsruhe Institute of Technology (KIT)
Institute for Data Processing and Electronics
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen
Germany
fon : +49 721 608-24042
fax : +49 721 608-23560
ORCID: http://orcid.org/0000-0003-2804-688X
---------------------------------------------------------
Does it really matter, cosmically speaking, if I don't get up and go to work?
-Douglas Adams-
Author: Claire Herbert
Date: 13 Sep, 2017
What about using the definition from the Research Data Collections WG as the
definition(s) for collection, to stay in line with the other WGs?
https://www.rd-alliance.org/group/research-data-collections-wg/wiki/coll...
Claire
Author: Tobias Weigel
Date: 14 Sep, 2017
fwiw, you could also look into our draft RDA recommendation:
https://github.com/RDACollectionsWG/specification
Collections are just a specific kind of DO, and the recommendation
reflects that. So you can indeed hide much of the hierarchy complexity.
Best, Tobias
Author: Thomas Jejkal
Date: 14 Sep, 2017
Thank you for this comment, Tobias. I checked your definition of ‘Collection’ this morning and added it to our recommendations document. Under this definition, collections are digital objects, so our recommendations support them as well.
However, the discussion we started yesterday concerned a different use case: packaging a ‘local’ collection of datasets stored in one repository instance as multiple zip files in one package, with the possibility to add (DataCite) metadata for the entire collection as well as for the individual datasets at a defined location in the package, following a defined naming scheme. Thus, we must be able to reflect, identify and address the individual elements of the collection within the package.
Regards,
Thomas
—
Karlsruhe Institute of Technology (KIT)
Institute for Data Processing and Electronics
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen
Germany
fon : +49 721 608-24042
fax : +49 721 608-23560
ORCID: http://orcid.org/0000-0003-2804-688X
---------------------------------------------------------
Does it really matter, cosmically speaking, if I don't get up and go to work?
-Douglas Adams-
Author: Rolf Krahl
Date: 14 Sep, 2017
Dear Claire, Tobias & all,
Thank you for the pointers! Both definitions match the particular
use case I had in mind very well. However, this is rather orthogonal
to what we are doing in our recommendations document. This document
considers best practices on how to package digital objects for
transport from one repository to another and how to add the metadata to
these packages such that the receiving end will be able to find them.
We are pretty agnostic about what these objects actually are.
Best regards,
Rolf
--
Rolf Krahl <***@***.***-berlin.de>
Helmholtz-Zentrum Berlin für Materialien und Energie (HZB)
Albert-Einstein-Str. 15, 12489 Berlin
Tel.: +49 30 8062 12122
Author: Ulrich Schwardmann
Date: 14 Sep, 2017
Dear Rolf,
even if I don't know exactly what your current recommendations say, I
would assume that the collection definition is not really orthogonal.
Because a collection is a DO, you can handle it like the others: you
ingest it into the new rep with the recommended packaging. In the new
environment it gets a new PID and is a new collection, referring to all
its old members. The new collection with its new PID has just the same
tree structure of other collections as before, ultimately with
non-collection DOs as its leaves, as before.
Things become more expensive if you want to actually transfer the whole
tree. This would be an iterative ingest into the new rep by backtracking
through the collection tree. After each ingest, one replaces the old
member PID in the collection structure with the new one created by the
ingest. Finally, for each member in the tree you would have a new DO in
the new rep, and all references in the new collection would be set properly.
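As a rough illustration of this iterative ingest, here is a minimal Python sketch. Everything in it is an assumption made for the example: the DigitalObject class, the ingest callback (a stand-in for whatever the new rep offers) and the PIDs are hypothetical, and the collection structure is assumed to be a proper tree without cycles.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DigitalObject:
    pid: str
    members: list = field(default_factory=list)  # empty for non-collection DOs


def ingest_tree(obj, ingest, pid_map=None):
    """Ingest the members first, then the collection referring to them.

    ingest(obj, new_member_pids) is a hypothetical callback that stores
    one DO in the new rep and returns its new PID.
    Returns a dict mapping every old PID to its new PID.
    """
    if pid_map is None:
        pid_map = {}
    if obj.pid in pid_map:  # already ingested, e.g. a shared member
        return pid_map
    for member in obj.members:
        ingest_tree(member, ingest, pid_map)
    # Replace the old member PIDs with the ones created by the ingest.
    new_member_pids = [pid_map[m.pid] for m in obj.members]
    pid_map[obj.pid] = ingest(obj, new_member_pids)
    return pid_map


def make_fake_ingest():
    """Stand-in for a real repository: hands out new/1, new/2, ..."""
    counter = count(1)
    log = []

    def ingest(obj, new_member_pids):
        new_pid = f"new/{next(counter)}"
        log.append((obj.pid, new_pid, list(new_member_pids)))
        return new_pid

    return ingest, log


tree = DigitalObject("old/c0", [
    DigitalObject("old/d1"),
    DigitalObject("old/c1", [DigitalObject("old/d2"), DigitalObject("old/d3")]),
])

ingest, log = make_fake_ingest()
pid_map = ingest_tree(tree, ingest)
for entry in log:
    print(entry)
```

Each DO needs its own ingest call, and a collection can only be ingested once all of its members have their new PIDs, which is where the cost of transferring the whole tree comes from.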
However, if you are looking for a one-step ingest for this whole process,
you need a relatively complicated packaging schema, and you have to
set up the PID structure for all contained DOs after the ingest of that
package anyway. So this seems to me rather expensive and without a major
advantage.
If you see here the orthogonality of these approaches, then you might be
right. The collection definition we have in the Collections WG is more
intended for getting an overview but for step-by-step processing.