Type examples from RDA P3

04 Apr 2014

Dear PIT,
please find attached, in digital form, the type examples we collected at
the P3 PIT session. Thanks again for your input; this was very
creative and helpful work! Perhaps we can repeat this at a later point.
I've done some minor editing while digitizing the papers, but
intentionally did not streamline the different property types that actually
denote the same thing. I've added some comments, but this is far from
a fully fledged analysis.
There are a couple of rough observations I'd make at this point:
- the elemental property types are: String, URL, PID, DateTime, Boolean,
Integer, Float
- there is a need for complex value types (tuples/lists of the
elementals, denoted with a +), which are very useful to applications,
but need not be interpreted at the PIT API level
- the 'checksum' type may be an example of a specialization of the
'String' value type. The PIT API need not care, but the different
semantics should be available to the consumer applications.
- some properties have innate dependencies on other properties, e.g.
checksum date and checksum (MD5). Again, the PIT API does not need to know this
to do its work, but the consumer wants to know (either through hard-wired
logic or e.g. through information obtained from the type registry).
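To make the split between the API view and the consumer view concrete, here is a minimal Java sketch; it is not a proposal for the actual PIT API. The class `PropertyType`, the methods `apiType` and `acceptsForConsumer`, and the property names `checksum_md5` and `title` are all assumptions made for illustration. The API-level view sees only the elemental type, while the consumer-level view applies the specialized semantics.

```java
import java.util.regex.Pattern;

/** The seven elemental value types listed above. */
enum ElementalType { STRING, URL, PID, DATE_TIME, BOOLEAN, INTEGER, FLOAT }

/**
 * Hypothetical property type: a name, the elemental type the API sees,
 * and an optional pattern that only consumer applications would check.
 */
final class PropertyType {
    final String name;
    final ElementalType base;
    final Pattern consumerPattern; // null means "no extra semantics"

    PropertyType(String name, ElementalType base, String consumerRegex) {
        this.name = name;
        this.base = base;
        this.consumerPattern = consumerRegex == null ? null : Pattern.compile(consumerRegex);
    }

    /** API-level view: only the elemental type matters. */
    ElementalType apiType() {
        return base;
    }

    /** Consumer-level view: apply the specialized semantics, if any. */
    boolean acceptsForConsumer(String value) {
        return consumerPattern == null || consumerPattern.matcher(value).matches();
    }
}

final class TypeSketch {
    // An MD5 digest is 128 bits, i.e. 32 hexadecimal characters.
    static final PropertyType MD5_CHECKSUM =
        new PropertyType("checksum_md5", ElementalType.STRING, "[0-9a-fA-F]{32}");
    static final PropertyType TITLE =
        new PropertyType("title", ElementalType.STRING, null);

    public static void main(String[] args) {
        String digest = "d41d8cd98f00b204e9800998ecf8427e"; // MD5 of the empty input
        System.out.println(MD5_CHECKSUM.apiType());                  // STRING
        System.out.println(MD5_CHECKSUM.acceptsForConsumer(digest)); // true
        System.out.println(MD5_CHECKSUM.acceptsForConsumer("n/a"));  // false
    }
}
```

In this sketch both `checksum_md5` and `title` look identical to the API (plain strings); only a consumer that knows the specialization would reject a malformed checksum.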
Best, Tobias
--
Tobias Weigel
Department of Data Management
Deutsches Klimarechenzentrum GmbH (German Climate Computing Center)
Bundesstr. 45a
20146 Hamburg
Germany
Tel.: +49 40 460094 104
E-Mail: ***@***.***
Website: www.dkrz.de
Managing Director: Prof. Dr. Thomas Ludwig
Registered office: Hamburg
Commercial register: Amtsgericht Hamburg, HRB 39784

File Attachment: PIT_type_examples.docx (24.3 KB)
  • Author: Lars G. Svensson

    Date: 16 Apr, 2014

    Dear Tobias,
    thanks for the collection of type examples. Some observations on your observations:
    > There are a couple of rough observations I'd make at this point:
    > - the elemental property types are: String, URL, PID, DateTime, Boolean,
    > Integer, Float
    I think that some of the types we have as strings really are controlled vocabularies or enumerations (e.g. MIME types). From the point of view of the API they can of course be strings, but semantically they belong to a controlled vocabulary, since the consuming application needs to know how to handle the information.
    > - there is a need for complex value types (tuples/lists of the
    > elementals, denoted with a +), which are very useful to applications,
    > but need not be interpreted at the PIT API level
    Yes, and here we need to differentiate between two cases: repeatable properties and properties where we need to preserve order. Looking at e.g. "data reference" in the EUDAT profile, that might be just an (unordered) list of URLs, so we could write that as
    data_reference http://example.com/12345
    data_reference http://example.com/98765
    which could be equivalent to
    data_reference [http://example.com/12345,http://example.com/98765]
    On the other hand, there might be cases where the preferred reference is always at the beginning of the list, and thus order is important.
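    The two readings can be sketched in Java as follows. This is only an illustration under assumptions: the class and method names are made up, and the unordered reading is modelled as a set, which also collapses repeated values (a multiset would be needed if repetitions matter).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

final class ReferenceOrderSketch {
    /** Repeatable property: values compared as a set, so order is irrelevant. */
    static boolean sameUnordered(List<String> a, List<String> b) {
        return new HashSet<>(a).equals(new HashSet<>(b));
    }

    /** Ordered property: position matters, e.g. preferred reference first. */
    static boolean sameOrdered(List<String> a, List<String> b) {
        return a.equals(b);
    }

    /** Under the ordered reading, the first entry is the preferred one. */
    static String preferred(List<String> orderedReferences) {
        return orderedReferences.get(0);
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("http://example.com/12345", "http://example.com/98765");
        List<String> y = Arrays.asList("http://example.com/98765", "http://example.com/12345");
        System.out.println(sameUnordered(x, y)); // true
        System.out.println(sameOrdered(x, y));   // false
        System.out.println(preferred(x));        // http://example.com/12345
    }
}
```

    The same two values are "equal" under one reading and "different" under the other, so a profile would have to state which reading applies to each property.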
    > - the 'checksum' type may be an example of a specialization of the
    > 'String' value type. The PIT API need not care, but the different
    > semantics should be available to the consumer applications.
    Right.
    > - some properties have innate dependencies on other properties, e.g.
    > checksum date and checksum (MD5). Again, the PIT API does not need to know this
    > to do its work, but the consumer wants to know (either through hard-wired
    > logic or e.g. through information obtained from the type registry).
    Well, the Java API might want to encapsulate the date information in a java.util.Date object, and this is an example where more explicit typing can be beneficial.
    When it comes to the identification of e.g. persons or organisations, I strongly prefer the use of URIs over URLs or plain strings. After all, we are talking about long-term references, and the more location-independent those are, the better.
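    As a rough illustration of the difference between a plain string, a URI in general, and an HTTP URI in particular, here is a small Java sketch using the standard java.net.URI class. The helper names and the identifiers (`urn:example:person:42`, the example.com address) are made up for the example and do not come from any profile.

```java
import java.net.URI;

final class IdentifierSketch {
    /** True if the identifier parses as an absolute URI, i.e. it carries a scheme. */
    static boolean isAbsoluteUri(String id) {
        try {
            return URI.create(id).isAbsolute();
        } catch (IllegalArgumentException e) {
            return false; // not a syntactically valid URI, e.g. a name with spaces
        }
    }

    /** True if the identifier is an absolute URI with the http or https scheme. */
    static boolean isHttpUri(String id) {
        if (!isAbsoluteUri(id)) {
            return false;
        }
        String scheme = URI.create(id).getScheme();
        return scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteUri("urn:example:person:42"));    // true
        System.out.println(isHttpUri("urn:example:person:42"));        // false
        System.out.println(isHttpUri("http://example.com/person/42")); // true
        System.out.println(isAbsoluteUri("Jane Doe"));                 // false
    }
}
```

    A property typed as "URI" would accept the first and third identifiers, one typed as "URL" in the narrow sense only the third, and a plain string all of them.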
    Best,
    Lars
    *** Read. Listen. Know. Deutsche Nationalbibliothek ***
    --
    Dr. Lars G. Svensson
    Deutsche Nationalbibliothek
    Information Technology
    Phone: +49-69-1525-1752
    mailto:***@***.***
    http://www.dnb.de

  • Author: Tobias Weigel

    Date: 16 Apr, 2014

    Dear Lars,
    thanks for your observations; comments inline.
    -------- Original Message --------
    Subject: Re: [rda-pid-wg] Type examples from RDA P3
    From: larsgsvensson <***@***.***>
    To: ***@***.***-groups.org <***@***.***-groups.org>
    Date: 16 Apr 2014, 13:52
    > Dear Tobias,
    >
    > thanks for the collection of type examples. Some observations on your observations:
    >
    >> There are a couple of rough observations I'd make at this point:
    >> - the elemental property types are: String, URL, PID, DateTime, Boolean,
    >> Integer, Float
    >
    > I think that some of the types we have as strings really are controlled vocabularies or enumerations (e.g. MIME types). From the point of view of the API they can of course be strings, but semantically they belong to a controlled vocabulary, since the consuming application needs to know how to handle the information.
    Absolutely true. We had discussions on this prior to P3, and also
    at P3; this is an important but also complex topic. Who governs
    such controlled vocabularies (CVs), and how? It was questions such as
    these that eventually led us to state that the PIT API will see them as
    dumb strings and leave the concrete semantics to a higher layer. Again,
    this is a clear indication of why type inheritance would be very beneficial.
    If we are lucky, we may find some clever mechanism during the
    implementation phase that supports CVs indirectly without getting
    tangled up in governance issues. But I wouldn't count on it, unfortunately.
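    The layering described here could look roughly like the following Java sketch. Everything in it is an assumption for illustration: the class names, the `format` property, and the tiny hard-coded vocabulary, which sidesteps the governance question entirely.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** API level: property values are opaque strings, stored and returned as-is. */
final class DumbStringRecord {
    private final Map<String, String> values = new HashMap<>();

    void put(String property, String value) {
        values.put(property, value);
    }

    String get(String property) {
        return values.get(property);
    }
}

/** Higher layer: a hypothetical, locally maintained vocabulary. */
final class MimeTypeVocabulary {
    private static final Set<String> TERMS =
        new HashSet<>(Arrays.asList("text/plain", "application/pdf", "image/png"));

    static boolean contains(String value) {
        return TERMS.contains(value);
    }
}

final class VocabularySketch {
    public static void main(String[] args) {
        DumbStringRecord pidRecord = new DumbStringRecord();
        pidRecord.put("format", "text/plane"); // a typo, yet accepted at the API level
        String stored = pidRecord.get("format");
        System.out.println(stored);                                    // text/plane
        System.out.println(MimeTypeVocabulary.contains(stored));       // false
        System.out.println(MimeTypeVocabulary.contains("text/plain")); // true
    }
}
```

    The API layer never rejects a value; only the consumer-side check notices that the stored string is not a term of the vocabulary.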
    >
    >> - there is a need for complex value types (tuples/lists of the
    >> elementals, denoted with a +), which are very useful to applications,
    >> but need not be interpreted at the PIT API level
    >
    > Yes, and here we need to differentiate between two cases: repeatable properties and properties where we need to preserve order. Looking at e.g. "data reference" in the EUDAT profile, that might be just an (unordered) list of URLs, so we could write that as
    >
    > data_reference http://example.com/12345
    > data_reference http://example.com/98765
    >
    > which could be equivalent to
    >
    > data_reference [http://example.com/12345,http://example.com/98765]
    >
    > On the other hand, there might be cases where the preferred reference is always at the beginning of the list, and thus order is important.
    Also correct. We discussed this issue during our VCs in
    December/January: do we offer list- and set-based models to capture the
    ordered/unordered viewpoints? Our answer was in line with the controlled
    vocabulary issue: leave it out of the PIT API for now because it gets
    complex very quickly.
    Rest assured, I aim to have a solution, because I have many pressing use
    cases; it may, however, not be part of this WG. (Hint!)
    >
    >> - the 'checksum' type may be an example of a specialization of the
    >> 'String' value type. The PIT API need not care, but the different
    >> semantics should be available to the consumer applications.
    >
    > Right.
    >
    >> - some properties have innate dependencies on other properties, e.g.
    >> checksum date and checksum (MD5). Again, the PIT API does not need to know this
    >> to do its work, but the consumer wants to know (either through hard-wired
    >> logic or e.g. through information obtained from the type registry).
    >
    > Well, the Java API might want to encapsulate the date information in a java.util.Date object, and this is an example where more explicit typing can be beneficial.
    Rephrase: The Java API *will* encapsulate this in java.util.Date. ;)
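    A minimal sketch of what that encapsulation buys the consumer, assuming (this is not settled anywhere in this thread) that DateTime values are serialized as ISO 8601 UTC timestamps. The class, the method names and the `lastModified` value are hypothetical.

```java
import java.time.Instant;
import java.util.Date;

final class ChecksumDateSketch {
    /**
     * Parses an ISO 8601 UTC timestamp such as 2014-04-16T13:52:00Z.
     * Instant.parse throws DateTimeParseException on malformed input.
     */
    static Date parseDateTime(String raw) {
        return Date.from(Instant.parse(raw));
    }

    /** A checksum computed before the last modification can no longer be trusted. */
    static boolean isStale(Date checksumDate, Date lastModified) {
        return checksumDate.before(lastModified);
    }

    public static void main(String[] args) {
        Date checksumDate = parseDateTime("2014-04-04T10:00:00Z");
        Date lastModified = parseDateTime("2014-04-16T13:52:00Z");
        System.out.println(isStale(checksumDate, lastModified)); // true
    }
}
```

    With a typed value the comparison is a one-liner; with a dumb string every consumer would have to agree on, and re-implement, the parsing.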
    >
    > When it comes to the identification of e.g. persons or organisations, I strongly prefer the use of URIs over URLs or plain strings. After all, we are talking about long-term references, and the more location-independent those are, the better.
    That's why we have a value type PID: it implies
    actionability/resolvability. DataCite made a good point by including
    more than a dozen kinds of relatedIdentifierType.
    Also, I can see how the questions you asked must be answered in our
    final deliverables, explaining why we did not do certain things, and
    which strategies we suggest to address them in the future.
    Best, Tobias
