Data Types - Data Model - Revision 2.0

10 May 2014

All,

It's been a while since we saw some activity on this mailing list. However, outside of this mailing list, some of us had several discussions with various DTR members and PIT working group members regarding the scope and data model for capturing type records. The goal of our discussions with the PIT working group members was to align their definition of types with ours. In the end, I'd say that we were successful in aligning our vision and scope - if you ignore a few terminology differences. The latest revision of the DTR data model (revision 2.0) is in the file depot here: https://rd-alliance.org/filedepot?cid=101&fid=506

The section at the end of the document lists issues and items we would have to consider in future revisions of the data model. That section includes one or two items that the PIT working group already assumes to be present in their types. Since type records are open-ended, it is still possible to have PIT type definitions recorded in a DTR instance. However, the DTR may not yet have a standard vocabulary for capturing some of those values.

CNRI plans to release a DTR instance in a week or so that implements revision 2.0 of the data model. Any quick feedback will be included in the new prototype as far as possible, but please send in feedback whenever you can for discussion with the group and to inform future releases.

Giridhar, Christophe, and Larry

  • Simon Cox's picture

    Author: Simon Cox

    Date: 12 May, 2014

    I've added a few comments to the document - see https://rd-alliance.org/filedepot?cid=101&fid=507 

  • Thomas Zastrow's picture

    Author: Thomas Zastrow

    Date: 12 May, 2014

    Dear all,
    First of all, thanks for the new definition and for integrating the
    things we discussed over the last few weeks in the scope of the PIT API.
    Still, I have some comments on the doc; in particular, I need some
    clarification when it comes to nested types, inheritance, etc.
    I took Simon's doc (hopefully that was OK) and added my comments to it.
    Unfortunately, it seems that I'm no longer allowed to upload files to
    the file depot, so I'm attaching the file to this mail.
    Best regards from Munich,
    Tom

  • Larry Lannom's picture

    Author: Larry Lannom

    Date: 13 May, 2014

    I added the Tom/Simon commented doc to the File Depot (a somewhat confusing place)

    https://rd-alliance.org/filedepot?fid=508

    Larry

     

     

  • Simon Cox's picture

    Author: Simon Cox

    Date: 12 May, 2014

    One immediate response to Tom (before it spirals out of control!)
    In response to “Assigning unique IDs to properties seems like a good long-term solution to these problems”, Tom made the marginal comment “Do we really need that? If a property is fully qualified by its type and its name (“wine.age”, “person.age”) and the type has a unique identifier and it will be not allowed to define more than one property with the same name per type, we are fine ☺”
    I would agree with Tom that, provided the scope is uniquely identified, the local-name for a property does not need to be complicated. But I had not inferred a unique _local_ name anyway. I’m aware I’m tangling with the handle community here, but as a Cool-URI kind of guy myself, I’m totally comfortable with a long unique fully-qualified identifier having a short, memorable final fragment! Follow my links to see a few examples:
    http://environment.data.gov.au/def/property/ and http://environment.data.gov.au/def/object/
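To make the point about scoped names concrete, here is a minimal sketch of a property that is fully qualified by its type identifier plus a short local name. The class name `TypeRecord`, its methods, and the `example.org` identifiers are hypothetical and are not taken from the DTR data model document.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRecord:
    """A registered type with a globally unique identifier (hypothetical shape)."""
    identifier: str                      # e.g. a handle or URI; values below are made up
    properties: dict = field(default_factory=dict)

    def add_property(self, local_name: str, description: str) -> str:
        # At most one property per local name is allowed within a type.
        if local_name in self.properties:
            raise ValueError(f"{local_name!r} is already defined on {self.identifier}")
        self.properties[local_name] = description
        return self.qualified_name(local_name)

    def qualified_name(self, local_name: str) -> str:
        # Long, unique prefix followed by a short, memorable final fragment.
        return f"{self.identifier}/{local_name}"


wine = TypeRecord("http://example.org/def/type/wine")
person = TypeRecord("http://example.org/def/type/person")

wine_age = wine.add_property("age", "years since vintage")
person_age = person.add_property("age", "years since birth")
```

Both properties share the local name "age", yet their qualified names differ because the type identifiers differ. Whether the qualified name should also be registered as an identifier in its own right is the open question in this exchange.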
    Simon

  • Thomas Zastrow's picture

    Author: Thomas Zastrow

    Date: 12 May, 2014

    Dear Simon,
    I had something like the EPIC part identifiers in mind:
    http://www.clarin.eu/faq/how-do-i-issue-part-identifier-epic-handle
    Best,
    Tom

  • Jeff Braswell's picture

    Author: Jeff Braswell

    Date: 12 May, 2014

    Is this not mainly a (very significant) matter of the ontological harmonization and concordance of name spaces and the semantics of controlled vocabularies? For which the solutions and resolutions are either non-trivial, if "global", or, if not, operationally (locally) decoupled?

  • Simon Cox's picture

    Author: Simon Cox

    Date: 13 May, 2014

    Else you accept that there is a marketplace out there.
    Many ontologies and vocabularies are being published.
    E.g. http://lov.okfn.org/dataset/lov/
    Some are used; others are not and will just fade away.
    It might just be most practical to use a popularity contest as the route to 'harmonization'.
    Pragmatic, imperfect, but realistic.

  • Larry Lannom's picture

    Author: Larry Lannom

    Date: 13 May, 2014

    I had better let the authors of the 2.0 document respond to Simon and Tom in detail, lest I mess it up, but I wanted to make a few general comments of my own in passing.

    In terms of vocabulary harmonization -- it will never be the case that everyone will always mean the same thing with the same terms or that different terms won't sometimes reference the same thing. And by 'terms' I include abstract concepts and the components composing them, all potentially identified by non-semantic identifiers. All we can do is try to introduce clarity and as much precision as possible by providing tools that let people define what they mean. And to make that as easy as possible. One of the driving ideas behind a system of type registries is to make it easy to define a type even if much of the rest of the planet thinks your type isn't worth the bits used to describe it. If you use it, we want you to be able to define it and reference that definition in such a way that others can understand you, both now and into the future. I take most of the comments from Simon and Tom to address the mechanisms for doing that.

    Identifiers -- one way to get precision is to use unique identifiers to point to the types and/or components of types. That's one way to look at Jeff's global/local comment -- we probably can't agree on a universal vocabulary, but we can identify all the pieces unambiguously. This group is not the right place to start a new front on the identifier wars (and here I thought Simon was such a nice guy) but it will have to be addressed in individual registries and also in considering federation of registries.

     Larry

     

  • Thomas Zastrow's picture

    Author: Thomas Zastrow

    Date: 13 May, 2014

    Thanks Larry!

  • Stephen Richard's picture

    Author: Stephen Richard

    Date: 13 May, 2014

    Colleagues—
    I studied the document (v2), and have edited it to a v3 to try and express my understanding of what seems to be the intention. I missed the meeting in Dublin and discussions, so these comments are based on what I read and the comments from Simon Cox and Thomas Zastrow.
    Am I correct to infer that the idea is to register information models as named ‘Data Types’? Each data type specifies a collection of properties. Each property must have a label and a data type required for valid instance values. Each property should have an assigned unique identifier and cardinality (multiplicity, optionality). Each property may have an associated range restriction on possible instance values.
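As a rough illustration of this reading, the sketch below models a data type as a collection of properties, each with a label, a required value type, an optional identifier, a cardinality, and an optional range restriction. All class and field names here are assumptions made for illustration; they are not the vocabulary of the revision 2.0 document.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Property:
    label: str                                  # required
    value_type: type                            # data type required of instance values
    identifier: Optional[str] = None            # should be assigned and unique
    min_occurs: int = 1                         # 0 makes the property optional
    max_occurs: Optional[int] = 1               # None means unbounded
    in_range: Optional[Callable[[object], bool]] = None  # optional range restriction


@dataclass
class DataType:
    name: str
    properties: List[Property] = field(default_factory=list)

    def validate(self, instance: dict) -> List[str]:
        """Return a list of problems; an empty list means the instance conforms."""
        problems = []
        for prop in self.properties:
            values = instance.get(prop.label, [])
            if len(values) < prop.min_occurs:
                problems.append(f"{prop.label}: too few values")
            if prop.max_occurs is not None and len(values) > prop.max_occurs:
                problems.append(f"{prop.label}: too many values")
            for v in values:
                if not isinstance(v, prop.value_type):
                    problems.append(f"{prop.label}: {v!r} is not {prop.value_type.__name__}")
                elif prop.in_range is not None and not prop.in_range(v):
                    problems.append(f"{prop.label}: {v!r} is out of range")
        return problems


# Made-up example type; instances map each label to a list of values.
wine = DataType("wine", [
    Property("age", int, identifier="example:prop/1", in_range=lambda v: v >= 0),
    Property("grape", str, identifier="example:prop/2", min_occurs=0, max_occurs=None),
])

ok = wine.validate({"age": [12], "grape": ["Riesling", "Silvaner"]})
bad = wine.validate({"age": [-3]})
```

Here `ok` is an empty list, while `bad` reports that the age value violates the range restriction. Nested types and inheritance, which Tom asked about, are deliberately left out of this sketch.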
    This concept of a Data Type corresponds to an abstract data model. To be implemented and used, an encoding scheme and syntax must be defined to represent the information in data instances for people or machines. Example implementations might be (at different levels of abstraction…) XML, JSON, CSV, RDF, GML, HTML, GeoSciML, WaterML. Is the implementation of the information model part of the data type registration?
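The distinction between an abstract model and its encodings can be sketched with one instance written out in two syntaxes. The record contents and the helper names `to_json` and `to_csv` are made up for illustration; only the Python standard library is used.

```python
import csv
import io
import json

# Hypothetical abstract instance: property labels mapped to values.
record = {"name": "Riesling 2002", "age": 12}


def to_json(rec: dict) -> str:
    # JSON keeps the distinction between numbers and strings.
    return json.dumps(rec, sort_keys=True)


def to_csv(rec: dict) -> str:
    # CSV flattens every value to text; the header row carries the labels.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


as_json = to_json(record)
as_csv = to_csv(record)
```

Reading the JSON back yields the integer 12, whereas reading the CSV back yields the string "12". That difference is one reason the encoding would need to be registered alongside, and related to, the abstract type.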
    In the big picture of data interoperability, we need to be able to register models at all of these various levels of abstraction, and record the relationships between them, so I hope the answers are both ‘yes’.
    steve
    Stephen M Richard
    Arizona Geological Survey
    416 W. Congress #100
    Tucson, AZ
    AZGS: 520-770-3500
    Office: 520-209-4127
    FAX: 520-770-3505

  • Christophe Blanchi's picture

    Author: Christophe Blanchi

    Date: 13 May, 2014

    Hi
    I am including a commented version of the V2.0 of the document with
    Simon and Thomas's comments.
    I have a couple of inline comments on Steve's email:
    >
    > Am I correct to infer that the idea is to register information models
    > as named ‘Data Types’? Each data type specifies a collection of
    > properties. Each property must have a label and a data type required
    > for valid instance values. Each property should have an assigned
    > unique identifier and cardinality (multiplicity, optionality). Each
    > property may have an associated range restriction on possible instance
    > values.
    >
    Yes.
    >
    > This concept of a Data Type corresponds to an abstract data model. To
    > be implemented and used, an encoding scheme and syntax must be defined
    > to represent the information in data instances for people or machines.
    > Example implementations might be (at different levels of abstraction…)
    > XML, JSON, CSV, RDF, GML, html, GeoSciML, WaterML. Is the
    > implementation of the information model part of the data type
    > registration?
    >
    Yes. How instances of this data model are stored in the type registry
    will be specific to its implementation but the registry would make those
    Yes.
    Thanks
    Christophe

  • Christophe Blanchi's picture

    Author: Christophe Blanchi

    Date: 14 May, 2014

    Hi, 

     

    I added the Data Types Data Model 2.0 document with my comments at:

    https://rd-alliance.org/filedepot?cid=101&fid=510

     

    Thanks

     

    Christophe
