Development of Data Publishing Vocabulary

02 Jul 2015

In light of this WG's progress on data publishing concepts, perhaps it is time to update the terms in the DFT term tool.

See http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page

 

We have an existing definition of Data Publishing that seems to be 9-10 months old (prior to the winter ESIP meeting).

The RDA/WDS Publishing Data Workflows WG proposes a definition of Data Publication as:
 
"The process whereby data are subjected to an assessment process to determine whether they should be acquired by a repository; followed by a rigorous acquisition and ingest process that results in products being publicly made available and supported for the long-term by that repository."
 
Recent work by some members of the group had more of a definition, which started with:
 
 
"Data Publishing describes the process of making research data and other research objects available on the Web so that they can be discovered and referred to in a unique and persistent way."
 
Should the existing definition in the DFT tool be updated? These may be small additions, but they are of interest to the group.
 
One question is whether publication is only to the Web, as noted in this definition, or whether it is a broader concept.
 
Note that we don't yet have a definition of "Data Publishing Workflows" in the tool; that might be a place to start, with some related terms flowing from there.

    Author: Leonardo Candela

    Date: 02 Jul, 2015

    A key decision to be taken is whether to "characterise" the term data or not and then use this decision accordingly. 

    If the aim is to "deal" with any "data", then it seems OK to try to define "data publishing". If this is the case the definition of "data publishing" should contain the term "data" referring to its definition (rather than "... making research data and other research objects ...").

    However, as the project name suggests (RDA stands for "Research Data Alliance"), I would like to use this as the "main concept". Thus the term you are aiming to define should be "Research Data Publishing" rather than "Data Publishing".

    In a paper my colleagues and I are working on we are using something like:

    • Research Data: entities used as evidence of phenomena for the purpose of research or scholarship;
      • This definition is borrowed from Borgman, C. L., 2015. Big Data, Little Data, No Data: Scholarship in the Networked World. The MIT Press.
    • Research Data Publishing: the release of research data for (re)use by others. 
      • The key aspects are
        • "release", i.e. "making available to the public";
        • "(re)use", i.e. the motivation of publishing is to make it possible for others to use the "product";
      • How publishing is expected to be implemented should not be part of the definition. Publishing can happen by a "data paper", by the deposition in a repository, etc.

    It is also useful to use the concept of dataset here to introduce the concept of "unit of information". Thus, a "dataset" is a unit of "research data" (no matter how many files and entities) that is the subject of a "research data publishing" act.

     


    Author: Gary Berg-Cross

    Date: 02 Jul, 2015

    Leonardo,
    Thanks for the prompt response. I have a few (2 cents worth of) thoughts
    on what you said, and I expect that some of the main folks involved in the
    WG will too, such as on the idea that how publishing is implemented (say to
    the Web) is now part of the definition. I wonder if types of
    publishing/release, such as to the Web or to a journal, are distinguished as
    publishing sub-types. They appear in workflows, for example, and may be
    distinguished there. So perhaps as work terms for the workflow are
    discussed there will be suitable definitions to make things clear.
    > A key decision to be taken is whether to "characterise" the term data or
    > not and then use this decision accordingly.
    > If the aim is to "deal" with any "data", then it seems OK to try to define
    > "data publishing". If this is the case the definition of "data publishing"
    > should contain the term "data" referring to its definition (rather than
    > "... making research data and other research objects ...").
    > However, as the project name suggests (RDA stands for "Research Data
    > Alliance"), I would like to use this as the "main concept". Thus the term
    > you are aiming to define should be "Research Data Publishing" rather than
    > "Data Publishing".
    This is a point that would potentially apply across all the RDA groups. I
    agree that research data is the main type of data that RDA might focus on,
    but wonder if this distinction is relevant. Much data is generated by
    research activities while other "data" may come from non-research sources
    and then be applied to a research question. It then becomes research data in
    a similar way that something might become metadata although not consciously
    generated for that role. This does fit your definition below since it
    becomes evidence for a research activity etc.
    > In a paper my colleagues and I are working on we are using something like:
    > - *Research Data*: entities used as evidence of phenomena for the
    >   purpose of research or scholarship;
    >   - This definition is borrowed from Borgman, C. L., 2015. Big Data,
    >     Little Data, No Data: Scholarship in the Networked World. The MIT Press.
    > - *Research Data Publishing*: the release of *research data* for (re)use
    >   by others.
    >   - The key aspects are
    >     - "release", i.e. "making available to the public";
    >     - "(re)use", i.e. the motivation of publishing is to make it
    >       possible for others to use the "product";
    >   - How publishing is expected to be implemented should not be part
    >     of the definition. Publishing can happen by a "*data paper*", by
    >     the *deposition in a repository*, etc.
    > It is also useful to use the concept of *dataset* here to introduce the
    > concept of "unit of information". Thus, a "dataset" is a unit of "research
    > data" (no matter how many files and entities) subject of a "research data
    > publishing" act.
    The Data Foundations and Terminology WG (and now the IG) touched on this
    issue of Collections and Aggregations, but we didn't have enough converging
    input from members to arrive at agreed-upon and useful definitions. Some
    members talk in terms of Digital Objects as the Unit of Info (or as the
    carrier of Info, which I prefer). But the distinction between datasets and
    data collections is not settled, although I lean towards a dataset being a
    unit from which data collections are assembled for particular collection
    purposes.
    Gary Berg-Cross, Ph.D.
    ***@***.***
    http://ontolog.cim3.net/cgi-bin/wiki.pl?GaryBergCross
    SOCoP Executive Secretary
    Independent Consultant
    Potomac, MD
    240-426-0770
