Google spreadsheet for your contributions

19 Apr 2019
Dear members of the RDA FAIR Data Maturity Model Working Group,
As announced earlier this week, we have created a Google spreadsheet [1] to
support the further work of the Working Group in developing the core
assessment criteria. This spreadsheet includes four worksheets:
1. Introduction, with an overview of context, objective and approach
2. Landscaping exercise, with the results of the initial analysis of
existing approaches
3. Development, with sections for each of the principles where you can enter
suggestions for indicators and maturity levels per principle
4. Outstanding issues, where any problems encountered, or suggestions,
questions, etc. can be submitted
The important sheet is 3. Development, where you are kindly invited to make
contributions by proposing indicators and possible maturity levels per
indicator. We have already included one proposed indicator for each
principle, and all three indicators for principle R1.1 as presented in the
meeting in Philadelphia. For each principle, we included a link to the
definition and explanation on the GO-FAIR site, and a link to the set of
questions for that principle that we found in existing approaches in the
landscaping exercise. Don't worry about duplicating or contradicting
indicators or maturity levels proposed by others - the editorial team will
use all of your contributions to make a proposal for a consolidated set of
indicators and maturity levels to be discussed at the next online meeting.
We would like to ask that you make your contributions by the 31st of May
2019 so that the editorial team has time to analyse the proposals and to
prepare the discussion at the online meeting on 18 June 2019 (07:00-08:30
UTC and 15:00-16:30 UTC).
Many thanks,
Makx Dekkers and the editorial team
[1]
https://docs.google.com/spreadsheets/d/1gvMfbw46oV1idztsr586aG6-teSn2cPW...
ZG0U4Hg/edit#gid=0

    Author: Ge Peng

    Date: 27 May, 2019

    Dear Makx,
    I have entered the proposed indicators and their maturity levels for F1
    (PI/PIL_11), F2/R1 (PI/PIL_20), A1 (PI/PIL_46), and I1 (PI/PIL_85) (also
    see the details below). I’ll be happy to work with the editorial team and
    the WG members to further improve those indicators and their maturity
    levels if needed.
    With respect to describing the maturity of the FAIR Data Maturity
    Assessment, one potential way to do so is to use the following maturity
    assessment categories from Peng et al. (2018), Data Science Journal:
    Category 1: No assessment done.
    Category 2: Self-assessment—preliminary evaluation carried out by an
    individual for internal or personal use; abiding by a non-disclosure
    agreement.
    Category 3: Internal assessment—complete evaluation carried out by an
    individual non-certified entity (person, group, or institution) and
    reviewed internally, with the assessment results (ratings and
    justifications) publicly available for transparency.
    Category 4: Independent assessment—Category 3 + reviewed by an
    independent entity that has expertise in the maturity model utilized for
    the evaluation.
    Category 5: Certified assessment—Category 4 + reviewed and certified by
    an established authoritative entity. Maturity update frequency is defined
    and implemented.
    Hope it helps. Please feel free to let me know if I need to modify the way
    I input my entries or there is anything else I can do to help.
    Looking forward to seeing the outcomes of this team effort.
    Best regards,
    Ge Peng (Peng)
    -----------------
    Maturity levels for
    F1. (meta)data are assigned a globally unique and eternally persistent
    identifier.
    Proposed Indicator: PI_11: The state of (meta)data being assigned a
    globally unique and eternally persistent identifier.
    Maturity Levels: PIL_11:
    Level 1: No unique identifiers assigned for dataset-level metadata record
    and dataset, or information unknown;
    Level 2: Internal unique identifiers assigned for dataset-level metadata
    record and dataset;
    Level 3: Dataset assigned a globally unique, persistent identifier but not
    resolvable (e.g., UUID);
    Level 4: Dataset assigned a globally unique, persistent, and resolvable
    identifier (e.g., DOI);
    Level 5: Level 4 + capturing dataset versioning
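    As a hypothetical aside (the function and its flags are mine, not part of the proposal), the PIL_11 levels above can be read as a simple decision ladder over properties of the dataset identifier:

    ```python
    # Hypothetical sketch of how PIL_11 could be evaluated automatically.
    # The boolean flags are assumptions about what an assessment tool
    # could detect for a given dataset identifier.
    def pil_11_level(has_identifier: bool, globally_unique: bool = False,
                     persistent: bool = False, resolvable: bool = False,
                     versioned: bool = False) -> int:
        """Return the PIL_11 maturity level (1-5) for a dataset identifier."""
        if not has_identifier:
            return 1  # no unique identifier, or information unknown
        if not (globally_unique and persistent):
            return 2  # internal unique identifier only
        if not resolvable:
            return 3  # e.g. a UUID: globally unique and persistent, not resolvable
        if not versioned:
            return 4  # e.g. a DOI: globally unique, persistent, resolvable
        return 5      # Level 4 + dataset versioning captured
    ```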
    Maturity levels for:
    F2. Data are described with rich metadata (defined by R1);
    R1. (Meta)data are richly described with a plurality of accurate and
    relevant attributes
    Proposed Indicator: PI_20: The state of metadata
    Maturity Levels: PIL_20
    Level 1: Dataset-level metadata not publicly available, discoverable,
    and/or integrable;
    Level 2: Dataset-level metadata discoverable with a landing page displaying
    basic characteristics of dataset and information on data accessibility,
    conforming to domain-specific metadata standards and integrable;
    Level 3: Dataset-level metadata discoverable with a resolvable dataset DOI
    landing page displaying complete characteristics of the dataset, capturing
    or linking to descriptive data product information including data
    collection and processing steps, error sources and uncertainty information,
    conforming to national metadata standards;
    Level 4: Level 3 + Provenance and quality descriptive information,
    conforming to international metadata standards; Software package available
    and linked for transparency;
    Level 5: Level 3 + standard-based and interoperable provenance and quality
    descriptive information; Version-controlled software package publicly
    available and linked for traceability (e.g., on GitHub), plus complete
    run-time system information for reproducibility.
    (Metadata entities for capturing basic and complete characteristics of
    datasets will likely be domain-specific and potentially defined by
    individual disciplines until a consensus can be reached universally,
    across domains and disciplines. Examples of international metadata
    standards for geographic information are ISO 19115-* and ISO 19157-*. An
    example of a provenance standard is W3C PROV. The current definitions do
    not address file-level metadata, which may need to be included at Level 3
    or higher maturity levels.)
    Maturity levels for:
    A1: (Meta)data are retrievable by their identifier using a standardised
    communications protocol
    Proposed Indicator: PI_46: The state of data and relevant information being
    retrievable
    Maturity Levels: PIL_46
    Level 1: Person to person or via a private URL link (e.g., email, portable
    drive, private ftp site); not publicly available; not searchable;
    Level 2: Data publicly available and searchable at the dataset level using
    basic domain-specific facets; Basic online services available for data
    access in its original format/file(s) (e.g., FTP/HTTP(S) direct file
    download);
    Level 3: Extensive data services conforming to domain standards available
    for data access; conforming to community search and discovery metadata
    convention standards; capable of providing other domain-specified output
    data format options;
    Level 4: Level 3 + visualization or subsetting and aggregation capability
    available; data descriptive information (e.g., data collection and/or
    processing steps and error sources) including software package available
    and accessible;
    Level 5: Level 4 + standard-based provenance and quality descriptive
    information available, accessible, and interoperable.
    Maturity levels for:
    I1. (meta)data use a formal, accessible, shared, and broadly applicable
    language for knowledge representation
    Proposed Indicator: PI_85: The state of data being portable
    Maturity Levels: PIL_85
    Level 1: Not machine readable;
    Level 2: Domain-specific or proprietary machine readable file format;
    Level 3: Standard-based, non-proprietary machine readable file format;
    Level 4: Level 3 + machine independent, self-describing, and interoperable
    file format;
    Level 5: Level 4 + analysis ready
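    Tying the four proposals together, here is a hypothetical sketch (the lookup table and helper are mine, mirroring only the indicator IDs and principles above) of how a self-assessment against these indicators could be recorded and summarised:

    ```python
    # Hypothetical lookup table of the four proposed indicators; the
    # descriptions paraphrase the proposal above.
    INDICATORS = {
        "PI_11": "state of (meta)data identifier (F1)",
        "PI_20": "state of metadata (F2/R1)",
        "PI_46": "state of data and relevant information being retrievable (A1)",
        "PI_85": "state of data being portable (I1)",
    }

    def summarise(scores: dict) -> str:
        """Summarise a self-assessment as one line per indicator,
        closing with the minimum level as a conservative overall score."""
        lines = [f"{pi} ({INDICATORS[pi]}): level {lvl}"
                 for pi, lvl in sorted(scores.items())]
        lines.append(f"overall (minimum): level {min(scores.values())}")
        return "\n".join(lines)

    # Example self-assessment with made-up scores:
    print(summarise({"PI_11": 4, "PI_20": 3, "PI_46": 3, "PI_85": 2}))
    ```

    Taking the minimum as the overall score is only one possible aggregation; the WG could equally report the per-indicator levels without combining them.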

    Author: Makx Dekkers

    Date: 29 May, 2019

    Dear Ge Peng,
    Many thanks for your contributions. We will take them into account in the analysis of contributions that we have started working on.
    In our preparations for the next meeting, the editorial team is looking first at the proposed maturity levels for the individual indicators. We are going to propose a consolidated set of those in step 3 in the collaborative document.
    If I understand correctly, the maturity levels you propose, e.g. in PIL_11, PIL_20 and PIL_85, combine some of the more ‘atomic’ indicators. These two approaches were discussed at the last meeting (slides 19 and 20 in the slide deck at https://www.rd-alliance.org/system/files/documents/20190403_FAIR_WG_slid...).
    We will start with the approach of looking at the individual indicators first (slide 20) and then see how we can derive a set of levels across indicators for a principle (slide 19). We will try to represent both approaches in the result of our analysis, to be presented in the next call on 18 June, so that the WG can decide which one it prefers.
    Any further comments and contributions are most welcome!
    Kind regards,
    Makx Dekkers
    Editorial team

    Author: Ge Peng

    Date: 29 May, 2019

    Dear Makx,
    Thank you for your response. Yes, my proposed indicators and therefore the
    maturity levels can be further modified to be more granular. For example,
    PI/PIL_11 can be modified to evaluate metadata and data separately, while
    PI/PIL_46 can be modified to evaluate data and associated information
    separately; the associated information can be further separated into
    software package, provenance, quality descriptive information, etc.
    I am, however, concerned that we could end up with too many individual
    indicators for practical use. On the other hand, it is necessary to have
    sufficient indicators to cover all aspects and ensure compliance with the
    FAIR data principles.
    Many of those individual indicators may be related to each other to some
    degree. Striking a balance between sufficient but *not too many*
    indicators will likely be challenging, but it is something we may have to
    do.
    Having said all that, I am fine with both approaches and willing to work
    with the WG and the editorial team towards finalizing the maturity levels
    for selected indicators if needed.
    Best regards,
    --- Peng
