Google spreadsheet for your contributions

19 Apr 2019
Dear members of the RDA FAIR Data Maturity Model Working Group,
As announced earlier this week, we have created a Google spreadsheet [1] to
support the further work of the Working Group in developing the core
assessment criteria. This spreadsheet includes four worksheets:
1. Introduction, with an overview of context, objective and approach
2. Landscaping exercise, with the results of the initial analysis of
existing approaches
3. Development, with sections for each of the principles where you can enter
suggestions for indicators and maturity levels per principle
4. Outstanding issues, where any problems encountered, or suggestions,
questions, etc. can be submitted
The important sheet is 3. Development, where you are kindly invited to make
contributions by proposing indicators and possible maturity levels per
indicator. We have already included one proposed indicator for each
principle, and all three indicators for principle R1.1 as presented in the
meeting in Philadelphia. For each principle, we included a link to the
definition and explanation on the GO-FAIR site, and a link to the set of
questions for that principle that we found in existing approaches in the
landscaping exercise. Don't worry about duplicating or contradicting
indicators or maturity levels proposed by others - the editorial team will
use all of your contributions to make a proposal for a consolidated set of
indicators and maturity levels to be discussed at the next online meeting.
We would like to ask that you make your contributions by the 31st of May
2019 so that the editorial team has time to analyse the proposals and to
prepare the discussion at the online meeting on 18 June 2019 (07:00-08:30
UTC and 15:00-16:30 UTC).
Many thanks,
Makx Dekkers and the editorial team
[1]
https://docs.google.com/spreadsheets/d/1gvMfbw46oV1idztsr586aG6-teSn2cPW...
ZG0U4Hg/edit#gid=0

    Author: Ge Peng

    Date: 27 May, 2019

    Dear Makx,
    I have entered the proposed indicators and their maturity levels for F1
    (PI/PIL_11), F2/R1 (PI/PIL_20), A1 (PI/PIL_46), and I1 (PI/PIL_85) (also
    see the details below). I’ll be happy to work with the editorial team and
    the WG members to further improve those indicators and their maturity
    levels if needed.
    With respect to describing the maturity of the FAIR Data Maturity
    Assessment, one potential way to do so is to use the following maturity
    assessment categories from Peng et al. (2018), Data Science Journal:
    Category 1: No assessment done.
    Category 2: Self-assessment—preliminary evaluation carried out by an
    individual for internal or personal use; abiding by a non-disclosure
    agreement.
    Category 3: Internal assessment—complete evaluation carried out by an
    individual non-certified entity (person, group, or institution) and
    reviewed internally, with the assessment results (ratings and
    justifications) publicly available for transparency.
    Category 4: Independent assessment—Category 3 + reviewed by an
    independent entity that has expertise in the maturity model utilized for
    the evaluation.
    Category 5: Certified assessment—Category 4 + reviewed and certified by
    an established authoritative entity. Maturity update frequency is defined
    and implemented.
    Hope it helps. Please feel free to let me know if I need to modify the way
    I input my entries or there is anything else I can do to help.
    Looking forward to seeing the outcomes of this team effort.
    Best regards,
    Ge Peng (Peng)
    -----------------
    Maturity levels for
    F1. (meta)data are assigned a globally unique and eternally persistent
    identifier.
    Proposed Indicator: PI_11: The state of (meta)data being assigned a
    globally unique and eternally persistent identifier.
    Maturity Levels: PIL_11:
    Level 1: No unique identifiers assigned for dataset-level metadata record
    and dataset, or information unknown;
    Level 2: Internal unique identifiers assigned for dataset-level metadata
    record and dataset;
    Level 3: Dataset assigned a globally unique, persistent identifier but not
    resolvable (e.g., UUID);
    Level 4: Dataset assigned a globally unique, persistent, and resolvable
    identifier (e.g., DOI);
    Level 5: Level 4 + capturing dataset versioning
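    As a hypothetical aside (the function and its flags are mine, not part of the proposal), the PIL_11 levels above can be read as a simple decision ladder over properties of the dataset identifier:

    ```python
    # Hypothetical sketch of how PIL_11 could be evaluated automatically.
    # The boolean flags are assumptions about what an assessment tool
    # could detect for a given dataset identifier.
    def pil_11_level(has_identifier: bool, globally_unique: bool = False,
                     persistent: bool = False, resolvable: bool = False,
                     versioned: bool = False) -> int:
        """Return the PIL_11 maturity level (1-5) for a dataset identifier."""
        if not has_identifier:
            return 1  # no unique identifier, or information unknown
        if not (globally_unique and persistent):
            return 2  # internal unique identifier only
        if not resolvable:
            return 3  # e.g. a UUID: globally unique and persistent, not resolvable
        if not versioned:
            return 4  # e.g. a DOI: globally unique, persistent, resolvable
        return 5      # Level 4 + dataset versioning captured
    ```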
    Maturity levels for:
    F2. Data are described with rich metadata (defined by R1);
    R1. (Meta)data are richly described with a plurality of accurate and
    relevant attributes
    Proposed Indicator: PI_20: The state of metadata
    Maturity Levels: PIL_20
    Level 1: Dataset-level metadata not publicly available, discoverable,
    and/or integrable;
    Level 2: Dataset-level metadata discoverable with a landing page displaying
    basic characteristics of dataset and information on data accessibility,
    conforming to domain-specific metadata standards and integrable;
    Level 3: Dataset-level metadata discoverable with a resolvable dataset DOI
    landing page displaying complete characteristics of the dataset, capturing
    or linking to descriptive data product information including data
    collection and processing steps, error sources and uncertainty information,
    conforming to national metadata standards;
    Level 4: Level 3 + Provenance and quality descriptive information,
    conforming to international metadata standards; Software package available
    and linked for transparency;
    Level 5: Level 3 + standard-based and interoperable provenance and quality
    descriptive information; Version-controlled software package publicly
    available and linked for traceability (e.g., on GitHub), plus complete
    run-time system information for reproducibility.
    (Metadata entities for capturing basic and complete characteristics of
    datasets will likely be domain-specific and potentially defined by
    individual disciplines until a consensus can be reached universally,
    across domains and disciplines. Examples of international metadata
    standards for geographic information are ISO 19115-* and ISO 19157-*. An
    example of a provenance standard is W3C PROV. The current definitions do
    not address file-level metadata, which may need to be included at Level 3
    or higher maturity levels.)
    Maturity levels for:
    A1: (Meta)data are retrievable by their identifier using a standardised
    communications protocol
    Proposed Indicator: PI_46: The state of data and relevant information being
    retrievable
    Maturity Levels: PIL_46
    Level 1: Person to person or via a private URL link (e.g., email, portable
    drive, private ftp site); not publicly available; not searchable;
    Level 2: Data publicly available and searchable at the dataset level using
    basic domain-specific facets; Basic online services available for data
    access in its original format/file(s) (e.g., FTP/HTTP(S) direct file
    download);
    Level 3: Extensive data services conforming to domain standards available
    for data access; conforming to community search and discovery metadata
    convention standards; capable of providing other domain-specified output
    data format options;
    Level 4: Level 3 + visualization or subsetting and aggregation capability
    available; data descriptive information (e.g., data collection and/or
    processing steps and error sources) including software package available
    and accessible;
    Level 5: Level 4 + standard-based provenance and quality descriptive
    information available, accessible, and interoperable.
    Maturity levels for:
    I1. (meta)data use a formal, accessible, shared, and broadly applicable
    language for knowledge representation
    Proposed Indicator: PI_85: The state of data being portable
    Maturity Levels: PIL_85
    Level 1: Not machine readable;
    Level 2: Domain-specific or proprietary machine readable file format;
    Level 3: Standard-based, non-proprietary machine readable file format;
    Level 4: Level 3 + machine independent, self-describing, and interoperable
    file format;
    Level 5: Level 4 + analysis ready
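    Tying the four proposals together, here is a hypothetical sketch (the lookup table and helper are mine, mirroring only the indicator IDs and principles above) of how a self-assessment against these indicators could be recorded and summarised:

    ```python
    # Hypothetical lookup table of the four proposed indicators; the
    # descriptions paraphrase the proposal above.
    INDICATORS = {
        "PI_11": "state of (meta)data identifier (F1)",
        "PI_20": "state of metadata (F2/R1)",
        "PI_46": "state of data and relevant information being retrievable (A1)",
        "PI_85": "state of data being portable (I1)",
    }

    def summarise(scores: dict) -> str:
        """Summarise a self-assessment as one line per indicator,
        closing with the minimum level as a conservative overall score."""
        lines = [f"{pi} ({INDICATORS[pi]}): level {lvl}"
                 for pi, lvl in sorted(scores.items())]
        lines.append(f"overall (minimum): level {min(scores.values())}")
        return "\n".join(lines)

    # Example self-assessment with made-up scores:
    print(summarise({"PI_11": 4, "PI_20": 3, "PI_46": 3, "PI_85": 2}))
    ```

    Taking the minimum as the overall score is only one possible aggregation; the WG could equally report the per-indicator levels without combining them.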

    Author: Makx Dekkers

    Date: 29 May, 2019

    Dear Ge Peng,
    Many thanks for your contributions. We will take them into account in the analysis of contributions that we have started working on.
    In our preparations for the next meeting, the editorial team is looking first at the proposed maturity levels for the individual indicators. We are going to propose a consolidated set of those in step 3 in the collaborative document.
    If I understand correctly, the maturity levels you propose, e.g. in PIL_11, PIL_20 and PIL_85, combine some of the more ‘atomic’ indicators. These two approaches were discussed at the last meeting (slides 19 and 20 in the slide deck at https://www.rd-alliance.org/system/files/documents/20190403_FAIR_WG_slid...).
    We will start with the approach of looking at the individual indicators first (slide 20) and then see how we can derive a set of levels across indicators for a principle (slide 19). We will try to represent both approaches in the result of our analysis, to be presented in the next call on 18 June, so that the WG can decide which one it prefers.
    Any further comments and contributions are most welcome!
    Kind regards,
    Makx Dekkers
    Editorial team

    Author: Ge Peng

    Date: 29 May, 2019

    Dear Makx,
    Thank you for your response. Yes, my proposed indicators and therefore the
    maturity levels can be further modified to be more granular. For example,
    PI/PIL_11 can be modified to evaluate metadata and data separately, while
    PI/PIL_46 can be modified to evaluate data and associated information
    separately; the associated information can be further separated into
    software package, provenance, quality descriptive information, etc.
    I am, however, concerned that we could end up with too many individual
    indicators for practical use. On the other hand, it is necessary to have
    sufficient indicators to cover all aspects and ensure compliance with the
    FAIR data principles.
    Many of those individual indicators may be related to each other to some
    degree. Striking a balance between sufficient but *not too many*
    indicators will likely be challenging, but it is something we may have to
    do.
    Having said all that, I am fine with both approaches and willing to work
    with the WG and the editorial team towards finalizing the maturity levels
    for selected indicators if needed.
    Best regards,
    --- Peng
