Dear Data Fabric colleagues,
Over the last few weeks, a group of people engaged in RDA got together and wrote a paper that describes trends in data management, refers to data principles such as those established by the G8 ministers, and, based on these, discusses consequences and components that are seen as important. The components part is largely based on the Use Case descriptions presented to the RDA Data Fabric IG and on the authors' expertise. We have uploaded this document to the DFIG wiki to open discussion on it. Today we will also upload all the use cases we have received so far, and we will continue to encourage people to come up with additional Use Case descriptions.
This paper is NOT meant as a FINAL statement; it is intended to motivate broad discussion about what needs to be done next. We chose to present lists of points without setting priorities, knowing that there will be debate about the points mentioned and that there will be gaps. The document will be presented at various meetings with different stakeholders with the aim of gathering comments. Embedding it in the DFIG wiki will keep the discussion process within RDA, which we consider important. In a few months it might be necessary to produce a new snapshot document that summarizes the state of agreement and disagreement. Of course, credit must be given to all who take the time to comment and contribute.
You can find the document at this URL, where the discussion should also take place:
https://rd-alliance.org/groups/data-fabric-ig/wiki/data-fabric-ig-compon...
In case you want to cite the document, I uploaded it to a permanent store and it received a Handle:
http://hdl.handle.net/11304/f638f422-f619-11e4-ac7e-860aa0063d1f
Apologies for the period of silence, which was due to meetings and the preparation of a couple of documents.
Best
Peter
Author: Keith Jeffery
Date: 10 May, 2015
Congratulations on a large amount of very good work. My comments are in Word with change tracking on. Keith
Author: Larry Lannom
Date: 10 May, 2015
Thanks Keith,
One comment on your comment on the ‘hourglass’ figure. You point out
> IP addresses need not be unique over time and may not be persistent
perhaps suggesting that the analogy needs to be made more clearly.
My understanding of the ‘narrow neck’ metaphor of IP addresses is that they allow many different kinds of network services to be made available across many different kinds of networks. The analogy for PIDs is that they allow many different kinds of data management services to be made available across many different kinds of data sources.
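As a rough illustration of the narrow-neck idea, here is a minimal sketch in Python. It is not taken from the paper: the `PidRegistry` class, the PID strings and both data sources are invented for the example. The only point is that every service depends on a single `resolve()` call and never on the kind of source behind it.

```python
from typing import Callable, Dict, List


class PidRegistry:
    """Hypothetical 'narrow neck': maps a PID string to a fetcher for its data."""

    def __init__(self) -> None:
        self._fetchers: Dict[str, Callable[[], str]] = {}

    def register(self, pid: str, fetch: Callable[[], str]) -> None:
        # Any kind of data source can take part, as long as it supplies a fetcher.
        self._fetchers[pid] = fetch

    def resolve(self, pid: str) -> str:
        # Any kind of service can call this without knowing the source type.
        return self._fetchers[pid]()


# Two different kinds of data sources (both made up for this sketch).
in_memory_store = {"obj-1": "time,temperature"}
pair_list_archive = [("obj-2", "speaker,utterance")]

registry = PidRegistry()
registry.register("example/obj-a", lambda: in_memory_store["obj-1"])
registry.register("example/obj-b", lambda: dict(pair_list_archive)["obj-2"])


# Two different kinds of services; both depend only on resolve().
def byte_size(pid: str) -> int:
    return len(registry.resolve(pid).encode("utf-8"))


def column_names(pid: str) -> List[str]:
    return registry.resolve(pid).split(",")
```

Adding a third source or a third service touches only one side of the neck; the `resolve()` interface in the middle stays the same.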
Larry
Author: Keith Jeffery
Date: 11 May, 2015
Larry -
Thanks for taking the time and expanding on the analogy - of course this interpretation makes sense.
Perhaps such wording could be included to avoid others picking up the discrepancy that I did? I believe it is important because the characteristics/properties of PIDs (i.e. the intrinsic properties associated with the character string) are different from those of IP addresses.
Best
Keith
------------------------------------------------------------------------------------------------------------------
Keith G Jeffery Consultants
Prof Keith G Jeffery
E: ***@***.***
T: +44 7768 446088
S: keithgjeffery
Past President ERCIM www.ercim.eu (***@***.***)
Past President euroCRIS www.eurocris.org
Past Vice President VLDB www.vldb.org
Fellow (CITP, CEng) BCS www.bcs.org
Co-chair RDA MIG https://rd-alliance.org/internal-groups/metadata-ig.html
Co-chair RDA MSDWG https://rd-alliance.org/working-groups/metadata-standards-directory-work...
Co-chair RDA DICIG https://rd-alliance.org/internal-groups/data-context-ig.html
----------------------------------------------------------------------------------------------------------------------------------
The contents of this email are sent in confidence for the use of the
intended recipient only. If you are not one of the intended
recipients do not take action on it or show it to anyone else, but
return this email to the sender and delete your copy of it.
----------------------------------------------------------------------------------------------------------------------------------
Author: Larry Lannom
Date: 11 May, 2015
Keith,
Sounds like a good idea. Thanks again.
Best,
Larry
Author: Ralph Müller-Pf...
Date: 02 Jun, 2015
Hi there,
thanks for the good and concise paper. I added some comments in the technical components section.
Regards,
Ralph
Author: Andrew Maffei
Date: 03 Jun, 2015
I agree. Very good and concise paper. Thanks all. I posted my suggestions to the RDA website here:
https://rd-alliance.org/comment/3556#comment-3556
Thanks again,
Andrew Maffei
Author: Leonardo Candela
Date: 12 Jun, 2015
This “RFC” document aims at identifying a number of “components” that have to be put in place to support proper data practices. These components are actually described in Sec. 5, after a very long discussion of trends, principles and the consequences (!) of these principles.
Overall comments:
Section-specific comments are reported below.
Re Introduction:
Re Section 2:
Re Sec. 3
Re Sec. 5
Re Sec. 6
App. A
App. B
Please add a References section rather than using footnotes. It would give more readable information than the often “anonymous” links.
Minor typos:
- page 1, last paragraph: “the the type”
- page 6: “diagram 6” should be Figure 7
Author: Leonardo Candela
Date: 12 Jun, 2015
I'm wondering whether there is any plan to have a virtual space to collect "all" the comments and discussions about this document.
Author: Donatella Castelli
Date: 12 Jun, 2015
- The title is very ambitious and refers to a topic that has been, and still is, widely studied. I am wondering whether the document really aims at addressing “Data Management” in its most general terms, or whether its objective is more confined, for example to the management of data in scientific/research contexts and/or in an infrastructural framework (still wide, but closer to the RDA aims).
- The introduction section states:
“RDA aims to be a neutral place where experts from different scientific fields come together to determine common ground in a domain which is fragmented and, by agreeing on "common data solutions", liberate resources to focus on scientific aspects.”
The RDA Mission reported on the RDA website is:
“The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data.”
If my interpretation is correct, then this mission does not imply that RDA is looking for “common” solutions for data management. It is well known that, given the heterogeneity of the Data Universe, different solutions are unavoidable. The RDA mission refers instead to “bridges” (some of which may also be based on common resources, e.g. registries), whose implementation requires a more articulated approach than identifying “common solutions”.
- The document refers to data sharing and re-use as major objectives to be achieved. These aims cannot be achieved without also bringing into the picture the actors producing and consuming data. Once these elements are introduced, aspects like usage policies, controlled access, access monitoring, credits, quality of service and collaborative management enter the picture, and these largely influence the vision, the list of required technical components and their characteristics. The last part includes a component (e.g. the authentication system) that may be related to the “actors” aspect, but it is not clear why it has been introduced there.
- I have the impression that the “layers of enabling technologies” derive from a very high-level conceptualization of a data-centric research process. If this is the case, you should not forget to include “Publishing” as well. This step, like the others, requires suitable technologies and components.
- The link between the content of the different main sections should be improved. Currently it is very difficult to understand how they are related and how the final list of technical components is a logical consequence of trends and principles.
Author: Franco Zoppi
Date: 12 Jun, 2015
Overall comments on how to improve the document message
Used terminology
The document seems to suffer from problems in its terminology. Terms are sometimes unclear (in many cases definitions would help) or even wrong or misused. I guess that most of these problems could be avoided by correctly using well-established and consolidated Computer Science/ICT terminology.
This is particularly evident in Sections 2.2, 2.3 and 2.6.
Document perspective
The document adopts a single perspective: the “user” perspective (from a Computer Science/ICT point of view).
Themes are addressed following a “user perception of the problem and requirements specification” approach rather than a comprehensive, multi-faceted approach that tries to identify the general scope, the different views of different stakeholders, different levels of abstraction, etc.
This impression is reinforced when reading Appendix A (Roles), where just one of the roles seems to refer to CS/ICT figures, and low-level (!) ones at that.
To sum up, I feel that a sound “CS/ICT perspective” would add value to the whole document and reinforce its message.
Common Trends
They are very heterogeneous, ranging from simple observations (e.g. Sect. 2.1), to a sort of “historical overview” (e.g. Sect. 2.6), to “visions” (e.g. Sect. 2.3). Homogenizing the descriptions and putting them at the appropriate level of abstraction would clarify the message.
I guess that an overview picture of the general scope of the RFC, highlighting the position of each of these trends, might help. It would be great to have a “fil rouge” starting from that, going through Sections 3 and 4 and leading to the components in Sect. 5. This could reinforce the rationale of the whole document.
Principles
This section should be improved. The principles are introduced only through some references and a list of fairly common postulates, which are indeed correct but do not adequately “match” the rest of the document (except, partially, Sect. 4).
Technical Components
This section suffers largely from the fact that the components are not properly positioned in a general picture (call it a “model”, an “architecture” or whatever else).
I guess this should be the core part of the document, hence it is fundamental to have a clear perception of “what, why and how” you are proposing as a solution.
Here again, proper abstraction levels should be identified, and links/relationships to current best practices, standards, technologies, etc. should be highlighted.