Array Database Assessment Working Group

19 Oct 2015

The Array Database Assessment WG will examine the emerging technology of Array Databases to support technologists and decision makers who are considering Big Data services in academic and industrial environments (such as large-scale data centers). It will do so by establishing best-practice guidelines on how to optimally serve multi-dimensional gridded Big Data through Array Databases. This will be accomplished through a neutral, thorough, hands-on evaluation of available Array Database systems and comparable technology:
  • based on relevant standards, such as the NIST Big Data Reference Architecture, ISO “Array SQL”, and OGC Web Coverage Processing Service (WCPS) for the geo domain;
  • comparing systems against technical criteria such as functionality, thereby eliciting the state of the art;
  • establishing a combination of domain-driven and domain-neutral benchmarks and running them on each platform;
  • and setting up real-life, publicly accessible deployments at scale.
The result, consisting of the ADA-WG report together with the open-source benchmarking software and the services established, will provide a previously unavailable overview of the state of the art and best use of Array Databases in science, engineering, and beyond.
 
Review period start: Monday, 19 October 2015
  • Author: Dawn Wright

    Date: 19 Oct, 2015

    I think this is a very exciting WG proposal, and such a group will be able to make important linkages to a host of existing groups. The proposal cites a final report as a major deliverable, with results and recommendations that I'm sure will be useful for adoption. However, what I find most exciting from an adoption standpoint is the publicly accessible deployments of multi-dimensional gridded arrays at scale. I hope this software environment will indeed be very closely coupled with the WG report itself. Will this WG also be considering unstructured (big) data? Best wishes to the group for the success of this proposal.

     

  • Author: Matthew Turk

    Date: 21 Oct, 2015

    I find this proposal to be extremely interesting, and very relevant.  I could see this being an incredibly useful working group with important outcomes.  I'd be keen to participate.

  • Author: Peter Wittenburg

    Date: 04 Nov, 2015

    Hello Peter, all,

    I read your Case Statement with great interest and indeed hope that we will arrive at a WG within RDA.

    Let me make some comments on your proposal.

    • Having guidelines for Array Databases is certainly a valuable goal in my eyes, because we often see practitioners hesitate to adopt something when the investment is high and the benefits are unclear. We need to overcome this hesitation, and that can only be done through balanced testing and reporting.
    • I therefore think that the benefits (performance gains, ease of management, etc.) should be weighed against the costs (investment of time and money), together with the strengths and weaknesses, as you state.
    • It would also be good if potential users could extract from your results whether it makes sense to use an array DB or not. Comparisons are essential; the question is, comparisons between what? If I only compare different database concepts, not much is gained, since what people generally use is some form of sliding window across files. A comparison against optimal procedures of the traditional type would be excellent. I have no idea how this can be done, but here you need to be a bit more specific, I guess, and of course examples from different communities would be great.
    • Part of the overall cost for users is, I assume, bootstrapping: for example, uploading lengthy time series into a DB and establishing or using a sufficiently powerful framework (computers, storage, etc.).
    • I think that your adoption plan is not yet satisfactory. It is important to know who is going to test things, etc., and to name the scientific communities that have an interest and will participate. Your membership list includes quite a number of experts, some of them engaged in communities, but it is not clear what their roles in adoption, testing, etc. are.

    One small formal point: I think you should simply start with two co-chairs, who can be temporary and replaced by new ones later. For interaction it is good to have names; one will be you, I assume.
     

    Best,

    Peter W
