Impressions of RDA Plenary 4 and Big Data Analytics IG

27 Oct 2014

By Fabio Benedetti

Hi researchers, my name is Fabio Benedetti. I am a PhD student at the International Doctorate School in Information and Communication Technologies of the University of Modena and Reggio Emilia, and I am writing this post to share my experience at the Fourth RDA Plenary Meeting in Amsterdam as one of the winners of the Early Career Support Programmes.

I was assigned to the Big Data Analytics Interest Group, but first I would like to spend some time talking about the RDA Meeting in general. The meeting was held in a beautiful theatre in Amsterdam, a lovely city that I will certainly visit again soon. I would like to congratulate the organizers on their wonderful work. An engaging plenary session opened the meeting on Monday morning, in which influential figures such as Robert Jan Smits (Director General DG Research, European Commission) and Neelie Kroes (Vice-President of the European Commission) addressed an audience of around four hundred scientists. Their introduction really brought home just how important this event was.

After that, the first poster session for early career scientists began. To be honest I was a bit nervous, but knowing that other young researchers were in the same situation made me feel less on edge. Overall, the poster session went smoothly: I received some useful suggestions and had some interesting discussions. The only minor issue was that the posters were not well placed; the session was partially hidden from the core of the meeting, so it did not get the attention that was expected.

After the first poster session, the carousel of Working and Interest Groups started, so I attended the first meeting of the Big Data Analytics Interest Group. Morris Riedel, one of the chairs, opened the proceedings by outlining the aims of the Big Data Analytics Interest Group:

· To clarify some foundational terminology in the context of data analytics.

· To develop a recommendation document that can serve as a best practice guide for scientific communities interested in working with Big Data technologies.

The group agreed that the best way to proceed is to collect a heterogeneous set of use cases and use them to define best practice in the data analytics process.

Geoffrey Fox gave the first presentation, titled "Big Data Infrastructure Use Cases". He presented two interesting use cases from his own research. In the first, he proposed a model for integrating high performance computing with the Apache Big Data Stack; in the second, he presented SPIDAL, a scalable analytics framework that could be used in different research domains. In the discussion that followed, it emerged that the group intends to adopt the MapReduce ecosystem as its main infrastructure, leaving aside newer paradigms such as Spark.

After that, Morris Riedel gave a very interesting presentation titled "Classification Techniques in Remote Sensing Research using Smart Data Analytics". Its main topic was techniques for smart data analytics, illustrated with some valuable examples.

Finally, Shahbaz Memon gave the last presentation of the day, titled "Ultrasound and Brain Analytics", which covered techniques for large-scale image processing. It emerged from the final discussion that some fields (human sciences, life sciences) had not been covered by the use-case examples, so the chairs asked whether anyone in the audience wanted to cover these topics in the second meeting of the group. Moreover, the Interest Group decided to adopt the ISO Reference Model for Open Distributed Processing (RM-ODP) for structuring its analysis of existing use cases. RM-ODP appears particularly useful in situations where multiple stakeholders with divergent backgrounds need to converge.

The second day featured a large number of use cases covering different domains:

· Wo Chang – Big Data Infrastructure and Use Cases

· Kuo Kwo-Sen – Automated Identification of Episodes of Earth Science Phenomena

· Peter Baumann – In Array Database Analytics

· Hugh Shanahan – NGS Transcriptomic Workflows

· Frank Seinstra – Big Compute meets Big Data

In my opinion, all these talks were incredibly valuable, and very useful solutions were proposed. Unfortunately, because there were so many speakers, there was not enough time for discussion at the end of the second meeting.

I think that the work the Big Data Analytics IG is doing is very valuable. Looking closely at the problems raised in the discussion, it is clear that defining general guidelines in such a wide field is a really hard task. Overall, the RDA 4th Plenary Meeting was really exciting for me because it gave me the chance to talk with some of the most important figures in the research field. Everything, from the talks to the organization, was great, and I would like to thank RDA for the opportunity I was given.

Fabio Benedetti