High-Security Data Platform supporting Data Visiting for sensitive data

03 Jul 2020

Dear colleagues,
Some of you might be interested in this activity, which started ad hoc
within the Austrian COVID19 Future Operations Clearing Board and is now
being supported by the Co-Creation fund of the EOSC Secretariat to turn
it into a proper, reusable, open-source solution:
Very early on during the rise of the COVID19 pandemic, the need for
solid, data-driven decisions was recognized. During meetings of the
COVID19 Future Operations Clearing Board, a national expert platform in
Austria, it became evident that access to essential data was missing.
This was primarily because data owners were unable to share their data
with experts, either for privacy reasons (medical and social science
data) or because of the massive risk involved in sharing commercially
sensitive data. To break this deadlock, TU Wien set up, within two
weeks, a high-security data infrastructure and corresponding processes
that allow data owners to provide
- highly selective access (data visiting)
- to specific (fine-granular or aggregated, fingerprinted)
subsets of data
- for identified individuals
- for limited periods of time
- to answer precisely defined questions accepted by the data owner
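The access conditions listed above can be read as a single grant check that must succeed before anyone visits the data. The following is only a minimal, hypothetical Python sketch: the `AccessGrant` record, its field names and the `may_access` function are illustrative assumptions and are not taken from the TU Wien infrastructure.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of one data-visiting permission issued by a data owner."""
    researcher_id: str    # identified individual
    subset_id: str        # specific (fine-granular or aggregated) data subset
    question_id: str      # precisely defined question accepted by the data owner
    valid_from: datetime  # start of the limited access period (inclusive)
    valid_until: datetime # end of the limited access period (exclusive)


def may_access(grants, researcher_id, subset_id, question_id, at=None):
    """Allow access only if one grant matches person, subset, question and time."""
    if at is None:
        at = datetime.now(timezone.utc)
    return any(
        g.researcher_id == researcher_id
        and g.subset_id == subset_id
        and g.question_id == question_id
        and g.valid_from <= at < g.valid_until
        for g in grants
    )


if __name__ == "__main__":
    grant = AccessGrant(
        researcher_id="researcher-42",         # hypothetical identifiers
        subset_id="logistics-aggregated-v1",
        question_id="Q-007",
        valid_from=datetime(2020, 7, 1, tzinfo=timezone.utc),
        valid_until=datetime(2020, 7, 15, tzinfo=timezone.utc),
    )
    during = datetime(2020, 7, 10, tzinfo=timezone.utc)
    after = datetime(2020, 7, 20, tzinfo=timezone.utc)
    print(may_access([grant], "researcher-42", "logistics-aggregated-v1", "Q-007", during))  # True
    print(may_access([grant], "researcher-42", "logistics-aggregated-v1", "Q-007", after))   # False
```

Every condition has to hold at once, so a grant for one question does not open the same subset for a different question, and access ends automatically when the period expires.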
As this infrastructure was set up and deployed in less than two weeks,
it was obviously highly "handcrafted" and tailored to the specific needs
and local set-up. Yet, the availability of such an infrastructure has
attracted the interest of several parties who would like to provide such
a data hosting / data visiting solution themselves. We have thus been
granted funding to package and release these infrastructure components
and the corresponding documentation, so that other parties can set up
and operate this infrastructure (or parts of it) in their own
environment. This lets them keep complete control over who can work with
which parts of their data within their own server infrastructure,
without having to hand over data to external partners.
More details on this activity will be released via the (currently rather
rudimentary) webpage at
http://www.ifs.tuwien.ac.at/~andi/secure_data_infrastructure.html
We are, of course, very interested in feedback on the current
architecture, on issues you discover with the architecture and design,
and, once the components are provided (planned for late summer / early
autumn), on their usefulness and limitations. If you have any questions,
please let me know.
Best regards,
Andreas Rauber

  • Author: Kheeran Dharmawardena

    Date: 03 Jul, 2020

    Hi Andreas,
    It is great to hear of this initiative, and truly inspiring that it was
    conceived and developed in two weeks. Secure analytics platforms such as
    this are very much needed, as they fill a gap between private and public
    data.
    From what you describe, the model used seems very similar to the SAIL
    databank platform in the UK. It might be worth reaching out to them to swap
    notes as well.
    I am particularly interested in how you managed to handle the social
    dynamics of this socio-technical problem in a way that balanced the
    various competing social and institutional needs. I'm interested in this
    because I think it forms a nice case study for the Social Dynamics of Data
    Interoperability IG. Extracting and compiling problem-solution patterns
    for such challenges is something the IG is interested in, and it would
    complement and benefit anyone looking to adopt the open-source solution
    that is being funded by the EOSC.
    Kind regards
    Kheeran

  • Author: Andreas Rauber

    Date: 04 Jul, 2020

    Hi Kheeran,
    Thanks a lot for the pointer to SAIL and UKSeRP. This infrastructure is, indeed, conceptually very similar to ours, but several orders of magnitude larger. It very likely also took a bit longer to set up: some places seem to have been more forward-looking than others in establishing such infrastructures. Unfortunately, we had (and still have) nothing like that in place at this scale.
    While setting up a solution like UKSeRP was out of the question given the time and budget constraints, we are fortunately also facing a somewhat simpler setting in the current project: rather than setting up a central system, we want to enable data owners to deploy the solution directly within their own environment. This should make it easier for them, both legally and in terms of trust, to make their data accessible. It also keeps the data where the domain expertise is. This, however, also limits the interlinking and integration of data from different sources. But we hope that once trust has been gained in the individual systems operated by the data owners themselves, they will also be more willing to trust a third-party system that links data while still keeping the data owners in control.
    Thus, I wouldn't dare to say in any way that we have managed to *solve* the social dynamics. On the contrary, as long as benefits are to be gained from having control over the data, and as long as handing over data to somebody else risks reducing these benefits, we will see all kinds of activities that impede access to data. Our hope is that, through a bottom-up process, institutions can learn about the opportunities, assess for themselves the risks that each such infrastructure carries in spite of all protection mechanisms, understand the trade-offs and implications, and thus gain trust and see the benefits. It's a rather organic process that allows institutions to learn from examples and eventually try it themselves. In our case, these institutions also include industry as a key stakeholder: much of the data relevant to our discussions was commercial data (supply chain analysis, logistics data, etc.) held by companies that seem less eager to feed such sensitive data into centralized repositories. At least that's our hope and vision in preparing these building blocks and set-up descriptions.
    Andi

  • Author: Kheeran Dharmawardena

    Date: 08 Jul, 2020

    Hi Andi,
    Thank you for clarifying the difference in approaches. From the SDDI-IG
    perspective, this is a wonderful example of two different approaches to the
    same problem. The SDDI-IG is looking to compile a corpus of knowledge on
    such problem-solution patterns, so it would be useful to explore this
    further with you. As this is off-topic for the RDA-COVID19 group, I will
    email you separately to discuss it further.
    Regards
    Kheeran
