Data Access and Management in the EUDAT Collaborative Data Infrastructure
Introduction: A European and Global Policy Context
EUDAT, the European Data e-Infrastructure Initiative, is working at a European level, in a global context. Maximising the value of twenty-first century digital research is no longer a regional, nor even a national, challenge, and Europe is leading by example in the construction and realisation of global research data infrastructure.
EUDAT welcomes the recent policy statements from the European Commission on open access to scientific publications and research data and on the importance of data management planning in the forthcoming Horizon 2020 research programme, and endorses the research data guiding principles from the recent G8+O6 working group on data. As a cross-community, data-driven project, EUDAT is strongly positioned to support the Horizon 2020 open access data pilot, which aims at reliable, high-performance infrastructures for data management, because EUDAT shares the same goals.
EUDAT's Mission and Goals
EUDAT’s goal is to build a Collaborative Data Infrastructure (CDI) as a pan-European solution to the challenge of data proliferation in Europe’s scientific and research communities. The CDI will allow researchers to share data within and between communities and enable them to carry out their research effectively. Our mission is to provide a solution that will be affordable, trustworthy, robust, persistent, open and easy to use.
In short, researchers can rely on the EUDAT CDI as a repository for their data.
In building the CDI, EUDAT is guided by a handful of principles that are founded on a fundamental belief in open access to research data and that overlap strongly with those in the EC’s Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020:
- Data deposited with the EUDAT CDI will be preserved long-term. The CDI will, in time, become the data infrastructure layer supporting long-term archiving and preservation of European research data.
- Data are best curated in their own communities. EUDAT will always seek to include data producers from a particular community in the long-term preservation of those data.
- Access to data in the EUDAT CDI is free at the point of use. Where data are unfettered by licence or ownership conditions, EUDAT will offer them to registered users free of charge at the point of use. Charging researchers for each access or use of data will create a barrier to use that will run counter to the fundamental ideas of data sharing that EUDAT seeks to promote.
- For an EUDAT community repository to be designated a Trustworthy Digital Repository (TDR), EUDAT services and infrastructure must themselves be a suitable target for “TDR outsourcing”. EUDAT plans for a future in which community TDRs join the EUDAT federation, so CDI “back-office” services are being built with the properties needed to be regarded as legitimate targets for “TDR outsourcing”, as discussed at datasealofapproval.org.
- EUDAT will not assert ownership of any data it holds. The EUDAT CDI is a vehicle to promote data sharing, not data hoarding. Ownership of data will remain with the contributor, although EUDAT will encourage openness from all participants and contributors.
EUDAT’s Collaborative Data Infrastructure Network
The principle of involving research communities as curators leads to the EUDAT CDI being a logical federation of data producers with a shared “back-office” rather than a monolithic entity trying to do everything itself. The CDI operates this way in a partnership model because the underlying infrastructure must be distributed, with a designated set of service-provider sites hosting common services for the wider research community. The CDI is thus a connected network of European research institutions (“community sites”) and data centres, each offering one or more common EUDAT data services to both participating research communities and independent researchers.
We illustrate this network model of the CDI below:
- generic data centre nodes are heavily connected between themselves: each is connected to at least one other;
- community sites are less heavily connected; typically a community site is connected to one partner data centre node, although it may be connected to more;
- all connected nodes run some version of the CDI node software suite, the set of software components which delivers the common EUDAT data services PID Registration, B2SAFE, B2STAGE, B2SHARE and B2FIND. Nodes will run all software necessary to deliver the services they offer;
- core operational services (service registry, monitoring, etc.) run at a few of the generic data centres.
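The connectivity rules in the list above can be expressed as a small consistency check. The following is a minimal sketch in Python, not EUDAT software: the `Node` class, the `connect` and `check_topology` helpers, and the node names are all hypothetical and exist only to illustrate the model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One CDI node (hypothetical model, for illustration only)."""
    name: str
    kind: str  # "data_centre" or "community_site"
    services: set[str] = field(default_factory=set)
    links: set[str] = field(default_factory=set)


def connect(nodes: dict[str, Node], a: str, b: str) -> None:
    """Record an undirected link between two nodes."""
    nodes[a].links.add(b)
    nodes[b].links.add(a)


def check_topology(nodes: dict[str, Node]) -> list[str]:
    """Return the violations of the connectivity rules, in name order.

    Rule 1: every data centre is linked to at least one other data centre.
    Rule 2: every community site is linked to at least one data centre.
    """
    problems = []
    for name in sorted(nodes):
        node = nodes[name]
        centres = [
            peer for peer in node.links
            if peer != name and nodes[peer].kind == "data_centre"
        ]
        if centres:
            continue
        if node.kind == "data_centre":
            problems.append(f"{name}: data centre has no link to another data centre")
        else:
            problems.append(f"{name}: community site has no partner data centre")
    return problems


# Hypothetical three-node network: two data centres and one community site.
nodes = {
    "dc1": Node("dc1", "data_centre", {"B2SAFE", "B2STAGE", "B2FIND"}),
    "dc2": Node("dc2", "data_centre", {"B2SAFE", "B2SHARE"}),
    "site_a": Node("site_a", "community_site", {"B2SAFE"}),
}
connect(nodes, "dc1", "dc2")
connect(nodes, "site_a", "dc1")
```

With these links in place, `check_topology(nodes)` returns an empty list; adding a community site without any link would produce one reported violation.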
Connecting with the CDI
To join the EUDAT CDI, community sites or research institutions partner with the data centre of their choice. This one-to-one local partnership, governed by standard service-level agreements, keeps the interaction between research communities and data centres as straightforward as possible while giving communities access to the reach and strength of the CDI network.
To make use of CDI services (finding and accessing data, for instance, or storing smaller data sets), users can interact with one of the CDI public front-end services, B2FIND or B2SHARE, or use the interfaces of the CDI Common Service Layer (GridFTP, iRODS and, in due course, the HTTP API currently in development for 2014).
EUDAT and Open Access
EUDAT believes fundamentally in open access. By open access we mean the free availability of data on the public Internet, permitting any user to reproduce and redistribute them for any purpose, and in particular for the purpose of non-commercial research, without financial, legal or technical barriers. The only allowable constraint on reproduction and redistribution should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
One of the prime motivations for the CDI is to create a single domain of registered, well-described, cross-disciplinary data, connecting collections and data centres across Europe and harmonising access to them – harmonising access not just in the technical sense but in the policy sense.
All nodes joining the CDI are strongly encouraged to adopt open access policies towards their collections in return for the benefits of EUDAT replication and management services. In this, EUDAT follows two further guiding principles:
- all data in the CDI should, in time, become fully open access. Open access is the norm for CDI data;
- embargo periods for original producers are fully supported, on condition that such data become openly accessible when the embargo period expires.
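The embargo principle above amounts to a simple date rule. The sketch below is illustrative only: the function name `is_openly_accessible` is hypothetical, and it assumes that the embargo end date is the first day on which the data are open, which the text does not specify.

```python
from datetime import date
from typing import Optional


def is_openly_accessible(embargo_end: Optional[date], today: date) -> bool:
    """Return True if a data set should be openly accessible on `today`.

    Assumption: `embargo_end` is the first day of open access;
    None means the data set was never under embargo.
    """
    return embargo_end is None or today >= embargo_end


# A data set with no embargo is open; an embargoed one opens on its end date.
never_embargoed = is_openly_accessible(None, date(2014, 1, 15))
still_embargoed = is_openly_accessible(date(2015, 1, 1), date(2014, 1, 15))
embargo_expired = is_openly_accessible(date(2015, 1, 1), date(2015, 1, 1))
```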
In the first instance EUDAT seeks to harmonise open access policy across the CDI through adoption of common persistent identification schemes and common data access licences.
EUDAT adopts globally unique Handles to identify digital objects within the CDI. The Handle System, the system behind DOI and other well-known identification mechanisms, is administered by the new Digital Object Naming Authority (DONA) and is used worldwide. EUDAT works with the European PID Consortium (EPIC) to ensure all data objects registered in the CDI receive a unique, persistent Handle.
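A Handle consists of a prefix (the naming authority) and a suffix (the local name), separated by the first "/", and can be resolved through the public Handle proxy at hdl.handle.net. The sketch below shows that structure; the helper names `split_handle` and `resolver_url` are hypothetical, and the handle `12345/abc-001` is an invented placeholder, not a registered EUDAT identifier.

```python
from urllib.parse import quote


def split_handle(handle: str) -> tuple[str, str]:
    """Split a Handle into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid Handle: {handle!r}")
    return prefix, suffix


def resolver_url(handle: str, proxy: str = "https://hdl.handle.net") -> str:
    """Build the proxy URL that redirects to the object's current location."""
    split_handle(handle)  # validate the prefix/suffix structure first
    return f"{proxy}/{quote(handle, safe='/')}"


# Placeholder handle, for illustration only.
prefix, suffix = split_handle("12345/abc-001")
url = resolver_url("12345/abc-001")
```

Because the suffix may itself contain "/" characters, only the first separator is treated as the boundary between prefix and suffix.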
EUDAT encourages all participating sites and centres to adopt a common metadata core for digital objects based on version 3.0 of the DataCite standard. Coupled with the EPIC Handle service this creates a first level of harmonisation across the CDI domain.
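As an illustration of what a common metadata core implies in practice, the sketch below checks a record against the five properties that DataCite 3.0 marks as mandatory (Identifier, Creator, Title, Publisher, PublicationYear). The dictionary keys, the function name `missing_core_fields` and the sample record are hypothetical; the core that EUDAT actually adopts may include further properties.

```python
# Keys are a hypothetical lowercase rendering of the DataCite 3.0
# mandatory properties; the record layout is an assumption.
MANDATORY_FIELDS = ("identifier", "creator", "title", "publisher", "publication_year")


def missing_core_fields(record: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in `record`."""
    return [name for name in MANDATORY_FIELDS if not record.get(name)]


# Invented sample record, for illustration only.
record = {
    "identifier": "12345/abc-001",
    "creator": "Example Research Group",
    "title": "Example data set",
    "publisher": "Example Data Centre",
}
missing = missing_core_fields(record)
```

Here `missing` contains only `"publication_year"`, the one mandatory field the sample record lacks.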
Open data licences
All participating sites in the CDI are encouraged to adopt open licences for access to their data collections. Combining data released under different licences can be challenging, and because of this EUDAT intends in time to converge on one common licensing scheme for participating sites. Currently, EUDAT recommends two main licensing schemes for those planning to join:
- Creative Commons v4.0, particularly:
  - the Creative Commons Attribution License 4.0 International (“CC BY 4.0”).
- Open Data Commons, particularly:
  - the Open Data Commons Open Database License (ODbL) v1.0;
  - the Open Data Commons Attribution License v1.0.
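A site could check its collections against the recommended licences with something like the following sketch. The SPDX identifiers are used here as a convenient shorthand for the three licences named above; the function name `licence_report` and the collection names are hypothetical.

```python
# SPDX identifiers for the three licences recommended in the text.
RECOMMENDED_LICENCES = {
    "CC-BY-4.0": "Creative Commons Attribution 4.0 International",
    "ODbL-1.0": "Open Data Commons Open Database License v1.0",
    "ODC-By-1.0": "Open Data Commons Attribution License v1.0",
}


def licence_report(collections: dict[str, str]) -> dict:
    """Summarise licence use across collections (name -> SPDX identifier).

    `not_recommended` lists collections whose licence is outside the
    recommended set; `mixed` is True when more than one licence is in use,
    which is the situation that makes combining data harder.
    """
    not_recommended = sorted(
        name for name, licence in collections.items()
        if licence not in RECOMMENDED_LICENCES
    )
    return {
        "not_recommended": not_recommended,
        "mixed": len(set(collections.values())) > 1,
    }


# Invented collections, for illustration only.
report = licence_report({
    "ocean_temperatures": "CC-BY-4.0",
    "language_corpus": "ODbL-1.0",
    "legacy_archive": "proprietary",
})
```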
EUDAT Support For Data Management Planning
Data management planning – thinking in advance about what will happen to data produced during the research process – is increasingly required by national research funding agencies, and new guidelines for Horizon 2020 research projects were released by the EU in December 2013 (Guidelines on Data Management in Horizon 2020).
EUDAT exists in part to disseminate and promote best practice in data management for twenty-first century research, and to support communities in adopting basic practices such as PID registration, metadata creation and replication.
As part of its mission to help researchers and research communities manage and preserve their data, EUDAT has begun work with the world-recognised Digital Curation Centre on a version of its widely used DMPonline tool that will capture the H2020 guidelines in a data management planning tool tailored to the emerging needs of European research. First versions of this tool will be available in early 2014, and will guide and support researchers through the process of creating effective data management plans for digital research in Horizon 2020.