Open science has become a pillar of the policies of national and international research-funding bodies. The ambition is to increase scientific value by sharing data and transferring knowledge within and across scientific communities. To this end, in 2015 the European Union (EU) launched the European Open Science Cloud (EOSC) to support research based on open-data science.
To help European research infrastructures adapt to this future, in 2019 the domains of astrophysics, nuclear and particle physics joined efforts to create an open scientific analysis infrastructure to support the principles of data “FAIRness” (Findable, Accessible, Interoperable and Reusable) through the EU Horizon 2020 project ESCAPE (European Science Cluster of Astronomy & Particle physics ESFRI research infrastructures). The ESCAPE international consortium brings together ESFRI projects (CTA, ELT, EST, FAIR, HL-LHC, KM3NeT and SKA) and other pan-European research infrastructures (RIs) and organisations (CERN, ESO, JIVE and EGO), linking them to EOSC.
Launched in February 2019, the €16M ESCAPE project recently passed its mid-point, with fewer than 24 months remaining to complete the work programme. Several milestones have already been achieved, with much more in store.
Swimming in data
ESCAPE has implemented the first functioning pilot ‘Data Lake’ infrastructure, a new model for federated computing and storage designed to handle the exabyte-scale data volumes expected from the next generation of RIs and experiments. The Data Lake consists of several components that work together to provide a unified namespace to users who wish to upload, download or access data. Its architecture is based on existing and proven technologies: the Rucio platform for data management; the CERN-developed File Transfer Service for data movement and transfer; and connection to the heterogeneous storage systems in use across scientific data centres. These components are deployed and integrated in a service that functions seamlessly regardless of which RI the data belong to.
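The idea of a unified namespace over several storage sites can be illustrated with a minimal Python sketch. This is a toy model written for this article, not the Rucio API or the ESCAPE implementation: the class `ToyDataLake`, its methods, the `scope:name` identifier layout and the site names `SITE_A` and `SITE_B` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDataLake:
    """Toy catalogue mapping logical file names to the sites holding a copy.

    Illustrative only: this is not the Rucio API.
    """

    # logical identifier ("scope:name") -> set of site names holding a replica
    replicas: dict[str, set[str]] = field(default_factory=dict)

    def upload(self, scope: str, name: str, site: str) -> str:
        """Register a file under a logical identifier and record its site."""
        did = f"{scope}:{name}"
        self.replicas.setdefault(did, set()).add(site)
        return did

    def replicate(self, did: str, dest_site: str) -> None:
        """Record a copy at another site (stands in for a transfer request)."""
        if did not in self.replicas:
            raise KeyError(f"unknown data identifier: {did}")
        self.replicas[did].add(dest_site)

    def locate(self, did: str) -> list[str]:
        """Return the sites holding the file, sorted for stable output."""
        return sorted(self.replicas.get(did, set()))


# Hypothetical usage: one experiment uploads a file, then it is replicated.
lake = ToyDataLake()
did = lake.upload("ska", "image_001.fits", "SITE_A")
lake.replicate(did, "SITE_B")
print(did, lake.locate(did))
```

The user only ever handles the logical identifier. Which storage system holds the bytes is a detail of the catalogue, and that is the property that lets one service work across RIs.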
The Data Lake is an evolution of the current Worldwide LHC Computing Grid model for the advent of HL-LHC. For the first time, thanks to ESCAPE, it is the product of a cross-domain and cross-project collaboration, where scientists from HL-LHC, SKA, CTA, FAIR and others co-develop and co-operate from the beginning. The first data orchestration tests have been successfully accomplished, and the pilot phase demonstrated a robust architecture that serves the needs and use-cases of the participant experiments and facilities. Monitoring and dashboard services have enabled user access and selection of datasets. A new data challenge, which will also include scientific data-analysis workflows in the Data Lake, is planned for later this year.
ESCAPE is also setting up a sustainable open-access repository for deployment, exposure, preservation and sharing of scientific software and services. It will house software and services for data processing and analysis, as well as test datasets of the partner ESFRI projects, and provide user-support documentation, tutorials, presentations and training.
The collaborative, open-innovation environment and training actions provided by ESCAPE have already enabled the development of original open-source software. High-performance programming methods and deep-learning approaches have been developed, benchmarked and in some cases included in the official analysis pipelines of partner RIs. Data formats have been defined and approaches to innovative workflows harmonised. To enable interoperability and re-use, the project has gathered a common metadata description of the software packages, a community implementation based on an available standard (CodeMeta) and standard guidelines (including licensing) covering the full software-development lifecycle.
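As an illustration of what a CodeMeta-based description can look like, the Python sketch below assembles a minimal record and prints it as JSON. The property names (`name`, `version`, `license`, `codeRepository`, `author`) are CodeMeta terms, but the helper `make_codemeta` and every value passed to it are hypothetical placeholders; the exact profile adopted for the ESCAPE repository is not specified here.

```python
import json


def make_codemeta(name, version, license_url, repository, authors):
    """Build a minimal CodeMeta-style record as a plain dictionary.

    `authors` is a list of (given_name, family_name) tuples.
    """
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "license": license_url,
        "codeRepository": repository,
        "author": [
            {"@type": "Person", "givenName": given, "familyName": family}
            for given, family in authors
        ],
    }


# All values below are placeholders, not a real package.
record = make_codemeta(
    name="example-analysis-tool",
    version="0.1.0",
    license_url="https://spdx.org/licenses/MIT",
    repository="https://example.org/example-analysis-tool",
    authors=[("Jane", "Doe")],
)
print(json.dumps(record, indent=2))
```

Such a record is conventionally stored as a `codemeta.json` file at the root of the software repository, where catalogues can harvest it.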
Following the lead of the HEP Software Foundation (HSF), ESCAPE is founded on a broad community base. Establishing a cooperative framework with the HSF will enable HSF packages to be added to the ESCAPE catalogue and will help the two initiatives to align their efforts.
From the user-access point of view, ESCAPE aims to build a prototype ‘science analysis platform’ that supports data discovery and integration, provides access to the repository, enables user-customised processing and workflows, interfaces with the underlying distributed Data Lake and links to existing infrastructures such as the Virtual Observatory. It also enables researchers’ participation in large citizen-powered research projects such as Zooniverse. Every ESFRI project customises the analysis platform for its own users on top of some common lower-level services that ESCAPE is building, such as JupyterHub, a pre-defined Jupyter Notebook environment and a Kubernetes deployment application. First prototypes are under evaluation for SKA, CTA and the Vera C. Rubin Observatory.
In summary, ESCAPE aims to deploy an integrated open “virtual research environment” through its services for multi-probe data research, guaranteeing and boosting scientific results while providing a mechanism for acknowledging and rewarding researchers who commit to open science. In this respect, together with four other thematic clusters (ENVRI-FAIR, EOSC-Life, PaNOSC and SSHOC), ESCAPE is a partner in a new EU-funded project, ‘EOSC Future’, which aims to gather the efforts of more researchers in cross-domain open-data ‘Test Science Projects’ (TSPs). TSPs are collaborative projects, including two named Dark Matter and Extreme Universe, in which data, results and potential discoveries from a wealth of astrophysics, particle-physics and nuclear-physics experiments, combined with theoretical models and interpretations, will increase our understanding of the universe. This requires the engagement of all scientific communities, as already recommended by the 2020 update of the European Strategy for Particle Physics.
Open-data science projects
In particular, the Dark Matter TSP aims at further understanding the nature of dark matter by performing new analyses within the experiments involved, and collecting all the digital objects related to those analyses (data, metadata and software) on a broad open-science platform that will allow these analyses to be reproducible by the entire community wherever possible.
The Extreme Universe TSP, meanwhile, intends to develop a platform to enable multi-messenger/multi-probe astronomy (MMA). Many studies of transient astrophysical phenomena benefit from the combined use of multiple instruments at different wavelengths and with different probe types. Many of these are based on a trigger from one instrument generating follow-ups from others on timescales ranging from seconds to days. Such observations could lead to images of the strong gravitational effects that are expected near a black hole, for example. Extremely energetic astrophysical phenomena such as gamma-ray bursts, active galactic nuclei and fast radio bursts are also not yet fully understood. The intention within ESCAPE is to build a sustainable platform for MMA science.
The idea in both of these TSPs is to exploit, for validation purposes, all the prototype services developed by ESCAPE and to encourage the uptake of its virtual research environment. At the same time the TSPs aim to promote the innovative impact of data analysis in open science, validate the reward scheme acknowledging scientists’ participation, and demonstrate the increased scientific value implied by sharing data. This approach was discussed at the JENAS 2019 workshop and will be linked to two corresponding joint ECFA-NuPECC-APPEC actions (iDMEu and gravitational-wave probes of fundamental physics).
Half-way through, ESCAPE is clearly proving itself as a powerful catalyst to make the world’s leading research infrastructures in particle physics and astronomy as open as possible. The next two years will see the consolidation of the cluster programme and the inclusion of further world-class RIs in astrophysics, nuclear and particle physics. Through the TSPs and further science projects, the ESCAPE community will continue to engage in building within EOSC the open-science virtual research environment of choice for European researchers. In the even longer term, ESCAPE and the other science clusters are exploring how to evolve into sustained “Platform Infrastructures” federating large domain-based RIs. The platforms would operate to study, define and set up a series of new focuses around which they engage with the European Commission and national research institutes to take part in the European data strategy at large.