Plans for the next generation of network-based information-handling systems took a major step forward when the European Union’s Fifth Framework Information Society Technologies programme concluded negotiations to fund the Data Grid research and development project. The project was submitted to the EU by a consortium of 21 bodies involved in a variety of sciences, from high-energy physics to Earth observation and biology, as well as computer sciences and industry. CERN is the leading and coordinating partner in the project.
Starting from this year, the Data Grid project will receive in excess of €9.8 million for three years to develop middleware (software) to deploy applications on widely distributed computing systems. In addition to receiving EU support, the enterprise is being substantially underwritten by funding agencies from a number of CERN’s member states. Due to the large volume of data that it will produce, CERN’s LHC collider will be an important component of the Data Grid (see “The grid is set to grapple with large computations”).
As far as CERN is concerned, this programme of work will integrate well into the computing testbed activity already planned for the LHC. Indeed, the model for the distributed computing architecture that Data Grid will implement is largely based on the results of the MONARC (Models of Networked Analysis at Regional Centres for LHC experiments) project. CERN’s part in the Data Grid project will be integrated into its ongoing programme of work and will be jointly staffed by EU- and CERN-funded personnel.
The work that the project will involve has been divided into numbered subsections, or “work packages”. CERN’s main contribution will be to three of these work packages: WP 2, dedicated to data management and data replication; WP 4, which will look at computing fabric management; and WP 8, which will deal with high-energy physics applications. Most of the resources for WP 8 will come from the four major LHC experimental collaborations: ATLAS, CMS, ALICE and LHCb.
Other work will cover areas such as workload management (coordinated by the INFN in Italy), monitoring and mass storage (coordinated in the UK by the PPARC funding authority and the UK Rutherford Appleton Laboratory) and testbed and networking (coordinated in France by IN2P3 and the CNRS). CERN is also contributing to the work on testbeds and networking, and it is responsible for the overall management and administration of the project with resources partially funded by the EU.
The data management work package will develop and demonstrate the necessary middleware to ensure secure access to petabyte databases, enabling the efficient movement of data between Grid sites with caching and replication of data. Strategies will be developed for optimizing and costing queries on the data, including the effect of dynamic usage patterns. A generic interface to various mass storage management systems in use at different Grid sites will also be provided.
The objective of the fabric management work package is to develop new automated system management techniques. This will enable the deployment of very large computing fabrics constructed from tens of thousands of mass-market components, with reduced systems administration and operations costs. All aspects of management will be covered, from system installation and configuration through monitoring, alarms and troubleshooting.
WP 8 aims to deploy and run distributed simulation, reconstruction and analysis programs using Grid technology. This package is central to the project for two reasons. It is among those that enable large-scale testing of the middleware being developed by the other work package groups, and it provides the user requirements that drive the definition of the project's architecture.
Dozens of physicists, mostly from Europe, will participate in the endeavour while continuing to perform their day-to-day research activities.
A project architecture task force has recently been appointed, with participants from the relevant middleware work packages and a representative from the applications side. Leading US computer scientists are also participating in this effort to ensure that developments in the US continue in parallel with work being carried out in Europe. Data Grid is hosting the first Global Grid Forum in Amsterdam in March, which will aim to coordinate Grid activity on a worldwide scale.
Visit http://www.cern.ch/grid for further information on Data Grid.