The LHC Computing Grid gets started…

This summer the IT division at CERN was a hive of activity as dozens of young software engineers worked round the clock to launch the LHC Computing Grid (LCG) into its first phase of operations. Meanwhile, similar hectic preparations were going on at other major computing centres around the world.

The LCG project, which was launched last year, has a mission to integrate thousands of computers worldwide into a global computing resource. This technological tour de force will rely on novel Grid software, called middleware, and will also benefit from new hardware developments in the IT industry.

The challenge facing the LCG project can be summarized in terms of two large numbers. The LHC will produce more than 10 petabytes of data a year and require around 100,000 of today's PCs to analyse that data.
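To give a feel for these two numbers, the short calculation below turns them into an average data rate and a yearly share of data per PC. It is only a back-of-envelope sketch: it assumes decimal petabytes (10^15 bytes), a 365-day year and data spread evenly over time and over machines, none of which is stated in the article.

```python
# Back-of-envelope figures derived from the two numbers quoted above.
# Assumptions (not from the article): decimal units, a 365-day year,
# and data spread evenly over time and over PCs.
PETABYTE = 10**15                    # bytes
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 s

data_per_year = 10 * PETABYTE        # "more than 10 petabytes of data a year"
pcs = 100_000                        # "around 100,000 of today's PCs"

# Average sustained rate if the data arrived uniformly over the year.
average_rate_mb_s = data_per_year / SECONDS_PER_YEAR / 10**6

# Yearly share of the data handled by each PC.
data_per_pc_gb = data_per_year / pcs / 10**9

print(f"average rate: {average_rate_mb_s:.0f} MB/s")
print(f"data per PC per year: {data_per_pc_gb:.0f} GB")
```

Since the article quotes "more than" 10 petabytes, both results are lower bounds under these assumptions.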

The LCG project has been rapidly gearing up for this challenge, with more than 50 computer scientists and engineers from partner centres around the world joining the effort over the past year. The first version of the LCG, called LCG-1, is now up and running on a restricted number of sites and with limited functionality.

October 2003 p9 (abridged).


…while the EGEE gets ready

The success of the European Union (EU)-funded European DataGrid (EDG) project (see "The Grid gets EU funds") – a three-year effort led by CERN, which is due to finish in spring 2004 – has generated strong support for a follow-up project. The objective is to build a permanent European Grid infrastructure that can serve a broad spectrum of applications reliably and continuously. CERN has therefore established a pan-European consortium called Enabling Grids for E-science in Europe (EGEE) to build and operate such a production Grid infrastructure, providing round-the-clock Grid service to scientists throughout Europe.

A proposal for the project was submitted to the EU 6th Framework Programme in May 2003. This proposal, again led by CERN, involves some 70 partners, encompassing all major computer centres in Europe, as well as leading American and Russian centres.

The LHC Computing Grid will provide the springboard for EGEE and in turn benefit from Grid software engineering that is part of the EGEE project. However, the mission of EGEE is also to extend the potential benefits of a Grid infrastructure beyond high-energy physics.

October 2003 p9 (abridged).


CERN's computer centre prepares for LHC

A major upgrade of CERN's computer centre has been under way for the past year to increase capacity for the facility's role as the heart of the LHC Computing Grid (LCG). Since services must be kept running round the clock during the upgrade, a rolling approach is needed. In a major migration last year many systems, including five StorageTek tape silos, were moved to the newly created machine room in the basement. This allowed the electrical distribution in half of the main machine room to be upgraded during the autumn.

Since this upgrade, the centre can cope with a demand of up to 1 MW in this area of the machine room, which is equivalent to about 5000 PCs. With all this power being turned into heat, adequate air-conditioning is a major concern, and the first stages of a new underfloor cold-air distribution system have been installed to cope with increased demand. During the spring of 2004 equipment began to be moved over from the other half of the machine room, starting with the servers for CERN's administrative applications. These were moved to a dedicated area equipped with dual power supplies to ensure that these crucial services can be maintained even during an extended power cut – although full protection will only become available once a new substation is commissioned for the centre in early 2005.

To manage all of the equipment moves, close control over the configuration of the different systems and high levels of automation are essential. These are taken care of by ELFms, CERN's Extremely Large Farm management system. Two ELFms components, quattor (a system administration toolkit for automated installation, configuration and management of clusters and farms running UNIX derivatives) and the Hardware Management System, have a particularly important role. The quattor Configuration Database, developed as part of the European DataGrid project, now holds information about more than 95% of the systems in the computer centre – information ranging from the precise details of the software installed to the location of the system in the computer centre.

Using the ELFms Hardware Management System (developed with support from the UK's GridPP as part of the LCG project) and information from quattor, the computer centre operations manager can produce a list of systems to be moved and know that the right people will be contacted in the correct order to shut down and move systems, reinstall the operating system if required, and then restart them on schedule. With almost 1500 machines moved in four months, the newly deployed software has been given a thorough workout and has performed according to expectations.
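The ordering described above (shut down, move, reinstall if required, restart) can be illustrated with a small sketch. This is not the ELFms or quattor interface: the `Machine` record, its fields, the `move_plan` function and the example machine names are all hypothetical, chosen only to show how a list of systems might be turned into an ordered list of steps with a contact for each.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # All fields are hypothetical; the real configuration database
    # holds far more detail than this.
    name: str
    owner: str          # who to contact for shutdown and restart
    new_location: str   # where the machine is going
    reinstall_os: bool  # whether the OS must be reinstalled after the move


def move_plan(machines):
    """Return ordered (step, machine, contact) tuples for a batch of moves."""
    plan = []
    for m in machines:
        plan.append(("shut down", m.name, m.owner))
        plan.append((f"move to {m.new_location}", m.name, "operations"))
        if m.reinstall_os:
            plan.append(("reinstall OS", m.name, "operations"))
        plan.append(("restart", m.name, m.owner))
    return plan


# Made-up example data.
machines = [
    Machine("node001", "admin-apps team", "area A", reinstall_os=False),
    Machine("node002", "batch team", "area B", reinstall_os=True),
]

for step in move_plan(machines):
    print(step)
```

A real system would also have to handle scheduling and dependencies between services, which this sketch leaves out.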

September 2004 p5.