Last autumn's unplanned shutdown of the Large Hadron Collider (LHC) was a disappointment for physicists around the world. But for organizers of the computing Grid supporting the collider's detectors, it was an opportunity to keep working hard. For the first two weeks of June, instead of flooding the Grid with data from actual particle collisions, experiment collaborators at CERN and remote computing sites in Europe, Asia, and North America joined up to test the ability of the collider's Worldwide LHC Computing Grid (WLCG) to record, transfer and analyse simulated data in a step-by-step "production demonstration".

Scientists conducted a series of challenges, collectively called the Scale Test of the Experimental Program 2009 (STEP09). All four LHC experiments participated in the test. At the CMS experiment, for example, scientists first tested the archiving of older recorded data from CERN to CMS' seven Tier 1 computing sites. There, they checked the Tier 1 central processing power as they shuttled data to Tier 2 sites. Finally, they challenged the full physics analysis capacity of the Tier 2 sites. On 15 June, as the curtains closed on STEP09, Oliver Gutsche, a Fermilab physicist who took part in the effort for the CMS experiment, declared the overall performance "very good".

While the CMS portion of this Grid – like the rest of the WLCG – was ready to take data last September, says Gutsche, the test "gave us an opportunity to test parts that could not be tested on the previous schedule". It also showed how the system will function under simultaneous demands from the LHC's three other detectors.

A primary STEP09 goal was testing the tape systems at CERN and Tier 1 computing centres. When the LHC is operating, computers at CERN will need to record – "write to tape" – at least 15 petabytes of data per year. Thanks to this run-through, Gutsche said, "We are confident that CERN could write to tape at the speeds needed" when data from collisions begin pouring in.
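To give a feel for what 15 petabytes a year means as a sustained rate, the figure can be converted into an average throughput. The short sketch below is illustrative only: the function name is hypothetical, and it assumes decimal petabytes, a 365-day year and writing spread evenly across the whole year, none of which is stated in the article. Because the collider does not take data all year round, the rates needed during running would be higher than this average.

```python
# Assumptions (not from the article): decimal petabytes (10**15 bytes),
# a 365-day year, and writing spread evenly over the whole year.
BYTES_PER_PETABYTE = 10**15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_write_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Return the year-averaged write rate in megabytes per second."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / 10**6


if __name__ == "__main__":
    rate = average_write_rate_mb_per_s(15)
    print(f"15 PB/year averages about {rate:.0f} MB/s")
```

Under these assumptions the year-averaged rate comes out at a little under 480 megabytes per second.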

Another key goal was gauging the analysis capabilities of Tier 2 computing centres. CMS aimed to employ 50% of the Grid's analytical power and, while only an ongoing study can prove that it succeeded, Gutsche says that the prognosis looks good. During STEP09's 13-day run, Tier 2 centres performed more than 900,000 analysis jobs. However, the test also revealed room for improvement.
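The job count can be put on a per-day and per-hour footing with simple division. The sketch below is a rough illustration, not a STEP09 result: the function name is hypothetical, and it assumes that the jobs were spread evenly over the 13 days, which the article does not say. Since the article reports "more than" 900,000 jobs, the numbers it produces are lower bounds on the averages.

```python
def average_job_rate(total_jobs: int, days: float) -> tuple[float, float]:
    """Return (jobs per day, jobs per hour), assuming an even spread of jobs."""
    jobs_per_day = total_jobs / days
    jobs_per_hour = jobs_per_day / 24
    return jobs_per_day, jobs_per_hour


if __name__ == "__main__":
    # 900,000 jobs over 13 days, the figures quoted in the article.
    per_day, per_hour = average_job_rate(900_000, 13)
    print(f"At least {per_day:,.0f} jobs per day, or {per_hour:,.0f} per hour")
```

On that even-spread assumption, the Tier 2 centres averaged more than 69,000 analysis jobs a day.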

Making a good thing better

Operators at CERN and the remote computing sites were forced to work long hours, particularly in the pre-staging process. But their efforts revealed principles that will ease the future automation of these procedures. "Sites are happy because we stressed them and they learned how to run more efficiently," said Gutsche. "Now they have ideas for what they can improve."

Echoing this observation was Ian Fisk, a CMS collaborator at Fermilab. "We wanted to show that we could run on 'non-hero-mode'," he said. "We want to finish a test saying, 'That was easy. We could run for a year at that level.'"

Jamie Shiers of CERN, who organized the computing tests, including STEP09, said: "Many of the Tier 1s, and the Tier 0, sustained a load that was artificially high – certainly higher than early data taking – with generally smooth and sustainable operations. But a few sites did not and this has triggered us to undertake a perhaps overdue analysis of the root causes with a clear desire to fix and retest. We saw significant progress since a year ago."

Shiers added: "For Tier 2s, the results were more variable: Monte Carlo production is clearly a largely solved problem. As for analysis, some sites – even very large ones – did extremely well, while others did not. Once again, we need to understand the root causes and fix them. In some cases, this may be hard: there has been a feeling for quite some time that the external network bandwidth for at least some sites is not large enough and that the internal bandwidth all the way to the data is also too small. Most likely they will need major configuration changes."

Since the start of July, CMS scientists have been using the Grid to analyse cosmic-ray data, which stream into the detector even when the accelerator is off. When the LHC turns on – in November, says CERN's director-general – the real challenge will begin.

Useful links

WLCG: http://cern.ch/LCG/
CMS FNAL Remote Operations Centre: www.uscms.org/roc

This article was published online in iSGTW on 8 July.