Faced with record amounts of data, particle physicists increasingly rely on international collaboration to secure adequate computing power. The D0 collaboration at the Fermi National Accelerator Laboratory in the US is a case in point. Since the beginning of the Tevatron Collider Run II in March 2001, D0 scientists have recorded more than 1000 million particle collisions. The data fill 10 stacks of CDs as high as the Eiffel Tower - storage cases not included - and the stacks are growing daily.
"The Fermilab farms can process four million events per day," said Mike Diesburg, who manages the Fermilab cluster of 600 PCs for D0. This level of computing power can "handle the daily flow of incoming events".
Yet when the D0 collaboration decided to re-examine the entire set of collision data, encompassing more than 500 terabytes, scientists had to look for computing power beyond Fermilab. For the first time ever, D0 scientists had to send actual collision data - the crown jewels of their experiment - offsite.
"In the past, D0 and other particle-physics collaborations have used remote computing sites to carry out Monte Carlo simulations of their experiments," said D0 scientist Daniel Wicke from the University of Wuppertal, Germany, at Fermilab on sabbatical. "We are now one of the first experiments to process real collision data at remote sites. The effort has opened up many new computing resources for our collaboration. The evaluation of our experience will provide valuable input to the worldwide development of computer Grids."
The reprocessing of the D0 collision data, coordinated by Diesburg and Wicke, so far involves computing resources in Canada, France, Germany, the Netherlands, the UK and the US. (Many other countries contribute to the computing of simulated D0 data and the analysis of processed data.) From November 2003 to January 2004, D0 groups in each of the six countries used local PC clusters and Grid networks, ranging from 100 to more than 1000 PCs, to reprocess data. The largest amount of offsite computing (36 million collisions) took place at the Centre de Calcul in Lyon, France.
To provide participating centres with data, the D0 collaboration relied on SAM (Sequential Access via Metadata), a data-handling system developed at Fermilab.
Before the end of the year the D0 collaboration will again begin to reprocess Run II data, old and new, to apply further refined analysis tools. The new round will need even more offsite computing power, providing ample opportunity to develop the Grid further.