During the polarized proton-proton run that ended in June at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven, Grid tools were used by the PHENIX experiment to send recently acquired data to a regional computing centre for the experiment in Japan. Brookhaven National Laboratory, on Long Island, New York, is home to the RHIC/ATLAS Computing Facility (RCF/ACF), which is the main computing centre for experiments at RHIC and a Tier-1 computing centre for ATLAS. The PHENIX regional computing centre in Japan (CCJ) is at the RIKEN research centre on its Wako campus close to Tokyo.
Going into the polarized proton-proton run, PHENIX faced the challenge that the RCF would be busy reconstructing and analysing gold-gold and copper-copper data recorded in 2004 and 2005. The enormous polarized proton-proton data set was therefore transferred to Japan to make use of the substantial computing resources at CCJ, which are comparable to the PHENIX portion of the RCF.
The PHENIX data acquisition can sustain a peak data rate of up to 600 MB/s, and runs at a typical rate of 250 MB/s while beam is stored in RHIC. The data were buffered at the experimental site before being transferred and archived in the RCF tape library. A 35 TB disk-storage system (about 60 h at typical data rates) allowed PHENIX to archive and transfer data at a lower steady rate, taking advantage of various breaks in the flood of data. A transfer rate of 60 MB/s sustained steadily around the clock was able to keep up with the incoming data stream.
Initially, PHENIX had planned to transfer the polarized proton-proton data by physically transporting tape cartridges to CCJ. During the early part of the run, however, it was found that network transfer rates of 700-750 Mbit/s could be achieved. A dedicated network path was established from the PHENIX counting house to the BNL perimeter network, and the tape option became a fall-back solution. In the end, not a single tape was shipped.
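To see why the network option could displace tape, the quoted network rates can be compared with the 60 MB/s steady rate needed to keep up with the data stream. The following is only a rough sketch: it assumes decimal units, 8 bits per byte and no protocol overhead, and the function and variable names are illustrative rather than anything used by PHENIX.

```python
# Rough unit conversion; assumes 8 bits per byte and ignores protocol overhead.

def mbit_to_mbyte(rate_mbit_per_s: float) -> float:
    """Convert a rate in Mbit/s to MB/s."""
    return rate_mbit_per_s / 8.0

# Figures quoted in the article.
REQUIRED_MB_PER_S = 60.0             # steady rate needed to keep up
NETWORK_MBIT_PER_S = (700.0, 750.0)  # achieved network transfer rates

low_mb, high_mb = (mbit_to_mbyte(r) for r in NETWORK_MBIT_PER_S)

# Ratio of the lowest achieved rate to the required rate.
headroom = low_mb / REQUIRED_MB_PER_S

print(f"Network rate: {low_mb:.2f}-{high_mb:.2f} MB/s")
print(f"Headroom over required rate: {headroom:.2f}x")
```

Under these assumptions the achieved rates correspond to 87.5-93.75 MB/s, comfortably above the 60 MB/s needed.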
The principal tool used for the transfer was GridFTP, which proved to be very stable. Brookhaven has a high-speed connection (OC48) to ESnet, which is connected to a transpacific line (10 Gbit/s) served by SINET in Japan. Apart from two half-day outages of ESnet, the transfers continued around the clock for the entire 11-week run.
Approximately 270 TB of data (representing 6.8 billion polarized proton-proton collisions) were transferred to CCJ. After a few days of fine-tuning the transfer parameters, the transfers became part of the regular data-handling operation of the PHENIX shift crews, requiring experts to intervene only occasionally.
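As a back-of-the-envelope check on these totals, the average throughput and the average data volume per collision can be estimated from the figures quoted in the article. This sketch assumes decimal units (1 TB = 10^12 bytes) and that the transfers were spread evenly over the full 11 weeks of the run, which is a simplification; the variable names are illustrative.

```python
# Back-of-the-envelope estimate from the article's quoted totals.
# Assumptions: decimal units, transfers spread evenly over all 11 weeks.

TOTAL_BYTES = 270e12               # approximately 270 TB transferred
COLLISIONS = 6.8e9                 # 6.8 billion collisions
RUN_SECONDS = 11 * 7 * 24 * 3600   # 11 weeks in seconds

# Average transfer rate over the whole run, in MB/s.
avg_mb_per_s = TOTAL_BYTES / RUN_SECONDS / 1e6

# Average data volume per recorded collision, in kB.
kb_per_collision = TOTAL_BYTES / COLLISIONS / 1e3

print(f"Average rate: {avg_mb_per_s:.1f} MB/s")
print(f"Average size per collision: {kb_per_collision:.1f} kB")
```

Under these assumptions the average works out to roughly 40.6 MB/s and about 39.7 kB per collision, the same order as the 60 MB/s steady rate quoted for keeping up with the incoming data.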
This seems to be the first time that a data transfer of such magnitude was sustained over many weeks in actual production, and was handled as part of routine operation by non-experts. The successful completion of this large-scale transfer project demonstrates both the maturity of today's Grid tools and the real feasibility of integrating remote resources into the data-handling and processing chain of large experiments.
Compiled by Hannelore Hämmerle and Nicole Crémel