Topics

RHIC Mock Data Challenge successfully completed at Brookhaven

29 January 1999

With Brookhaven’s Relativistic Heavy Ion Collider (RHIC) scheduled to be commissioned this year, preparations for its experimental programme are gathering momentum. The RHIC Mock Data Challenge 1 (MDC-1) began on 8 September and finished successfully on 19 October.

With installed capacities amounting to approximately 25% of those that will be available at the start of the first RHIC physics run, this six-week exercise involved the RHIC Computing Facility, the US Department of Energy’s (DOE’s) High Energy and Nuclear Physics Computational Grand Challenge Initiative, and the four RHIC experimental collaborations: BRAHMS, PHENIX, PHOBOS and STAR.

The main goals of the exercise were to demonstrate the performance of event data recording, event reconstruction and data mining (selecting rich subsets from large volumes of data), each for multiple experiments running simultaneously.

During the exercise, aggregate event data recording rates into the High Performance Storage System (HPSS) for the four experiments were measured at up to 18 Mbyte/s sustained over an 8-hour period. (HPSS is hierarchical storage management software developed under a Cooperative Research and Development Agreement involving several DOE laboratories, and now commercialized by IBM.)
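A back-of-the-envelope check of what that sustained rate implies for total data volume (the 18 Mbyte/s and 8-hour figures are taken from the article; the conversion is a simple illustrative sketch):

```python
# Illustrative arithmetic only: total volume written at the
# measured aggregate rate, sustained over the 8-hour period.
rate_mbyte_per_sec = 18          # aggregate HPSS recording rate
duration_sec = 8 * 3600          # 8-hour measurement period
total_gbyte = rate_mbyte_per_sec * duration_sec / 1024
print(f"{total_gbyte:.0f} Gbyte written in 8 hours")  # roughly 500 Gbyte
```

In other words, the four experiments together moved on the order of half a terabyte into HPSS during a single 8-hour run.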

Event reconstruction by the four experiments was carried out on a computing farm of up to 104 Pentium II processors, representing some 1400 SPECint95 of CPU capacity, with CPU utilization efficiencies over a 16-hour period averaging 80% across the experiments.
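The per-processor and effectively delivered capacity follow directly from these figures (a minimal sketch using only the numbers quoted above):

```python
# Illustrative arithmetic only, from the figures in the article.
processors = 104            # Pentium II farm nodes
total_specint95 = 1400      # aggregate benchmarked CPU capacity
utilization = 0.80          # average efficiency over the 16-hour period

per_cpu = total_specint95 / processors      # capacity per Pentium II
effective = total_specint95 * utilization   # capacity actually delivered
print(f"{per_cpu:.1f} SPECint95 per CPU; {effective:.0f} SPECint95 effective")
```

That is roughly 13.5 SPECint95 per node, with about 1120 SPECint95 of the installed 1400 actually delivered to reconstruction jobs.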

During simultaneous event data mining by the four experiments, a variety of data access measurements were made. These included evaluation of the performance of a Sun server compared with network-connected Pentium farm machines, the use of Grand Challenge Project software to coordinate queries, and the use of an Oak Ridge-developed system to batch files for access from HPSS tapes.

The Grand Challenge Project and STAR were also able to build and exercise an Objectivity event data store at the data-summary-tape level. Secondary objectives were also achieved, including running multiple simultaneous functions for a subset of the experiments and running for extended periods for individual experiments (seven days for PHENIX and STAR).

From the perspective of the RHIC Computing Facility, the exercise was valuable both for verifying and detailing the expected behaviour and limitations of the current facility and for revealing some unexpected problems.

As anticipated, the Managed Data Server (MDS), and in particular the HPSS, proved to be the most complex and critical component. The HPSS showed itself capable of high performance and adequate to the goals of the exercise. However, it was also clear that the time between its initial installation at the RHIC Computing Facility and its large-scale use in the exercise was not sufficient to achieve the desired levels of reliability. The limited storage resources available for the exercise, in particular tape drives, also contributed to the stress on HPSS.

Except for an initial delivery delay, the performance of the Intel-based Linux processor farms during the exercise was gratifyingly close to what was anticipated. An unexpected issue was the performance of the RHIC wide-area network. While the need to tune network parameters at the RHIC Computing Facility and on collaborating remote machines was anticipated, end-to-end problems involving the national ESnet and/or commercial links proved more serious and less tractable than expected.

From the perspective of the RHIC Computing Facility, the ability of all six parties to participate effectively in a unified exercise was the most important outcome. If this synergy continues and convergent iteration can be achieved, effective computing for the RHIC experimental programme will be assured.
