COMPASS course to future computing

A major new spectrometer that is being installed at CERN will be a flagship fixed-target experiment for the millennium. Its voracious appetite for data requires new computing solutions, opening the door for subsequent 21st-century studies.

compass1_9-99 — Moving a 420 ton magnet for the COMPASS experiment, a CERN flagship fixed-target study for the next decade.

Hall 888, of the north area of CERN’s SPS synchrotron, is muon beam country. For almost two decades this hall hosted the European Muon Collaboration (EMC) spectrometer (companioned downstream by the NA4 apparatus), subsequently adapted to the needs of the NMC experiment and then the SMC experiment. Using CERN’s high-energy muon beam and a variety of targets, these experiments provided a wealth of insights into the quark/gluon content of nuclear particles.

Their successor will be Common Muon and Proton Apparatus for Structure and Spectroscopy (COMPASS), proposed in March 1996 by a large community of physicists with a keen interest in nucleon structure and hadron spectroscopy. The experiment aims to address remaining questions using all of the artillery available today.

One central issue is to look at the contribution of gluons to the nucleon spin. EMC and SMC made decisive advances towards the understanding of the nucleon spin in terms of its constituents, but the role of the gluon needs to be clarified.

The other major physics objective is to look for particles such as glueballs, composed of gluons rather than quarks, quarkgluon hybrids and quarkantiquark combinations. Such exotica have long been searched for and a few candidates have been identified, but nothing like the rich spectrum expected from theory.

The experiment was approved in February 1997 and the construction of new detectors is proceeding fast. Key features of the new spectrometer (actually a two-stage spectrometer, to allow for large geometrical and dynamical acceptance) are:

full particle identification (charged particles using high granularity RICH ring-imaging Cherenkov counters);
calorimetry for energy measurement;
high rate (beam intensities of 108 particles per pulse).

Coping with such a high beam rate is the main feature of the new spectrometer. On the detector side, many novelties will be implemented:

for the first time a large quantity of an unusual material (Li6D) will be used for the polarized target;
large-area trackers using “straw tubes” at large angles;
Micromegas developed at Saclay will cover the central part of the first spectrometer;
a small-area tracker of the “double GEM” type (CERN Courier December 1998 p19) will cover the central part of the second spectrometer;
the Cherenkov photons in the RICH will be detected with a large array (6 m2) of wire chambers with caesium iodide photocathodes, a new technique developed at CERN in the RD26 project.

Swallowing data

Such a voracious appetite for data influences the detectors, the data acquisition system and data storage and analysis.

COMPASS will be able to trigger 105 times per second and store 104 events per second, each typically 30 Kbytes, for a total data size of 300 Tbytes peryear at a mean acquisition rate of 35 Mbytes per second. Data will be sent via an optical link directly to CERN’s main computer centre using the Central Data Recording (CDR) facility pioneered for the NA48 CP violation experiment.

The estimated power needed to process COMPASS data is five times that of the already impressive supercomputer used by NA48.

Still, the quantity of data that COMPASS will handle is such that a host of new problems had to be faced and solved quickly for data-taking next year. Handling the stream of data propelled by the CDR system is a major challenge.

The estimated power needed to process COMPASS data is five times that of the already impressive supercomputer used by NA48. The analysis plan foresees processing the data at CERN, while almost all final physics analysis as well as most of the simulation will be done in the collaborating institutes. The performance of COMPASS computing and analysis will be a useful guide for the high-rate experiments at CERN’s LHC collider which are scheduled to begin operation in 2005.

New software tools

Fortran has had its day, and a move from top-down structured programming to object-oriented programming is in sight. In object-oriented analysis and design, software systems are modelled as collections of co-operating objects, treating individual objects as instances of a class within a hierarchy of classes.

compass2_9-99 — Prototype of the COMPASS computing farm. The PCs are mostly dual Pentium II 300 MHz running Windows NT 4.0 SP3. The data servers are DEC1200 and SUN 450 machines.

Compared with the well known “top-down structured” programming, object-oriented programming looks more abstract and moves the complexity of development to the first step the definition of classes and relations. On the other hand, object-oriented programming, with its encapsulation, polymorphism and inheritance features, helps the maintainability of elaborate software, particularly when many authors are involved.

Codes

For an experiment that will be active over many years, and with off-line computing having to keep pace with incoming data, a new off-line code was called for, with object-oriented compatibility.

Of the many programming languages that support object-oriented programming, only two are widespread: C++ and Java.

The main reason behind the C++ success is its backward compatibility with the C language which is used extensively for on-line and system applications. This also means that C libraries can easily be used within a C++ program. C++ limitations mainly come from the necessity to be compatible with C, also allowing structured top-down programming.

Java is a pure object-oriented programming language, but it is still under development with no compilers available. Right now, C++ looks like the best choice for an object-oriented programming language.

Data and databases

Handling extremely large data volumes and rates is the key feature of COMPASS. C++ provides only low-level access to disk files. In particular, there is no means of managing tape input/output or, in general, tertiary storage devices at the language level.

Database programs can provide extended disk input/output power, giving a consistent framework with many new functions. As well as plain data, the C++ language is capable of handling objects and object collections.

To take advantage of C++ means storing and retrieving structures (which can be very complicated and can evolve with time) in a transparent way and without adding too much complexity.

The most natural choice is object-oriented databases that use C++ language and can handle memory and disk transparently. CERN plays a leading role in the development and use of such systems in high-energy physics. The RD45 collaboration was set up five years ago and COMPASS (as well as the NA45 heavy-ion study at CERN and BaBar at SLAC, Stanford) are taking advantage of this work.

Given 10¹⁰ events per year and all of the associated complexity calls for:

minimal duplication of information;
seamless access to the data from different sources (events, calibration and alignment);
direct access to specific parts of the event information for some selected sample;
transparent data access from local and remote sites.

Currently the most promising candidate for these tasks is Objectivity/DB, and in 1997 COMPASS decided to use this commercial product to store all data for off-line analysis, keeping them under “federated” databases (consisting of separated files on different computers).

The internal structure of the database should allow easy access to the physical quantities needed in the analysis without external bookkeeping. One major advantage is the possibility of “tagging” events by physics properties. Users should thus be able to select subsets of data and access the full information.

The transparent navigation among the events and other analysis objects is very attractive, and Objectivity/DB promises to be able to handle a very large “federation” of database files.