Ultra-high-performance distributed computing software is vital for a successful LHC physics programme. This presents both a challenge and an opportunity, says Robert Eisenstein.
“May you live in interesting times,” says the old Chinese proverb, and we surely do. We are at a time in history when many fundamental notions about science are changing rapidly and profoundly. Natural curiosity is blurring the old boundaries between fields: astronomy and physics are now one indivisible whole; the biochemical roots of biology drive the entire field; and for all sciences the computational aspects, for both data collection and simulation, are now indispensable.
Cheap, readily available, powerful computational capacity and other new technologies allow us to make incredibly fine-grained measurements, revealing details never observable before. We can simulate our detectors and basic physical processes at a level of precision that was unimaginable just a few years ago. This has led to an enormous increase in the demand for processor speed, data storage and fast networks, and it is now impossible to find at one location all the computational resources necessary to keep up with the data output and processing demands of a major experiment. At LEP, or at Fermilab, each experiment could still take care of its own computing needs, but that modality is not viable at full LHC design luminosities. This is true not only for high-energy physics, but for many other branches of experimental and theoretical science.
Thus the idea of distributed computing was born. It is not a new concept, and there are quite a few examples already in existence. However, applied to the LHC, it means that the success of any single large experiment now depends on the implementation of a highly sophisticated international computational “Grid”, capable of assembling and utilizing the necessary processing tools in a way that is intended to be transparent to the user.
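To make that notion of transparency concrete, here is a minimal sketch in Python, invented purely for illustration: the ToyGrid scheduler, site names and job names are hypothetical and do not correspond to any real Grid middleware. The user describes a job and its resource needs; the system decides which site actually runs it.

    # Toy illustration only -- not any real Grid middleware.
    # A hypothetical scheduler places each job on whichever site has spare
    # capacity, so the physicist submitting the job never chooses the site.
    from dataclasses import dataclass

    @dataclass
    class Site:
        name: str
        free_cpus: int

    @dataclass
    class Job:
        name: str
        cpus: int = 1

    class ToyGrid:
        """Hypothetical scheduler: the user never sees where the job runs."""
        def __init__(self, sites):
            self.sites = list(sites)

        def submit(self, job):
            # Place the job on the least-loaded site that can still host it.
            candidates = [s for s in self.sites if s.free_cpus >= job.cpus]
            if not candidates:
                raise RuntimeError(f"no capacity anywhere for {job.name}")
            site = max(candidates, key=lambda s: s.free_cpus)
            site.free_cpus -= job.cpus
            return f"{job.name} ran at {site.name}"

    grid = ToyGrid([Site("Site A", 2), Site("Site B", 6), Site("Site C", 1)])
    print(grid.submit(Job("reconstruction-pass", cpus=2)))  # placement chosen by the grid
    print(grid.submit(Job("simulation-batch", cpus=3)))     # the user only names the job

A real Grid must of course add authentication, data placement, accounting and fault tolerance on top of this bare placement decision.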
Many issues then naturally arise. How will these various “Grids” share the hardware fabric that they must necessarily cohabit? How can efficiencies be achieved that optimize its use? How can we avoid needlessly re-creating software? How will the Grid provide security against wilful or accidental harm? How much will it cost to implement an initial Grid? What is a realistic timescale? How will all this be managed, and who is in charge?
It is clear that we have before us a task that requires significant advances in computer science, as well as a level of international co-operation that may be unprecedented in science. Substantial progress is needed over the next 5-7 years, or else there is a strong possibility that the use of full LHC luminosity will not be realized on the timescale foreseen. The event rates would simply be too high to be processed computationally.
Most of these things are known, at least in principle. In fact, there are national Grid efforts throughout Europe, North America and Asia, and there are small but significant “test grids” in high-energy physics already operating. The Global Grid Forum is an important medium for sharing what is known about this new computing modality. At CERN, the LHC Computing Grid Project working groups are hard at work with colleagues throughout the high-energy physics community, a principal task being to facilitate close collaboration between the LHC experiments to define common goals and solutions. The importance of doing this cannot be overstated.
As is often the case with high technology, it is hard to plan in detail because progress is so rapid. And creativity – long both a necessity and a source of pride in high-energy physics – must be preserved. Budgetary aspects and international complexities are also not simple. But these software systems must soon be operational at a level consistent with what the detectors will provide, in exactly the same way as other detector components. I believe it is time to depart from past practice and to begin treating software as a “deliverable” in the same way we do those other components. That means bringing to bear the concepts of modern project management: clear project definition and assignments; clear lines of responsibility; careful evaluation of the resources needed; resource-loaded schedules with milestones; regular assessment and review; and detailed memoranda to establish who is doing what. Will things change en route? Absolutely. But as Eisenhower once put it: “Plans are useless, but planning is indispensable.”
Several people in the software community are concerned that such efforts might be counter-productive. But good project management incorporates all of the essential intangible factors that make for successful outcomes: respect for the individuals and groups involved; proper sharing of both the resources available and the credit due; a degree of flexibility and tolerance for change; and encouragement of creative solutions.
As has happened often before, high-energy physics is at the “bleeding edge” of an important technological advance – indeed, software is but one among many. One crucial difference today is the high public visibility of the LHC project and the worldwide attention being paid to Grid developments. There may well be no other scientific community capable of pulling this off, but in fact we have no choice. It is a difficult challenge, but also a golden opportunity. We must make the most of it!