On the same day that the LHC’s first three-year physics run ended, CERN announced that its data centre had recorded more than 100 petabytes (PB) – 100 million gigabytes – of physics data.
Storing this 100 PB, amassed over the past 20 years and equivalent to 700 years of full HD-quality video, has been a challenge. At CERN, the bulk of the data (about 88 PB) is archived on tape using the CERN Advanced Storage (CASTOR) system. The rest (13 PB) is stored on the EOS disk pool system, which is optimized for fast analysis access by many concurrent users.
For the CASTOR system, eight robotic tape libraries are distributed across two buildings, with each tape library capable of containing up to 14,000 tape cartridges. CERN currently has around 52,000 tape cartridges with a capacity ranging from 1 terabyte (TB) to 5.5 TB each. For the EOS system, the data are stored on more than 17,000 disks attached to 800 disk servers.
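As a rough cross-check of these figures, the short Python sketch below works through the arithmetic. It assumes decimal units (1 PB = 1,000 TB) and treats the quoted counts as exact; the function and variable names are illustrative only and do not correspond to any CERN tool.

```python
TB_PER_PB = 1000  # assumption: decimal units, as in "100 PB = 100 million gigabytes"


def tape_capacity_bounds_pb(cartridges, min_tb, max_tb):
    """Return (lower, upper) bounds in PB if every cartridge had the
    smallest or the largest quoted capacity."""
    return cartridges * min_tb / TB_PER_PB, cartridges * max_tb / TB_PER_PB


# Figures quoted in the article
libraries = 8
slots_per_library = 14_000
cartridges = 52_000
disks = 17_000
disk_servers = 800

total_slots = libraries * slots_per_library             # cartridge slots across all libraries
low_pb, high_pb = tape_capacity_bounds_pb(cartridges, 1.0, 5.5)
disks_per_server = disks / disk_servers                 # average, since "more than 17,000" disks

print(f"Total cartridge slots: {total_slots}")
print(f"Tape capacity bounds: {low_pb} PB to {high_pb} PB")
print(f"Average disks per server: {disks_per_server}")
```

Under these assumptions, the roughly 88 PB held on tape falls between the two bounds, which is consistent with a mix of older low-capacity and newer high-capacity cartridges.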
Not all of the data are generated by LHC experiments. CERN’s IT Department hosts data from many other high-energy physics experiments at CERN, past and present, and is also a data centre for the Alpha Magnetic Spectrometer.
For both tape and disk, efficient data storage and access must be provided. This involves identifying performance bottlenecks and understanding how users want to access the data. Tapes are checked regularly to make sure that they stay in good condition and are accessible to users. To optimize storage space, the complete archive is regularly migrated to the newest high-capacity tapes. On the disk-based systems, data are replicated automatically after hard-disk failures, and a scalable namespace enables fast concurrent access to millions of individual files.
The data centre will keep busy during the long shutdown of the whole accelerator complex, analysing data taken during the LHC’s first three-year run and preparing for the higher expected data flow when the accelerators and experiments start up again. An extension of the centre and the use of a remote data centre in Hungary will further increase the data centre’s capacity.