Managing data in a distributed and heterogeneous Grid environment is one of the most challenging tasks of Large Hadron Collider (LHC) computing, in terms of both developing the required software and deploying the underlying services.

Relational database management systems (databases) play a central role, especially for conditions and event metadata, because they can provide consistent storage to many concurrent users.

Given the complexity of today's database systems, it is often difficult for users to exploit the underlying services in the most efficient way. However, many of the required optimizations can be delegated to an intermediate software layer if both the main physics use cases and service constraints are taken into account early on in its design.

LCG persistency framework

The persistency framework for the LHC Computing Grid (LCG), which is being developed jointly by the LHC experiments and the IT/PSS group, aims to provide such a software layer. Its purpose is to decouple the user code from the features of any particular database implementation.

The project started in 2002 in the LCG Applications Area, driven by the requirements of its users (ATLAS, CMS and LHCb experiments). Project priorities are set with the experiment representatives in the LCG Architects Forum, and experiment developers contribute actively to the software implementation. Development is also tightly coupled to service constraints, as the software is developed in close contact with the IT/PSS physics database service team and the LCG Distributed Deployment of Databases (3D) project (led by IT/PSS).

The persistency framework project focused initially on the development of POOL, a hybrid store based on object streaming into ROOT files and metadata storage into databases. More recently the scope of the project was extended to provide a generic database access layer (CORAL) and a specialized component for storing and looking up conditions data (COOL).

Accessing databases with CORAL

Database access for all persistency framework components proceeds via the CORAL (COmmon Relational Abstraction Layer) package. CORAL is also being used in production by several LHC experiments directly from their offline and online applications.

CORAL provides a set of C++ interfaces that are independent of the database implementation and therefore enable the same code to be used against a variety of database systems. At the moment Oracle, MySQL, SQLite and FroNTier (a Web-based database caching package) are supported via plug-in libraries that can be loaded at application runtime.
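The plug-in idea can be illustrated with a minimal sketch. It is written in Python rather than C++ and does not reproduce the CORAL interfaces: the names RelationalBackend, SQLiteBackend, BACKENDS and open_database, and the "technology:location" connection string, are all assumptions made for this example. Only the SQLite case is implemented, using Python's standard sqlite3 module.

```python
from abc import ABC, abstractmethod
import sqlite3


class RelationalBackend(ABC):
    """Hypothetical back-end-neutral interface (not the CORAL API)."""

    @abstractmethod
    def execute(self, sql, params=()):
        """Run one statement and return all result rows."""


class SQLiteBackend(RelationalBackend):
    """Back-end built on Python's standard sqlite3 module."""

    def __init__(self, location):
        self._conn = sqlite3.connect(location)

    def execute(self, sql, params=()):
        cursor = self._conn.execute(sql, params)
        rows = cursor.fetchall()
        self._conn.commit()
        return rows


# Registry mapping a technology prefix to the class that implements it.
# Further back-ends would add themselves here when they are loaded.
BACKENDS = {"sqlite": SQLiteBackend}


def open_database(connection_string):
    """Pick a back-end at run time from the 'technology:location' prefix."""
    technology, _, location = connection_string.partition(":")
    try:
        backend_class = BACKENDS[technology]
    except KeyError:
        raise ValueError(f"no plug-in registered for '{technology}'") from None
    return backend_class(location)


# User code below depends only on the neutral interface.
db = open_database("sqlite::memory:")
db.execute("CREATE TABLE runs (run_number INTEGER, detector TEXT)")
db.execute("INSERT INTO runs VALUES (?, ?)", (1234, "pixel"))
rows = db.execute("SELECT run_number, detector FROM runs")
```

In this sketch the user code after open_database never names a database technology, so switching back-end would mean changing only the connection string.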

Support for several database implementations is important, not only to minimize the risk of being bound to a single technology but also to cover the database deployment infrastructure available across LCG sites. The experiment deployment models foresee the use of Oracle at Tier-0 and Tier-1, and the use of MySQL, SQLite or FroNTier at the other tiers. More details will be available in a forthcoming CNL article about the LCG 3D project.

To exploit the distributed database resources that are becoming available via the LCG 3D project, CORAL provides secure database authentication and indirection, including retry and failover across multiple database replicas. CORAL resolves user-defined logical database names into physical connections to the database servers that are currently available. In the event of network or service problems, CORAL connections fail over to the next available database replica, if necessary at a different site.
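The resolution of a logical name followed by retry and failover can be sketched as follows. This is a simplified Python illustration, not CORAL code: the REPLICAS table, the example.org host names, the ReplicaUnavailable exception and the fake_connector stand-in are all hypothetical, and authentication is left out.

```python
class ReplicaUnavailable(Exception):
    """Raised by a connector when a replica cannot be reached."""


# Hypothetical lookup table: logical name -> ordered list of physical replicas.
REPLICAS = {
    "conditions": [
        "oracle://tier0.example.org/cond",
        "oracle://tier1.example.org/cond",
        "sqlite:///local/cond.db",
    ],
}


def connect_with_failover(logical_name, connector, retries_per_replica=2):
    """Return (physical_name, connection) for the first replica that answers."""
    failures = []
    for physical_name in REPLICAS[logical_name]:
        # Retry the same replica a few times before moving on to the next one.
        for attempt in range(retries_per_replica):
            try:
                return physical_name, connector(physical_name)
            except ReplicaUnavailable as error:
                failures.append((physical_name, attempt, str(error)))
    raise ReplicaUnavailable(f"all replicas of '{logical_name}' failed: {failures}")


def fake_connector(physical_name):
    """Stand-in for a real driver: pretend the Tier-0 replica is down."""
    if "tier0" in physical_name:
        raise ReplicaUnavailable("connection timed out")
    return f"connection to {physical_name}"


physical, connection = connect_with_failover("conditions", fake_connector)
```

Here the caller asks only for "conditions"; after two failed attempts on the first replica the sketch falls through to the second one, which is the behaviour described above in miniature.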

CORAL implements several database access optimizations directly (such as row prefetching and the efficient use of server-side cursors) and significantly decreases the user effort needed to implement others (such as bind variables and bulk DML operations). These single-client optimizations are complemented by a connection pool, which lets larger applications with several database components share connections and so minimizes the number of concurrent server connections they open.
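For readers unfamiliar with bind variables and bulk DML, the two techniques look roughly like this in Python's standard sqlite3 module. This is an illustration of the general idea only, not of the CORAL interfaces; the temperature table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temperature (channel INTEGER, celsius REAL)")

# Example data: one reading for each of 1000 channels.
readings = [(channel, 20.0 + 0.5 * channel) for channel in range(1000)]

# Bulk DML: one INSERT statement with '?' placeholders, executed for every
# row inside a single transaction instead of 1000 separate statements.
with conn:
    conn.executemany("INSERT INTO temperature VALUES (?, ?)", readings)

# Bind variable in a query: the SQL text stays constant and only the bound
# value changes, so the statement can be reused for other channels.
query = "SELECT celsius FROM temperature WHERE channel = ?"
celsius_for_channel_10 = conn.execute(query, (10,)).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM temperature").fetchone()[0]
```

Keeping values out of the SQL text also avoids building statements by string concatenation, which is both slower to parse on the server and easier to get wrong.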

POOL: object storage in databases

The POOL hybrid store functionality is now well integrated in the software frameworks of ATLAS, CMS and LHCb, and has been successfully tested in several large-scale data challenges using object storage in ROOT files.

Building on CORAL, POOL was recently generalized to store arbitrary C++ objects in any of the CORAL-supported database systems. This is particularly useful for calibration and configuration data, which cannot easily be managed in files. With this mechanism objects are decomposed according to their C++ type and stored as rows in relational tables. A set of customizable mapping rules enables the user to steer the automated table generation and to control the mapping of C++ data types to their relational counterpart.
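A much simplified picture of such a decomposition is sketched below in Python with the standard sqlite3 module. It is not the POOL mapping: the Calibration class, the table and column names, and the rule "scalars in a parent table, vector elements in a child table" are assumptions chosen for the example, and the mapping is written by hand rather than generated.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Calibration:
    """Hypothetical calibration object with scalar members and one vector."""
    module_id: int
    gain: float
    pedestals: list = field(default_factory=list)


def create_tables(conn):
    """One parent table for the scalars, one child table for the vector."""
    conn.execute(
        "CREATE TABLE calibration (module_id INTEGER PRIMARY KEY, gain REAL)"
    )
    conn.execute(
        "CREATE TABLE calibration_pedestals ("
        "module_id INTEGER, position INTEGER, value REAL)"
    )


def store(conn, obj):
    """Write the scalars as one parent row and each vector element as a child row."""
    with conn:
        conn.execute(
            "INSERT INTO calibration VALUES (?, ?)", (obj.module_id, obj.gain)
        )
        conn.executemany(
            "INSERT INTO calibration_pedestals VALUES (?, ?, ?)",
            [(obj.module_id, i, v) for i, v in enumerate(obj.pedestals)],
        )


def load(conn, module_id):
    """Reassemble the object from its parent row and its ordered child rows."""
    gain = conn.execute(
        "SELECT gain FROM calibration WHERE module_id = ?", (module_id,)
    ).fetchone()[0]
    pedestals = [
        value
        for (value,) in conn.execute(
            "SELECT value FROM calibration_pedestals "
            "WHERE module_id = ? ORDER BY position",
            (module_id,),
        )
    ]
    return Calibration(module_id, gain, pedestals)


conn = sqlite3.connect(":memory:")
create_tables(conn)
original = Calibration(module_id=7, gain=1.25, pedestals=[0.5, 0.75, 1.0])
store(conn, original)
restored = load(conn, 7)
```

Because the members end up in ordinary typed columns, the stored values remain queryable with plain SQL, which is one reason a relational layout can suit calibration and configuration data.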

COOL handles conditions databases

The COOL (LCG Conditions Database) package provides a software infrastructure for managing conditions data, focusing on the issue of their time variation and versioning.

The development of COOL began at the end of 2004 to replace several separate packages previously developed for MySQL and Oracle. COOL still shares their basic data model for conditions data but is now based on a single code implementation and the same relational schema for all supported back-ends, thanks to the use of CORAL.

In COOL, measured or calculated detector conditions (such as detector temperatures and alignment parameters) are associated with an interval of validity, the time range to which the stored conditions apply. Groups of similar conditions items can be organized in a hierarchical structure similar to a file system. Multiple versions of conditions data can be maintained (for example, originating from alternative alignment methods) and can be referred to by tag names (similar to release tags in the CVS code management system).
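The data model described above (a folder path, intervals of validity and tagged versions) can be mimicked by a small in-memory toy in Python. It is not the COOL API or schema: the ConditionsFolder class, its methods, the folder path and the tag names are invented for illustration, times are plain integers, and intervals within one tag are assumed not to overlap.

```python
from bisect import bisect_right


class ConditionsFolder:
    """Toy folder: for each tag, a time-ordered list of validity intervals."""

    def __init__(self, path):
        self.path = path      # file-system-like name, e.g. "/tracker/alignment"
        self._since = {}      # tag -> sorted list of interval start times
        self._entries = {}    # tag -> list of (since, until, payload), same order

    def store(self, since, until, payload, tag):
        """Store a payload valid for since <= time < until under the given tag."""
        if since >= until:
            raise ValueError("interval must satisfy since < until")
        starts = self._since.setdefault(tag, [])
        entries = self._entries.setdefault(tag, [])
        index = bisect_right(starts, since)
        starts.insert(index, since)
        entries.insert(index, (since, until, payload))

    def find(self, time, tag):
        """Return the payload whose interval contains time, or None."""
        starts = self._since.get(tag, [])
        # Last interval starting at or before 'time' (binary search).
        index = bisect_right(starts, time) - 1
        if index >= 0:
            since, until, payload = self._entries[tag][index]
            if since <= time < until:
                return payload
        return None


folder = ConditionsFolder("/tracker/alignment")
# Two versions of the same conditions, from alternative alignment methods.
folder.store(0, 100, {"dx": 0.10}, tag="survey")
folder.store(100, 200, {"dx": 0.12}, tag="survey")
folder.store(0, 200, {"dx": 0.11}, tag="track-based")

survey_at_150 = folder.find(150, tag="survey")
tracks_at_150 = folder.find(150, tag="track-based")
outside = folder.find(250, tag="survey")
```

The lookup is keyed by time and tag only, so the same event time can yield different payloads depending on which tagged version an application selects.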

The COOL software provides a high-level C++ interface to store and retrieve the data according to the most important physics use cases. It takes over most of the physical management of database tables and the creation of indices for fast data access, enabling users to focus on the definition of the experiment conditions and their logical structure rather than on database access optimization. COOL enables users either to store data directly inside the database tables or to maintain references to data stored externally (such as XML or POOL files or other databases), depending on data volume and the experiment deployment model.

COOL is today the baseline conditions data implementation for the ATLAS and LHCb experiments. Although COOL is still being optimized, its performance already matches some of the experiment requirements. Sustained data rates over 20 MB/s and 20 k rows/s have been observed for retrieval from an Oracle RAC cluster database.

Summary

With the introduction of the CORAL and COOL packages alongside POOL, the LCG persistency framework now provides storage functionality for all major physics data types as a consistent set of layered components. Designing these components in close contact with the experiments and the database service providers at CERN and other LCG tier sites will make it possible to successfully deploy both software and services for the start-up of the LHC.

Further information

• POOL (the LCG Persistency Framework): http://pool.cern.ch;
• CORAL (COmmon Relational Abstraction Layer): http://pool.cern.ch/coral;
• COOL (LCG Conditions Database): http://cern.ch/cool.