The status of high-energy physics (HEP) information systems has been jointly analyzed by the libraries of CERN, DESY, Fermilab and SLAC. As a result, the four laboratories have started the INSPIRE project – a new platform built by moving the successful SPIRES features and content, curated at DESY, Fermilab and SLAC, into the open-source CDS-Invenio digital library software that was developed by the CERN document server team.

Here I explain how the project was born and how the INSPIRE system is being built.

Current HEP information systems
The different HEP information systems being actively used by physicists can be classified into three main categories:
• the community-based HEP scientific document servers: arXiv (LANL, now Cornell), SPIRES (SLAC, Fermilab, DESY), CDS (CERN), ADS (SAO/NASA, Harvard) and KISS (KEK);
• the publisher-driven scientific document servers, such as PROLA (APS) and ScienceDirect (Elsevier);
• the generalist services, such as Google Web, Google Scholar and other web search engines.

Community-based systems have obvious synergies and have collaborated regularly in the past, linking to each other and exchanging general information and metadata. About a year ago, CERN, DESY, Fermilab and SLAC decided to run a joint user poll to understand better the perceptions, behaviours and wishes of the end users of these information systems.

The poll was advertised through online messages and posts to e-mail lists between 30 April and 11 June 2007, and more than 2100 answers were collected, corresponding to about 10% of the active HEP community. Theorists (61%), experimental physicists (22%) and software engineers (6%) from all over the world (22% US, 10% Germany, 8% Italy, 7% UK, 5% CERN) described their habits and needs in detail. With 83% of responses coming from those using HEP information systems several times per week (and 57% daily), the quality of the feedback was outstanding.

The poll results [1] show that the community-based services are clearly predominant (91.4%), with SPIRES standing as the most used "first reflex" HEP information system (48.2%). Commercial services are very rarely used in the community (0.1%) and general web search engines (such as Google) are used to some extent, mainly by the younger generation (8% on average, but up to 20% for users with less than two years of experience in HEP). Of course, the results returned by these generalist search engines are themselves harvested from the community-based servers and document repositories.

Users also gave their preferences regarding existing functionalities, such as access to full text and to citation information, and listed the features that they would like to have in the coming years, such as Web 2.0 user-contributed tagging. It is interesting to find that a large proportion of users are willing to invest their time in such a community service.

From this comprehensive picture of the perceptions and needs of the end users of HEP information systems, the SPIRES collaboration, and the CERN Library and CDS, decided to investigate further how a closer collaboration could fully match the community expectations.

SPIRES and CDS-Invenio
The SPIRES-HEP database [2] started in 1974 and was based on the SPIRES DBMS, running on an IBM mainframe with a command-line interface. It is run today by SLAC, DESY and Fermilab. An e-mail interface was added in the 1980s, and in the early 1990s SPIRES-HEP went online through the first US web server, which exposed its deep-web content. Tim Berners-Lee considered it the "killer application" that showed what the Web could bring in the future.

High-quality metadata with human-proofed publication information, links to full text, author affiliations and much more has maintained the attractiveness of the service in the past years. The addition of citation services, such as the cite summary format, has provided physicists with a useful tool to follow up on the impact of documents with their peers. However, SPIRES now suffers from aging technology (SPIRES DBMS), which has resulted in scalability and maintenance issues.
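As a rough illustration of the kind of statistics that a citation summary can report for a set of papers, here is a minimal Python sketch. The function name `cite_summary` and the choice of statistics (paper count, total and average citations, h-index) are assumptions made for this example; they are not a description of the actual SPIRES cite summary format.

```python
def cite_summary(citation_counts):
    """Summarize a list of per-paper citation counts.

    Illustrative only: the statistics chosen here are assumptions,
    not the actual output of the SPIRES cite summary format.
    """
    counts = sorted(citation_counts, reverse=True)
    papers = len(counts)
    total = sum(counts)
    average = total / papers if papers else 0.0

    # h-index: the largest h such that h papers have at least h citations each.
    h_index = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h_index = rank
        else:
            break

    return {
        "papers": papers,
        "citations": total,
        "average": average,
        "h_index": h_index,
    }


if __name__ == "__main__":
    # Hypothetical citation counts for six papers by one author.
    print(cite_summary([10, 8, 5, 4, 3, 0]))
```

Sorting the counts once keeps the h-index calculation to a single pass, which matters little for one author but helps when summaries are computed over large result sets.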

The CERN Library has maintained preprints since the 1950s [3]. The CERN preprint server appeared on the web in 1993 and later widened its scope to become the CDS in 2000, serving both as the interface to the CERN Library and as the CERN institutional repository (archiving multimedia, notes, conferences, etc). Two sister applications, a conference-management system (Indico) and digital library software (Invenio), were developed in parallel to improve the electronic archiving of the laboratory's assets. The Invenio package was released as open source in 2002 and started to be adopted by services within HEP (e.g. ILC) and outside HEP (e.g. HBZ), receiving valuable contributions from various communities (e.g. EPFL and UAB).

The top-quality metadata curation in SPIRES, combined with the high-performance and scalable software of CDS-Invenio, looks like a perfect match for the user expectations that were expressed in the poll.

The birth of INSPIRE
In mid-May 2007 SLAC organized the first HEP/PPA Information Resource Summit, where the four laboratories (CERN, DESY, Fermilab and SLAC) decided to conduct a feasibility study of reproducing SPIRES data and features using CDS-Invenio software. The feasibility study ran until autumn 2007 and concluded positively.

A new phase then started in which all of the partners joined forces to replicate SPIRES user-level functionalities in Invenio. Some 760,000 records were converted and loaded. Citation features were enhanced and the specific SPIRES syntax was emulated to ensure that users would be able to continue searching as they were used to. The website and search result formats were also configured to resemble the existing SPIRES look and feel.
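To give a flavour of what supporting the SPIRES search syntax on top of a field-based search engine involves, here is a toy Python sketch that rewrites a simple SPIRES-style query into "field:value" form. It is an assumption-laden illustration, not the code used in INSPIRE: the keyword table `SPIRES_TO_FIELD`, the function `translate_spires_query` and the output field names are all hypothetical, and only the simplest queries are handled.

```python
# Hypothetical mapping from SPIRES-style search keywords to field names.
# Illustrative only; not the actual INSPIRE or Invenio configuration.
SPIRES_TO_FIELD = {
    "a": "author",
    "author": "author",
    "t": "title",
    "title": "title",
    "j": "journal",
    "k": "keyword",
}

BOOLEAN_OPERATORS = {"and", "or", "not"}


def translate_spires_query(query):
    """Rewrite a simple SPIRES-style query ("find a ellis and t quark")
    into field:value form ("author:ellis and title:quark")."""
    tokens = query.split()
    if not tokens or tokens[0].lower() not in {"find", "f"}:
        # Not a SPIRES-style query: pass it through unchanged.
        return query.strip()

    clauses = []  # translated pieces of the output query
    field = None  # field currently in effect
    words = []    # value words collected for the current clause

    def flush():
        # Emit the pending clause, quoting multi-word values as phrases.
        if words:
            value = " ".join(words)
            if len(words) > 1:
                value = '"' + value + '"'
            clauses.append(f"{field}:{value}" if field else value)
            words.clear()

    expect_keyword = True
    for token in tokens[1:]:
        lowered = token.lower()
        if lowered in BOOLEAN_OPERATORS:
            flush()
            clauses.append(lowered)
            expect_keyword = True
        elif expect_keyword and lowered in SPIRES_TO_FIELD:
            field = SPIRES_TO_FIELD[lowered]
            expect_keyword = False
        else:
            # A value word; the previous field carries over if none was given.
            words.append(token)
            expect_keyword = False
    flush()
    return " ".join(clauses)


if __name__ == "__main__":
    print(translate_spires_query("find a ellis and t quark"))
```

Queries that do not start with "find" pass through untouched, so in this sketch users of either syntax could type into the same search box. A production translator would also need to handle parentheses, date ranges and the full keyword vocabulary.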

A year later, DESY organized the second HEP/PPA Information Resource Summit, which brought together the key people from HEP information systems. Rolf-Dieter Heuer, the current research director of DESY and CERN director-general designate, explained his vision of a next-generation HEP information system [4], open access and data publishing in the coming LHC era. The INSPIRE project was announced [5], the new platform was explained and a call for further collaboration was issued to all willing parties to prepare the next phases of the project.

Next steps
After reproducing SPIRES user features, the INSPIRE collaboration will focus in the coming year on cataloguer-level functionalities. Record-editing interfaces, checking and maintenance tools, inputting and harvesting workflows, and record-enrichment tools based on knowledge bases will all be addressed. Efforts will focus on building strong native tools to enable libraries from the four institutes to share the data-curation workload. It will be challenging to set up an optimized distributed cataloguing environment that reaches a level of metadata quality as high as the SPIRES standard while eliminating the current duplication of work in this area.

In the longer term (2009 onwards), the INSPIRE project will deploy more advanced features, some of them already available in Invenio and others still to be developed. Collaborative tools, such as baskets, alerts and tagging, are ready but they require user authentication throughout the whole HEP community. The INSPIRE partners should work out a coherent solution for all users. Community-shared author/experiment/institute and conference databases will also be considered, in close contact with journal publishers and conference organizers, who are likely to have similar needs.

The entire corpus of HEP literature in INSPIRE will be opened to new applications investigating novel text- and data-mining technologies, such as extended citation networks, combined impact metrics, the indexing of plots and tables, open-access dissemination and more.

Conclusion
The SPIRES collaboration and the CDS-Invenio open-source community are joining forces to build INSPIRE, a new HEP information portal, which will integrate present databases and repositories to host the entire body of the HEP literature, aiming to become the reference HEP scientific information platform worldwide. It will empower scientists with new tools to discover and access the results most relevant to their research, enable novel text- and data-mining applications, and deploy new metrics to assess the impact of articles and authors. In addition, it will introduce the Web 2.0 paradigm of user-enriched content in the domain of sciences with community-based approaches to the peer-review process.

INSPIRE represents a natural evolution of scholarly communication built on successful community-based information systems, and it provides a vision for information management in other fields of science. Inspired by the needs of HEP, we hope that the INSPIRE project will be inspiring for other communities.

References
[1] A Gentil-Beccot et al. Information Resources in High-Energy Physics: Surveying the Present Landscape and Charting the Future Course arXiv:0804.2701.
[2] L Addis Brief and Biased History of Preprint and Database Activities at the SLAC Library, 1962-1994 www.slac.stanford.edu/spires/papers/history.html.
[3] A Pepe et al. CERN Document Server Software: the Integrated Digital Library CERN-OPEN-2005-018.
[4] R D Heuer et al. Innovation in Scholarly Communication: Vision and Projects from High-Energy Physics arXiv:0805.2789.
[5] Press release Interactions News Wire #37-08 DESY: High-Energy Physics Labs Join to Build a New Scientific Information System www.interactions.org/cms/?pid=1026243.