CERN users working with the NICE/Windows Services might have heard of "DFS storage", and they probably even use it without realizing it. So where does the "G:" drive come from?
What is the Distributed File System?
The G: drive that is available on all centrally managed CERN Windows machines is in fact a view of data stored on a dozen data servers, generated by a specific service called Distributed File System (DFS).
DFS is a service that provides a single point of reference and a logical tree structure for file system resources that may be physically located anywhere on the network. This single point of reference at CERN is represented by the path \\cern.ch\dfs and is automatically mounted as the G: drive on all Windows machines by the central management system.
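Because G: is simply the DFS root mounted under a drive letter, every file has two equivalent names. The short Python sketch below illustrates that correspondence using only the two values given above (\\cern.ch\dfs and G:). The helper names and the example paths in the usage note are hypothetical, chosen purely for illustration.

```python
from pathlib import PureWindowsPath

# The two values below come from the article; everything else is illustrative.
DFS_ROOT = PureWindowsPath(r"\\cern.ch\dfs")
DRIVE = PureWindowsPath("G:/")


def unc_to_drive(path: str) -> PureWindowsPath:
    """Translate a path under the DFS root into its G: drive equivalent."""
    # relative_to() raises ValueError if the path is not under the DFS root.
    return DRIVE / PureWindowsPath(path).relative_to(DFS_ROOT)


def drive_to_unc(path: str) -> PureWindowsPath:
    """Translate a G: drive path into its equivalent under the DFS root."""
    return DFS_ROOT / PureWindowsPath(path).relative_to(DRIVE)
```

For a hypothetical user folder, unc_to_drive(r"\\cern.ch\dfs\Users\jdoe") and the path G:\Users\jdoe name the same location. PureWindowsPath only manipulates path strings, so the sketch runs on any operating system and never touches the network.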
Using DFS to share resources across the network makes the data easier to navigate, as everything is seen as a single drive. In addition, access control is provided by the built-in Windows "File Security and Access Rights" methods, available from Windows Explorer and also from http://cern.ch/winservices.
What is the current DFS infrastructure at CERN?
At CERN, all of the information hosted on DFS is accessible either via the G: drive or via the path \\cern.ch\dfs. Data accessible through DFS are stored on several servers that are backed up every night onto magnetic tapes.
The new DFS infrastructure is built on 10 servers with disk arrays containing 15 data disks of 1 TB each.
• Six servers are dedicated to DFS Workspaces, Experiments and Departmental data (the corresponding folders are called Workspaces, Departments, etc).
• Four servers are dedicated to Home Directories; the corresponding folder is \\cern.ch\dfs\Users and is directly accessible through the "My Documents" Windows shortcut as well as through the normal G: drive.
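As a rough back-of-the-envelope check, the raw disk capacity of these 10 servers can be worked out from the figures above. The numbers below are raw totals only: they assume nothing about RAID level, file-system overhead or how much space is set aside for mirroring, none of which is stated here.

```python
# Figures taken from the article.
SERVERS = 10
DISKS_PER_SERVER = 15
TB_PER_DISK = 1
WORKSPACE_SERVERS = 6   # Workspaces, Experiments and Departmental data
HOME_SERVERS = 4        # Home Directories

# Raw capacity, before any RAID, file-system or mirroring overhead.
raw_per_server_tb = DISKS_PER_SERVER * TB_PER_DISK
raw_total_tb = SERVERS * raw_per_server_tb
raw_workspace_tb = WORKSPACE_SERVERS * raw_per_server_tb
raw_home_tb = HOME_SERVERS * raw_per_server_tb

print(f"Per server: {raw_per_server_tb} TB raw")
print(f"All servers: {raw_total_tb} TB raw")
print(f"Workspace servers: {raw_workspace_tb} TB raw")
print(f"Home Directory servers: {raw_home_tb} TB raw")
```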
Some other servers are also used to store DFS data:
• 10 servers are dedicated to specific data, such as the Media Archive hosting the official CERN video files (six servers) and certain experiments and services that need to host large amounts of data.
What is new in the DFS infrastructure?
Recently, during the periodic hardware renewal, a change in the infrastructure was introduced: the Cross-Backup architecture.
The idea of Cross-Backup is to group the file servers in pairs and mirror all of the data on each server pair. The required disk space is of course doubled, but the price of storage has dramatically decreased in recent years, allowing this architecture to be set up at a low cost.
On each server in a pair, the disk space is split into two partitions: an active partition and a mirror partition. The mirror partition holds the backup of the active partition of the other server in the pair.
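The pairing can be pictured with a small Python sketch. It is only a toy model of the layout described above: the server names are hypothetical, and the function does not reflect how the partitions are actually created or managed at CERN.

```python
def cross_backup_layout(servers: list[str]) -> dict[str, dict[str, str]]:
    """Group servers in pairs and describe the two partitions on each one."""
    if len(servers) % 2:
        raise ValueError("Cross-Backup needs an even number of servers")

    layout: dict[str, dict[str, str]] = {}
    # Consecutive servers form a pair: (0, 1), (2, 3), ...
    for a, b in zip(servers[0::2], servers[1::2]):
        # Each server serves its own data and holds a copy of its partner's.
        layout[a] = {"active": f"{a} data", "mirror": f"copy of {b} data"}
        layout[b] = {"active": f"{b} data", "mirror": f"copy of {a} data"}
    return layout


# Hypothetical server names, for illustration only.
for server, partitions in cross_backup_layout(["srv1", "srv2", "srv3", "srv4"]).items():
    print(server, partitions)
```

Every data set therefore exists on two different machines, which is why the required disk space doubles.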
The backup activity takes place on different schedules depending on the type of server and the data hosted on the active partition. Large partitions containing mostly archived data are backed up every night, whereas Home Directories containing daily work and small files are often backed up in real time.
Figure 1 shows the Cross-Backup architecture concept. This new architecture means that:
• The mirror partition can be backed up onto tapes at any time without overloading the access to the active partition.
• An online backup-recovery mechanism can be provided from the mirror partition.
• There is faster service recovery in case of a disaster: if a server fails to respond, a swap of the DFS link to the mirror partition allows a full recovery in just a few minutes.
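The last point, swapping a DFS link over to the mirror partition, can be sketched as a small in-memory model in Python. This is an illustration of the idea only, not the procedure used at CERN: the DfsLink class, the share names and the is_alive callback are all hypothetical, and in practice DFS links are managed with the Windows administration tools rather than with code like this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DfsLink:
    """A folder in the DFS tree and the shares that can serve it (toy model)."""
    folder: str               # path seen by users in the DFS namespace
    active_target: str        # share on the server's active partition
    mirror_target: str        # share on the partner's mirror partition
    current_target: str = ""  # share the link points to right now

    def __post_init__(self) -> None:
        if not self.current_target:
            self.current_target = self.active_target


def host_of(target: str) -> str:
    """Extract the server name from a share path of the form \\\\server\\share."""
    return target.lstrip("\\").split("\\")[0]


def fail_over(link: DfsLink, is_alive: Callable[[str], bool]) -> bool:
    """Point the link at the mirror if the current server does not respond.

    Returns True if the link was swapped, False otherwise.
    """
    if is_alive(host_of(link.current_target)):
        return False  # the server responds, nothing to do
    if link.current_target == link.mirror_target:
        return False  # already on the mirror, no further fallback in this model
    link.current_target = link.mirror_target
    return True
```

Users keep working with the same folder under \\cern.ch\dfs throughout. Only the target behind the link changes, which is what makes recovery in a few minutes plausible.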
This new architecture has been in production since April and provides a better quality of service through the use of these mirror partitions. Data restoration and disaster recovery can be handled with greater efficiency. In case of data loss, a simple call to the Helpdesk (tel 78888) will launch the restoration procedure and the data will be recovered more quickly than before.
Useful link
DFS services: https://cern.ch/winservices/Services/DFS/Default.aspx