Operating systems use lock managers to organise and serialise the access to resources. A distributed lock manager (DLM) runs in every machine in a cluster, with an identical copy of a cluster-wide lock database. In this way a DLM provides software applications which are distributed across a cluster on multiple machines with a means to synchronize their accesses to shared resources.
DLMs have been used as the foundation for several successful clustered file systems, in which the machines in a cluster can use each other's storage via a unified file system, with significant advantages for performance and availability. The main performance benefit comes from solving the problem of disk cache coherency between participating computers. The DLM is used not only for file locking but also for coordination of all disk access. VMScluster, the first clustering system to come into widespread use, relied on the OpenVMS DLM in just this way.
The DLM uses a generalized concept of a resource, which is some entity to which shared access must be controlled. This can relate to a file, a record, an area of shared memory, or anything else that the application designer chooses. A hierarchy of resources may be defined, so that a number of levels of locking can be implemented. For instance, a hypothetical database might define a resource hierarchy as follows:
A process can then acquire locks on the database as a whole, and then on particular parts of the database. A lock must be obtained on a parent resource before a subordinate resource can be locked.
A process running within a VMSCluster may obtain a lock on a resource. There are six lock modes that can be granted, and these determine the level of exclusivity being granted, it is possible to convert the lock to a higher or lower level of lock mode. When all processes have unlocked a resource, the system's information about the resource is destroyed.
The following truth table shows the compatibility of each lock mode with the others:
Mode | NL | CR | CW | PR | PW | EX |
---|---|---|---|---|---|---|
NL | Yes | Yes | Yes | Yes | Yes | Yes |
CR | Yes | Yes | Yes | Yes | Yes | No |
CW | Yes | Yes | Yes | No | No | No |
PR | Yes | Yes | No | Yes | No | No |
PW | Yes | Yes | No | No | No | No |
EX | Yes | No | No | No | No | No |
A process can obtain a lock on a resource by enqueueing a lock request. This is similar to the QIO technique that is used to perform I/O. The enqueue lock request can either complete synchronously, in which case the process waits until the lock is granted, or asynchronously, in which case an AST occurs when the lock has been obtained.
It is also possible to establish a blocking AST, which is triggered when a process has obtained a lock that is preventing access to the resource by another process. The original process can then optionally take action to allow the other access (e.g. by demoting or releasing the lock).
A lock value block is associated with each resource. This can be read by any process that has obtained a lock on the resource (other than a null lock) and can be updated by a process that has obtained a protected update or exclusive lock on it.
It can be used to hold any information about the resource that the application designer chooses. A typical use is to hold a version number of the resource. Each time the associated entity (e.g. a database record) is updated, the holder of the lock increments the lock value block. When another process wishes to read the resource, it obtains the appropriate lock and compares the current lock value with the value it had last time the process locked the resource. If the value is the same, the process knows that the associated entity has not been updated since last time it read it, and therefore it is unnecessary to read it again. Hence, this technique can be used to implement various types of cache in a database or similar application.
When one or more processes have obtained locks on resources, it is possible to produce a situation where each is preventing another from obtaining a lock, and none of them can proceed. This is known as a deadlock (E. W. Dijkstra originally called it a deadly embrace).[1]
A simple example is when Process 1 has obtained an exclusive lock on Resource A, and Process 2 has obtained an exclusive lock on Resource B. If Process 1 then tries to lock Resource B, it will have to wait for Process 2 to release it. But if Process 2 then tries to lock Resource A, both processes will wait forever for each other.
The OpenVMS DLM periodically checks for deadlock situations. In the example above, the second lock enqueue request of one of the processes would return with a deadlock status. It would then be up to this process to take action to resolve the deadlock—in this case by releasing the first lock it obtained.
Both Red Hat and Oracle have developed clustering software for Linux.
OCFS2, the Oracle Cluster File System was added[2] to the official Linux kernel with version 2.6.16, in January 2006. The alpha-quality code warning on OCFS2 was removed in 2.6.19.
Red Hat's cluster software, including their DLM and GFS2 was officially added to the Linux kernel [3] with version 2.6.19, in November 2006.
Both systems use a DLM modeled on the venerable VMS DLM.[4] Oracle's DLM has a simpler API. (the core function, dlmlock()
, has eight parameters, whereas the VMS SYS$ENQ
service and Red Hat's dlm_lock
both have 11.)
Other DLM implementations include the following: