Need for Global Concurrency Control
Oracle requires concurrency control because it is a multi-user system. Latches and mutexes can only protect memory structures that are accessed by processes in the same instance. In RAC, latches and mutexes are still used, but additional global enqueues provide protection across instances. Single-instance Oracle provides concurrency control with:
- Latches or mutexes for memory structures
- Enqueues for resource control
- Buffer cache pins for cache management
In RAC, structures and resources may be accessed by or modified by a session running on any database instance. RAC, therefore, requires additional global concurrency controls to mediate access across instances.
- Global locks control library and row cache access.
- Global enqueues control resource access.
- Cache fusion controls buffer cache access.
Global Resource Directory (GRD)
A master metadata structure contains information about the state of the related resource in every instance where that resource is in use. A shadow metadata structure contains information only about the state of the related resource in the instance that holds the shadow metadata.
An object under global concurrency control is called a resource. Resource metadata is held in the Global Resource Directory (GRD). Global enqueue resources are used for enqueues and locks. Global cache resources are used for buffer cache control. The GRD is distributed among all active instances of each database or ASM environment. Each currently managed GRD resource has a master metadata structure and one or more shadow metadata structures. The GRD uses memory from the shared pool.
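Because the GRD lives in the shared pool, its footprint can be observed there. As a sketch (shared pool component names vary by Oracle release, so the LIKE filters are an assumption), the GCS- and GES-related allocations can be listed from V$SGASTAT:

```sql
-- Approximate the GRD memory footprint in the shared pool.
-- 'gcs%' rows cover global cache resources and shadows;
-- 'ges%' rows cover global enqueue resources.
SELECT pool, name, bytes
FROM   v$sgastat
WHERE  pool = 'shared pool'
AND    (name LIKE 'gcs%' OR name LIKE 'ges%')
ORDER  BY bytes DESC;
```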
Global Resource Management
The resource mastering instance is the instance containing the master metadata used to manage concurrency control for a specific entity. Each instance will be the resource master for some of the database entities.
The resource shadowing instance is any instance containing shadow metadata used to manage concurrency control for a specific entity. Each instance will contain shadow resources for entities it has accessed and for which it is not the resource master.
After first access of a globally managed entity by any instance, a global resource is allocated. An internal algorithm is used to decide which instance should contain the master metadata structure for that entity. This instance is known as the resource master. The resource mastering instance may be any active instance of the database or ASM environment.
Subsequent access to an entity from another instance for which resource master metadata exists causes resource shadow metadata to be allocated in the requesting instance and updates to be done to the master metadata.
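One way to see which instance masters which files and objects is V$GCSPFMASTER_INFO, listed later in this lesson among the useful views. The column list below reflects common releases and may differ in yours:

```sql
-- Current and previous resource master for each file/object,
-- plus how many times mastership has already moved.
SELECT file_id, data_object_id, current_master,
       previous_master, remaster_cnt
FROM   v$gcspfmaster_info
ORDER  BY remaster_cnt DESC;
```

Note that the master columns are 0-based in some releases, so instance 1 may appear as 0.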
Global Resource Remastering
Remastering is the process of transferring mastership of the master metadata for a specific entity to another instance. When a new instance starts, remastering is not done immediately. Instead it is done gradually, based on which instances are accessing which resources (hence the term “lazy”). Instance-level, or lazy, remastering occurs when a new instance of the same database or ASM environment starts or when a current instance is shut down gracefully.
When an instance shuts down gracefully—meaning normal, immediate, or transactional—then resources mastered by the terminating instance are handed off to the surviving instances by using an optimized internal algorithm designed to minimize the remastering and subsequent concurrency control overheads. File affinity remastering occurs when requests to access blocks in a data file occur frequently from an instance, and the resource masters for the blocks are often held by other instances.
The decision to perform file-affinity or object-affinity remastering is made automatically when an internal threshold is reached. Starting with Oracle RAC 12c Release 2, Cache Fusion maintains an in-memory hash that tracks the blocks in the buffer cache and the service name used by the sessions to connect. Cache Fusion uses this hash for resource mastering optimization. Object-affinity remastering occurs when requests to access blocks in a data object occur frequently from an instance, and the resource masters for the blocks are often held by other instances.
Resource mastering of a resource cached in the buffer is only considered on the node where the service that the session used to access the resource is running. This results in improved performance as it eliminates the need for sending additional messages on the private network for resource change operations.
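Dynamic remastering activity can be checked from V$DYNAMIC_REMASTER_STATS. A minimal sketch, with column names as in recent releases (verify against your version's reference):

```sql
-- Cumulative remastering work since instance startup: number of
-- remaster operations, objects moved, and time spent.
SELECT remaster_ops, remastered_objects, remaster_time,
       quiesce_time, freeze_time, cleanup_time
FROM   v$dynamic_remaster_stats;
```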
Global Resource Recovery
When one or more instances fail, but not all, the resource masters held by the failing instance(s) are lost. Any lost resource master that had a shadow in a surviving instance must be recovered. The surviving instances can rebuild the master metadata for a specific resource by aggregating the details held in the surviving shadow metadata for that resource.
Global lock and enqueue metadata are rebuilt first, followed by global cache metadata. The rebuilding results in each surviving instance mastering some of the recovered master metadata.
Enqueues are recovered first because Oracle must know which processes have access to which resources before recovery can proceed. A look at the RAC database alert log shows global resource activity during instance recovery:
```
...
Reconfiguration started (old inc 0, new inc 6)
List of instances:
 1 2 3 (myinst: 3)
 Global Resource Directory frozen
 * allocate domain 0, invalid = TRUE
 Communication channels reestablished
 * domain 0 valid = 0 according to instance 1
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete (total time 0.5 secs)
...
```
Global Resource Background Processes
| Process | Role |
|---------|------|
| ACMS | Atomic Control File to Memory Service |
| LMHB | Monitors the LMON, LMD, and LMSn processes |
| LMD0 | Requests global enqueues and instance locks |
| LMON | Issues heartbeats and performs recovery |
| LMSn | Processes global cache fusion requests |
| LCK0 | Is involved in library and row cache locking |
| RCBG | Processes Global Result Cache invalidations |
- Atomic Control File to Memory Service (ACMS): In a RAC environment, the ACMS per-instance process is an agent that contributes to ensuring that a distributed SGA memory update is either globally committed on success or globally aborted if a failure occurs.
- Global Enqueue Service Monitor (LMON): The LMON process monitors global enqueues and resources across the cluster and performs global enqueue recovery operations.
- Global Enqueue Service Daemon (LMD): The LMD process manages incoming remote resource requests within each instance.
- Global Cache Service Process (LMS): The LMS process maintains records of the data file statuses and each cached block by recording information in the GRD. The LMS process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances. This processing is part of the cache fusion feature.
- Instance Enqueue Process (LCK0): The LCK0 process manages noncache fusion resource requests such as library and row cache requests.
- Global Cache/Enqueue Service Heartbeat Monitor (LMHB): LMHB monitors LMON, LMD, and LMSn processes to ensure that they are running normally without blocking or spinning.
- Result Cache Background Process (RCBG): This process is used for handling invalidation and other messages generated by server processes attached to other instances in Oracle RAC.
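To confirm which of these background processes are running on the local instance, V$BGPROCESS can be queried. This sketch assumes the usual process naming (the LMS processes appear individually as LMS0, LMS1, and so on):

```sql
-- List the started global resource background processes.
-- A PADDR of '00' marks a process that is defined but not running.
SELECT name, description
FROM   v$bgprocess
WHERE  paddr <> HEXTORAW('00')
AND    (name IN ('ACMS','LMHB','LMD0','LMON','LCK0','RCBG')
        OR name LIKE 'LMS%')
ORDER  BY name;
```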
Global Resource Access Coordination
There are two types of global resource coordination:
- Global enqueue management, which is used for:
  - Global enqueues
  - Global instance locks
- Global buffer cache management, also known as cache fusion or the global cache: a logical buffer cache spanning all instances. It coordinates access to block images in the global cache and supports Parallel Query across the global cache.
Global enqueues are used to control access to resources where the owners, waiters, and converters (if any) may be sessions in the same or in different instances. Some global enqueues serve the same purpose they would serve in a single instance: for example, table (TM), transaction (TX), control file (CF), high-water mark (HW), sequence cache (SQ), and redo thread (RT) enqueues. However, as described earlier in this lesson, the GRD holds master and shadow metadata structures for them, and the mastering instance keeps track of the waiters and converters.
Instance locks are enqueues that represent resources in the row cache or library cache, which are protected within each instance by pins, mutexes, or latches. For cross-instance concurrency control, an enqueue is used whose owner(s) are the instance(s) that currently hold the “source of truth” with regard to the state of that resource. The LCK0 process acts as the owner, waiter, or converter of the enqueue, serving as a proxy that represents the instance. These enqueues are known as instance locks.
Processing starts in the requesting instance as follows:
- A global enqueue request is made by a session.
- The request is passed to LMD0 in the requesting instance.
- The foreground process waits on a wait event until the request is granted.
- LMD0 determines the mastering instance.
- LMD0 forwards the request to the mastering instance if required.
- The mastering instance adds a new master resource if required. The requesting process is recorded as an owner, waiter, or converter as appropriate. When the resource can be granted to the requestor, LMD0 in the mastering instance notifies LMD0 in the requesting instance.
- When the resource is available, the foreground is posted by LMD0 in the requesting instance.
If the requesting and mastering instances are the same, LMD0 need not forward the request over the interconnect. Otherwise, LMD0 in the mastering instance notifies LMD0 in the requesting instance whether the resource is immediately available to the requestor. If a dequeue request is passed to the mastering instance, LMD0 there notifies the LMD0 processes in the instances of any waiters or converters that need resuming, and those LMD0 processes post them locally.
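The owners, waiters, and converters across all instances can be inspected through GV$LOCK. As an illustrative query, filtering on the example enqueue types mentioned above:

```sql
-- Cluster-wide enqueue holders and waiters. In RAC, BLOCK may be 2,
-- meaning the holder is potentially blocking a request globally.
SELECT inst_id, sid, type, id1, id2, lmode, request, block
FROM   gv$lock
WHERE  type IN ('TM','TX','HW','SQ')
AND    (lmode > 0 OR request > 0)
ORDER  BY type, id1, id2, inst_id;
```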
Instance locks are used to represent which instance(s) has (have) control over an instance-wide structure:
- Row cache entries
- Library cache entries
- Result cache entries
The owner, waiter, or converter on an instance lock is the LCK0 process. As long as the local LCK0 process in an instance owns the lock on a specific resource, any session in that instance can use the cached metadata, because it is considered current. If the local instance does not own the lock, then a request must be made for the lock, and the foreground waits on the DFS lock handle wait event.
Instance locks use enqueue structures, but the scope is different: LCK0 acts as the owner or waiter in these situations. The owner of an instance lock represents the instance having permission to access the related entity in the row cache or library cache. Once an instance owns the instance lock, the usual latches, pins, or mutexes provide concurrency control within that instance.
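Sessions stalled on cross-instance lock requests can be spotted by the wait event mentioned above. A minimal sketch:

```sql
-- Sessions currently waiting on an instance lock request that LCK0
-- is handling on their behalf.
SELECT inst_id, sid, event, p1, seconds_in_wait
FROM   gv$session_wait
WHERE  event = 'DFS lock handle';
```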
Global Cache Management: Overview
Global cache management provides:
- A concurrency mechanism for multiple buffer caches
- An optimization of block access for reads
- An optimization of writes for dirty buffers
- A mechanism to optimize parallel queries
Concurrency control handles situations where the same block has multiple images in the same or different buffer caches. Reduction in physical I/O is achieved by having a global view of buffer cache resources and potentially satisfying I/O requests from any instance's cache, rather than reading from disk or writing the same buffer multiple times to disk from different buffer caches.
Parallel queries may result in caching parts of a table in each buffer cache and using cache fusion to avoid repeatedly doing direct reads from disk. This is known as “In Memory Parallel Query.” Parts of the table are cached in separate buffer caches, rather than having the blocks cached multiple times in different caches due to cache fusion block transfer. The parallel execution servers in the different instances serve results over the interconnect for the part of the table in their respective instance caches.
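The benefit of serving blocks from a remote cache instead of disk shows up in the global cache statistics. As a sketch (statistic names as in 11g and later):

```sql
-- Blocks received over the interconnect versus blocks read from disk,
-- per instance.
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received',
                'gc current blocks received',
                'physical reads')
ORDER  BY inst_id, name;
```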
Global Cache Management Components
- The LMSn processes
- Buffer headers
- Global Cache Master Resources
- Global Cache Shadow Resources
Multiple LMS processes may be used by Oracle RAC depending on the workload size. There are buffer headers for each buffer in each buffer cache. There may be multiple block images for the same database block in the same or different buffer caches.
The mastering instance for a specific database block holds master metadata in the GRD, maintained by LMSn, describing all images of that block and specifying the instance and state of each image. Each instance also holds shadow metadata in the GRD, maintained by its local LMSn, for each block in its buffer cache that is not mastered locally.
Global Cache Buffer States
Buffer states are visible in V$BH.STATUS. The important buffer states for cache fusion are:
- Shared Current (SCUR): The buffer contains a block image that matches the one on disk. One or more instances may hold images of the same block in the SCUR state. Once an instance holds an image in this state, cache fusion is used when another instance reads the same block.
- Exclusive Current (XCUR): The buffer contains a block image that is about to be updated or has been updated. It may or may not have been written by the database writer. Only one instance may hold an XCUR image of a block.
- Consistent Read (CR): The buffer contains a block image that is consistent with an earlier point in time. This image may have been created in the same way as in a single-instance database, by copying the block into an available buffer and using undo to roll back the changes to create the older image. It may also be created by converting a block image from SCUR or PI.
- Past Image (PI): The buffer contains a block image that was XCUR but was then shipped to another instance using cache fusion, so a later image of the block now exists in another buffer cache. Once DBWn in the other instance writes the later image to disk, the PI image becomes a CR image. Multiple PI images may exist for the same block in different buffer caches.
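The distribution of these states in the local cache can be checked directly from V$BH (STATUS values appear in lowercase in the view):

```sql
-- Count buffers by cache fusion state in the local buffer cache:
-- scur, xcur, cr, and pi, plus free and read.
SELECT status, COUNT(*) AS buffers
FROM   v$bh
GROUP  BY status
ORDER  BY buffers DESC;
```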
Global Cache Management Scenarios for Single Block Reads
There are several scenarios for single block reads:
- Read from Disk: Occurs when an I/O request occurs for a block that has no image in any buffer cache
- Read – Reads: Occurs when a block image exists in at least one buffer cache in shared current state (SCUR), and another instance wishes to access the block for read
- Read – Write: Occurs when a block image exists in at least one buffer cache in shared current state (SCUR), and another instance wishes to access the block for update (XCUR)
- Write – Write: Occurs when a block image exists in one buffer cache in exclusive current state (XCUR), and another instance wishes to access the same block for write in exclusive current state (XCUR)
- Write – Read: Occurs when a block image exists in one buffer cache in exclusive current state (XCUR), and another instance wishes to access the block for read. The instance doing the read may get it in CR or in SCUR as will be described later.
- Write to Disk: Occurs when DBWn writes a dirty buffer to disk. If the block was modified in multiple instances, then only the latest image will be written. This image will be (XCUR). All the older dirty images for the same block will be past images (PI).
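How often these scenarios end in a cache fusion transfer rather than a disk read can be gauged from V$INSTANCE_CACHE_TRANSFER. A hedged example (block classes and columns may vary by release):

```sql
-- Blocks received from each other instance, split into consistent-read
-- and current-mode transfers, grouped by block class.
SELECT instance, class, cr_block, current_block
FROM   v$instance_cache_transfer
WHERE  cr_block > 0 OR current_block > 0
ORDER  BY instance, class;
```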
Global Cache Management Scenarios for Multi-Block Reads
When multi-block read requests occur:
- The instance doing the I/O must acquire resources for each block in the correct state
- The LMSn processes in the requesting instance coordinate with the LMSn processes on the mastering instance(s)
- Different blocks in the same multi-block read may have different mastering instances
- Dynamic remastering, described earlier, may help reduce the performance overhead
There are several scenarios for multi-block reads:
- No resource masters exist for any block.
- Resource masters for some block(s), all are SCUR
- Resource masters for some block(s), some are XCUR
No resource masters exist for any block in the multi-block read request: A request is made to the specific mastering instance for each block in the read and, after LMSn grants permission, the server process performs the multi-block read from disk.
Resource masters exist for at least one block in the multi-block read request, but all of them are Shared Current (SCUR): The blocks have not been modified. A request is made to the specific mastering instance for each block in the read and, after the grants are received, the server process reads from disk.
Resource masters exist for at least one block in the multi-block read request, and at least one is Exclusive Current (XCUR), so a newer version of that block may exist in a buffer cache than on disk: A request is made to the specific mastering instance for each block in the read and, after the grants are received, the XCUR images are transferred by cache fusion, as described earlier, while the remaining blocks are read from disk in smaller multi-block reads.
Useful Global Resource Management Views
Below are some useful Global Resource Management views:
- GV$SESSION_WAIT
- GV$SYSSTAT
- GV$GES_STATISTICS
- V$RESOURCE_LIMIT
- V$BH
- GV$LOCK
- V$CR_BLOCK_SERVER
- V$CURRENT_BLOCK_SERVER
- V$INSTANCE_CACHE_TRANSFER
- V$DYNAMIC_REMASTER_STATS
- GV$RESULT_CACHE_STATS
- V$GCSPFMASTER_INFO
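As a starting point with these views, the busiest global enqueue service statistics can be sampled per instance. This sketch assumes a 12c-or-later release for the FETCH FIRST syntax:

```sql
-- Top non-zero GES statistics on each instance; statistic names vary
-- by release, so none are filtered by name here.
SELECT inst_id, name, value
FROM   gv$ges_statistics
WHERE  value > 0
ORDER  BY value DESC
FETCH FIRST 20 ROWS ONLY;
```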