Oracle Recovery Manager
Oracle Database provides RMAN for backing up and restoring the database. RMAN enables you to back up, restore, and recover data files, control files, SPFILEs, and archived redo logs. You can run RMAN from the command line or you can use it from the Backup Manager in Enterprise Manager. In addition, RMAN is the recommended backup and recovery tool if you are using ASM. RMAN can use stored scripts, interactive scripts, or an interactive GUI front end. When using RMAN with your RAC database, use stored scripts to initiate the backup and recovery processes from the most appropriate node.
Create a snapshot control file in a location that exists on all your nodes. You can specify a cluster file system or an ASM disk group destination for the location of your snapshot control file. This file is shared across all nodes in the cluster and must be accessible by all nodes in the cluster.
For recovery, you must ensure that each recovery node can access the archive log files from all instances or make the archived logs available to the recovering instance by copying them from another location.
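For example (the host name and paths are illustrative), you could copy another instance's archived logs to the same path on the recovering node before starting recovery:

$ scp node2:/arc_dest_2/*.arc /arc_dest_2/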
Configuring RMAN Snapshot Control File Location
The snapshot control file is a copy of a database control file created in an operating system–specific location by RMAN. RMAN creates the snapshot control file so that it has a consistent version of a control file to use when either resynchronizing the recovery catalog or backing up the control file. You do not create this file manually; RMAN creates it automatically whenever it needs one, for example when you run RESYNC CATALOG or back up the current control file. You can specify a cluster file system or ASM disk group destination for the location of your snapshot control file. This file is shared across all nodes in the cluster and must be accessible by all nodes in the cluster.
You can change the configured location of the snapshot control file. For example, on Linux and UNIX systems you can change the snapshot control file location by using the CONFIGURE SNAPSHOT CONTROLFILE NAME RMAN command. This command sets the configuration for the location of the snapshot control file for every instance of your cluster database. Therefore, ensure that the location specified exists on all nodes that perform backups. The CONFIGURE command creates persistent settings across RMAN sessions. Therefore, you do not need to run this command again unless you want to change the location of the snapshot control file. To delete a snapshot control file you must first change the snapshot control file location, then delete the file at the older location, as follows:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO 'new_name';
DELETE COPY OF CONTROLFILE;
Determine the current location:
RMAN> SHOW SNAPSHOT CONTROLFILE NAME;
'/u01/app/oracle/product/12.2.0/dbhome_1/dbs/snapcf_orcl_3.f'
You can use ASM or a shared file system location:
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+FRA/SNAP/snap_prod.cf';
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/ocfs2/oradata/dbs/scf/snap_prod.cf';
Configuring Control File and SPFILE Autobackup
If you set CONFIGURE CONTROLFILE AUTOBACKUP to ON, RMAN automatically creates a control file and an SPFILE backup after you run the BACKUP or COPY command. RMAN can also automatically restore an SPFILE if this is required to start an instance to perform recovery. This means that the default location for the SPFILE must be available to all nodes in your RAC database.
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
These features are important in disaster recovery because RMAN can restore the control file even without a recovery catalog. RMAN can restore an autobackup of the control file even after the loss of both the recovery catalog and the current control file.
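As an illustration only, a minimal disaster-recovery sketch that restores the SPFILE and control file from an autobackup might look like the following; the DBID is a placeholder for your own database:

# placeholder DBID; required because no control file is mounted yet
SET DBID 1234567890;
STARTUP FORCE NOMOUNT;
RESTORE SPFILE FROM AUTOBACKUP;
# restart the instance with the restored SPFILE
STARTUP FORCE NOMOUNT;
RESTORE CONTROLFILE FROM AUTOBACKUP;
ALTER DATABASE MOUNT;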
You can change the default location that RMAN gives to this file with the CONFIGURE CONTROLFILE AUTOBACKUP FORMAT command. If you specify an absolute pathname in this command, this path must exist identically on all nodes that participate in backups.
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '+DATA';
RMAN performs the control file autobackup on the first allocated channel. Therefore, when you allocate multiple channels with different parameters, especially when you allocate a channel with the CONNECT option, determine which channel will perform the control file autobackup, and always allocate the channel for that node first. Besides the RMAN command-line client, you can also use Oracle Enterprise Manager to access the RMAN features.
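As a sketch (the connect strings and tape channels are illustrative), the following run block allocates the channel for the node that should perform the control file autobackup first:

RUN
{
  # first allocated channel: the node that performs the control file autobackup
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_2';
  BACKUP DATABASE PLUS ARCHIVELOG;
}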
Crosschecking on Multiple RAC Cluster Nodes
When crosschecking on multiple RAC nodes, configure the cluster so that all backups can be accessed by every node, regardless of which node created the backup. When the cluster is configured this way, you can allocate channels at any node in the cluster during restore or crosscheck operations.
If you cannot configure the cluster so that each node can access all backups, then during restore and crosscheck operations, you must allocate channels on multiple nodes by providing the CONNECT option to the CONFIGURE CHANNEL command so that every backup can be accessed by at least one node. If some backups are not accessible during crosscheck because no channel was configured on the node that can access those backups, then those backups are marked EXPIRED in the RMAN repository after the crosscheck.
For example, you can use CONFIGURE CHANNEL … CONNECT in an Oracle RAC configuration in which tape backups are created on various nodes in the cluster and each backup is accessible only on the node on which it is created.
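For illustration (the node connect strings are hypothetical), you might persistently configure one channel per node so that a crosscheck can reach every backup:

CONFIGURE DEVICE TYPE sbt PARALLELISM 2;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT='sys/rac@orcl_1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT='sys/rac@orcl_2';
# with one channel per node, every backup is reachable during the crosscheck
CROSSCHECK BACKUP;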
Channel Connections to Cluster Instances
When making backups in parallel, RMAN channels can connect to different instances in the cluster. The instances to which the channels connect must be either all mounted or all open. The following examples illustrate two possible configurations:
1. If you want to dedicate channels to specific instances, you can control at which instance the channels are allocated by using separate connect strings for each channel configuration as shown by the first example.
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT='sys/rac@orcl_1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT='sys/rac@orcl_2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT='sys/rac@orcl_3';
2. If you define a special service for your backup and recovery jobs, you can use the configuration shown in the second example. If you configure this service with load balancing turned on, then the channels are allocated at a node as determined by the load-balancing algorithm.
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL DEVICE TYPE sbt CONNECT='sys/rac@bkp_serv';
During a backup, the instances to which the channels connect must be either all mounted or all open. For example, if the orcl_1 instance has the database mounted whereas the orcl_2 and orcl_3 instances have the database open, then the backup fails.
In some RAC database configurations, some cluster nodes have faster access to certain data files than to other data files. RMAN automatically detects this, which is known as node affinity awareness. When deciding which channel to use to back up a particular data file, RMAN gives preference to the nodes with faster access to the data files that you want to back up.
RMAN Channel Support for the Grid
RAC allows the use of nondeterministic connect strings that can connect to different instances based on RAC features, such as load balancing. Therefore, to support RAC, the RMAN polling mechanism no longer depends on deterministic connect strings and makes it possible to use RMAN with connect strings that are not bound to a specific instance in the Grid environment. Previously, if you wanted to use RMAN parallelism and spread a job between many instances, you had to manually allocate an RMAN channel for each instance. To use dynamic channel allocation, you do not need separate CONFIGURE CHANNEL CONNECT statements anymore. You only need to define your degree of parallelism by using a command such as CONFIGURE DEVICE TYPE disk PARALLELISM, and then run backup or restore commands.
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
RMAN then automatically connects to different instances and performs the job in parallel. The Grid environment selects the instances to which RMAN connects, based on load balancing. As a result, configuring RMAN parallelism in a RAC environment is as simple as setting it up in a non-RAC environment. When you configure parallelism for backing up or recovering a RAC database, RMAN dynamically allocates channels across all RAC instances.
RMAN Default Autolocation
Recovery Manager automatically discovers which nodes of a RAC configuration can access the files that you want to back up or restore. Recovery Manager autolocates the following files:
- Backup pieces during backup or restore
- Archived redo logs during backup
- Data file or control file copies during backup or restore
If you use a non-cluster file system local archiving scheme, a node can read only those archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel that cannot read those logs.
During a restore operation, RMAN autolocates backups automatically: a channel connected to a specific node attempts to restore only those files that were backed up to that node. For example, assume that log sequence 1001 is backed up to the drive attached to node 1, whereas log 1002 is backed up to the drive attached to node 2. If you then allocate channels that connect to each node, the channel connected to node 1 can restore log 1001 (but not 1002), and the channel connected to node 2 can restore log 1002 (but not 1001).
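A sketch of such a restore, assuming the connect strings from the earlier examples and thread 1 for both sequence numbers, might look like this:

RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_2';
  # each channel restores only the logs that were backed up on its own node
  RESTORE ARCHIVELOG SEQUENCE BETWEEN 1001 AND 1002 THREAD 1;
}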
Distribution of Backups
When configuring the backup options for RAC, you have several possible configurations:
- Network backup server: A dedicated backup server performs and manages backups for the cluster and the cluster database. None of the nodes have local backup appliances.
- One local drive: One node has access to a local backup appliance and performs and manages backups for the cluster database. All nodes of the cluster should be on a cluster file system to be able to read all data files, archived redo logs, and SPFILEs. It is recommended that you do not use the noncluster file system archiving scheme if you have backup media on only one local drive.
- Multiple drives: Each node has access to a local backup appliance and can write to its own local backup media.
In the cluster file system scheme, any node can access all the data files, archived redo logs, and SPFILEs. In the noncluster file system scheme, you must write the backup script so that the backup is distributed to the correct drive and path for each node. For example, node 1 can back up the archived redo logs whose path names begin with /arc_dest_1, node 2 can back up the archived redo logs whose path names begin with /arc_dest_2, and node 3 can back up the archived redo logs whose path names begin with /arc_dest_3.
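A sketch of such a distributed archived log backup (the connect strings are illustrative) directs each backup specification to the channel on the node that can read those logs:

RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_2';
  ALLOCATE CHANNEL c3 DEVICE TYPE sbt CONNECT 'sys/rac@orcl_3';
  # each backup specification is tied to the channel that can read those logs locally
  BACKUP
    (ARCHIVELOG LIKE '/arc_dest_1/%' CHANNEL c1)
    (ARCHIVELOG LIKE '/arc_dest_2/%' CHANNEL c2)
    (ARCHIVELOG LIKE '/arc_dest_3/%' CHANNEL c3);
}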
Managing Archived Redo Logs Using RMAN
When a node generates an archived redo log, Oracle Database always records the file name of the log in the control file of the target database. If you are using a recovery catalog, then RMAN also records the archived redo log file names in the recovery catalog when a resynchronization occurs.
The archived redo log naming scheme that you use is important because when a node writes to a log with a specific file name on its file system, the file must be readable by any node that must access this archived redo log. For example, if node1 archives a log to /oracle/arc_dest/log_1_100_23452345.arc, then node2 can back up this archived redo log only if it can read /oracle/arc_dest/log_1_100_23452345.arc on its own file system.
The backup and recovery strategy that you choose depends on how you configure the archiving destinations for each node. Whether only one node or all nodes perform archived redo log backups, you must ensure that all archived redo logs are backed up. If you use RMAN parallelism during recovery, then the node that performs recovery must have read access to all archived redo logs in your cluster.
Multiple nodes can restore archived logs in parallel. However, during recovery, only one node applies the archived logs. Therefore, the node that is performing the recovery must be able to access all of the archived logs that are needed for the recovery operation. By default, the database determines the optimum number of parallel threads to use during the recovery operation. You can use the PARALLEL clause in the RECOVER command to change the number of parallel threads.
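As an illustration (the degree of parallelism is arbitrary), you could override the default from SQL*Plus on the recovering node as follows:

SQL> RECOVER DATABASE PARALLEL 8;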
Noncluster File System Local Archiving Scheme
When archiving locally to a noncluster file system, each node archives to a uniquely named local directory. If recovery is required, then you can configure the recovery node so that it can access directories on the other nodes remotely. For example, use NFS on Linux and UNIX computers, or mapped drives on Windows systems. Therefore, each node writes only to a local destination, but each node can also read archived redo log files in remote directories on the other nodes.
If you use noncluster file system local archiving for media recovery, then you must configure the node that is performing recovery for remote access to the other nodes so that it can read the archived redo log files in the archive directories on the other nodes. In addition, if you are performing recovery and you do not have all of the available archive logs, then you must perform incomplete recovery up to the first missing archived redo log sequence number. You do not have to use a specific configuration for this scheme. However, to distribute the backup processing onto multiple nodes, the easiest method is to configure channels.
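For example, if the earliest missing archived log is sequence 124 of thread 2 (hypothetical numbers), a sketch of the incomplete recovery might be:

RUN
{
  # stop just before the first missing archived redo log
  SET UNTIL SEQUENCE 124 THREAD 2;
  RESTORE DATABASE;
  RECOVER DATABASE;
  ALTER DATABASE OPEN RESETLOGS;
}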
Configuring Non-Cluster, Local Archiving
You can set the archiving destination values as follows in the initialization parameter file for either policy-managed or administrator-managed databases. Set the SID.LOG_ARCHIVE_DEST_1 parameter for each instance by using the SID designator, as shown in the following example:
sid1.LOG_ARCHIVE_DEST_1="LOCATION=/arc_dest_1"
sid2.LOG_ARCHIVE_DEST_1="LOCATION=/arc_dest_2"
sid3.LOG_ARCHIVE_DEST_1="LOCATION=/arc_dest_3"
For policy-managed databases, manually create a node and instance binding to ensure that sid1 always runs on the same node, as follows:
$ srvctl modify instance -d mydb -n node1 -i sid1
$ srvctl modify instance -d mydb -n node2 -i sid2
$ srvctl modify instance -d mydb -n node3 -i sid3
The following list shows the possible archived redo log entries in the database control file. Note that a log from any thread can end up in any node's local archive destination, so the recovering node must be able to read all of these directories for the database to recover after a failure.
/arc_dest_1/log_1_1000_23435343.arc
/arc_dest_2/log_1_1001_23452345.arc <- thread 1 archived in node2
/arc_dest_2/log_3_1563_23452345.arc <- thread 3 archived in node2
/arc_dest_1/log_2_753_23452345.arc  <- thread 2 archived in node1
/arc_dest_2/log_2_754_23452345.arc
/arc_dest_3/log_3_1564_23452345.arc
ASM and Cluster File System Archiving Scheme
The preferred configuration for Oracle RAC is to use ASM for the recovery area, with a disk group for your recovery set that is different from the disk group used for your data files. When you use Oracle ASM, the archived redo log files follow an Oracle Managed Files naming format.
Alternatively, you can use a cluster file system archiving scheme. If you use a cluster file system, then each node writes to a single location on the cluster file system when archiving the redo log files. Each node can read the archived redo log files of the other nodes. For example, if Node 1 archives a redo log file to /arc_dest/log_1_100_23452345.arc on the cluster file system, then any other node in the cluster can also read this file.
If you do not use a cluster file system, then the archived redo log files cannot be on raw devices. This is because raw devices do not enable sequential writing of consecutive archive log files. The advantage of this scheme is that none of the nodes uses the network to archive logs. Because the file name written by a node can be read by any node in the cluster, RMAN can back up all logs from any node in the cluster. Backup and restore scripts are simplified because each node has access to all archived redo logs.
Configuring the CFS Archiving Scheme
In the cluster file system scheme, each node archives to a directory that is identified with the same name on all instances within the cluster database (/arc_dest, in the following example). To configure this directory, set values for the LOG_ARCHIVE_DEST_1 parameter, as shown in the following example:
*.LOG_ARCHIVE_DEST_1="LOCATION=/arc_dest"
The following list shows archived redo log entry examples that would appear in the RMAN catalog or in the control file based on the previous example. Note that any node can archive logs using any of the threads:
/arc_dest/log_1_999_23452345.arc
/arc_dest/log_1_1000_23435343.arc
/arc_dest/log_1_1001_23452345.arc <- thread 1 archived in node3
/arc_dest/log_3_1563_23452345.arc <- thread 3 archived in node2
/arc_dest/log_2_753_23452345.arc  <- thread 2 archived in node1
/arc_dest/log_2_754_23452345.arc
/arc_dest/log_3_1564_23452345.arc
Because the file system is shared and because each node is writing its archived redo logs to the /arc_dest directory in the cluster file system, each node can read the logs written by itself and any other node.
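For illustration, with this scheme any single node can back up (and optionally delete) every archived redo log, for example:

RMAN> BACKUP ARCHIVELOG ALL DELETE INPUT;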
Restoring and Recovering
Media recovery of a database that is accessed by RAC may require at least one archived log file for each thread. However, if a thread’s online redo log contains enough recovery information, restoring archived log files for any thread is unnecessary.
If you use RMAN for media recovery and you share archive log directories, you can change the destination of the automatic restoration of archive logs with the SET clause to restore the files to a local directory of the node where you begin recovery. If you backed up the archive logs from each node without using a central media management system, you must first restore all the log files from the remote nodes and move them to the host from which you will start recovery with RMAN. However, if you backed up each node’s log files using a central media management system, you can use RMAN’s AUTOLOCATE feature. This enables you to recover a database by using the local tape drive on the remote node.
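As a sketch (the destination path is illustrative), redirecting the restored archived logs to a local directory on the recovery node could look like this:

RUN
{
  # place restored logs in a local directory on the node performing recovery
  SET ARCHIVELOG DESTINATION TO '/u01/restored_arch';
  RESTORE ARCHIVELOG ALL;
  RECOVER DATABASE;
}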
If recovery reaches a time when an additional thread was enabled, the recovery process requests the archived log file for that thread. If you are using a backup control file, when all archive log files are exhausted, you may need to redirect the recovery process to the online redo log files to complete recovery. If recovery reaches a time when a thread was disabled, the process informs you that the log file for that thread is no longer needed.