• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer navigation

The Geek Diary

  • OS
    • Linux
    • CentOS/RHEL
    • Solaris
    • Oracle Linux
    • VCS
  • Interview Questions
  • Database
    • oracle
    • oracle 12c
    • ASM
    • mysql
    • MariaDB
  • DevOps
    • Docker
    • Shell Scripting
  • Big Data
    • Hadoop
    • Cloudera
    • Hortonworks HDP

Oracle RMAN: Monitoring Recovery Manager Jobs

by admin

Sometimes it is useful to identify what a server session performing a backup or copy operation is doing. You have access to several views that can assist in monitoring the progress of or obtaining information about RMAN jobs:

View Description
V$PROCESS Identifies currently active processes.
V$SESSION Identifies currently active sessions. Use this view to determine which Oracle database server sessions correspond to which RMAN allocated channels.
V$SESSION_LONGOPS Provides progress reports on long-running operations.
V$SESSION_WAIT Lists the events or resources for which sessions are waiting.

Correlating Server Sessions with Channels

To identify which server sessions correspond to which RMAN channels, use the set command with the command id parameter. The command id parameter enters the specified string into the CLIENT_INFO column of the V$SESSION dynamic performance view. Join V$SESSION with V$PROCESS to correlate the server session with the channel.

The CLIENT_INFO column of V$SESSION contains information for each Recovery Manager server session. The data appears in one of the following formats:

  • id=string – This form appears for the first connection to the target database established by RMAN.
  • id=string, ch=channel_id – This form appears for all allocated channels.

The SPID column of V$PROCESS identifies the operating system process number.

To correlate a process with a channel during a backup:

1. Start RMAN and connect to the target database and, optionally, the recovery catalog database. For example, enter:

% rman target / catalog rman/rman@rcat

2. Set the command id parameter after allocating the channels and then back up the desired object. For example, enter:

RMAN> run {
        allocate channel t1 type disk;
        allocate channel t2 type disk;
        set command id to 'rman';

        backup 
          incremental level 0
          filesperset 5
          tablespace 'SYSTEM';
      # optionally, issue a host command to access the operating system prompt
        host;
        sql 'ALTER SYSTEM ARCHIVE LOG ALL';
   }

3.Start a SQL*Plus session and then query the joined V$SESSION and V$PROCESS views while the RMAN job is executing. For example, enter:

SQL> SELECT sid, spid, client_info 
     FROM v$process p, v$session s 
     WHERE p.addr = s.paddr 
      AND client_info LIKE '%id=rman%';

   SID        SPID      CLIENT_INFO  
   ---------- --------- ---------------
   8          21973     id=rman  
   16         22057     id=rman    
   17         22068     id=rman,ch=t1 
   18         22070     id=rman,ch=t2

Monitoring Job Progress

Monitor the progress of backups, copies, and restores by querying the view V$SESSION_LONGOPS.

Each server session performing a backup, restore, or copy reports its progress compared to the total amount of work required for that particular part of the restore. For example, if you perform a restore using two channels, and each channel has two backup sets to restore (a total of 4 sets), then each server session reports its progress through a single set. When that set is completely restored, RMAN starts reporting progress on the next set to restore.

1. Start RMAN and connect to the target database and, optionally, the recovery catalog database. For example, enter:

% rman target / catalog rman/rman@rcat

2. Start an RMAN job. For example, enter:

RMAN> run {
        allocate channel t1 type disk;
        backup database;
      }

3. While the job is running, execute a script containing the following SQL statement:

SQL> SELECT sid, serial#, context, sofar, totalwork,
            round(sofar/totalwork*100,2) "% Complete"
     FROM v$session_longops
     WHERE opname LIKE 'RMAN%'
     AND opname NOT LIKE '%aggregate%'
     AND totalwork != 0
     AND sofar <> totalwork
    /

If you repeat the query while the backup progresses, then you see output such as the following:

SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      10377      36617      28.34

SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      21513      36617      58.75

SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      29641      36617      80.95

SQL> @longops
       SID    SERIAL#    CONTEXT      SOFAR  TOTALWORK % Complete
---------- ---------- ---------- ---------- ---------- ----------
         8         19          1      35849      36617       97.9

SQL> @longops
no rows selected

4. If you run the script at intervals of two minutes or more and the % Complete column does not increase, then RMAN is encountering a problem. Query V$SESSION_WAIT to determine which events are being waited for. For example, enter:

SQL> SELECT sid, seconds_in_wait AS sec_wait, event FROM v$session_wait
     WHERE wait_time = 0
     ORDER BY sid;

       SID   SEC_WAIT EVENT
---------- ---------- -----------------------------------------------
         1  368383335 pmon timer
         2       1097 rdbms ipc message
         3     387928 rdbms ipc message
         4          0 rdbms ipc message
         5       1408 smon timer
         6     386114 rdbms ipc message
         7     387626 rdbms ipc message
         8       1060 SQL*Net message from client
         9       1060 SQL*Net message from client
        12       1060 SQL*Net message from client
        13       2366 SQL*Net message from client
        14       2757 SQL*Net message from client
12 rows selected.
Note: The V$SESSION_WAIT view shows only Oracle events, not media manager events.

5. If you do not want to run the sqlplus queries and wish to see the percentage of completion as the job is running you can add “debug io” to your backup or script. This will produce output to stdio to show the job’s progress.

RMAN> run {
         allocate channel t1 type disk;
         debug io;
         backup database;
         debug off;
        }

You can change the option to “debug on” which is equivalent to “debug all” to get even more detail about the SQL and PL/SQL being executed by RMAN.

Monitoring Job Performance

Monitor backup and restore performance by querying V$BACKUP_SYNC and V$BACKUP_ASYNC_IO. The following rows will exist for a backup or restore:

  • One row for each datafile.
  • One aggregate datafile row.
  • One row for each backup piece.

Whether or not the I/O is synchronous depends on how it is viewed by the controlling process. Asynchronous I/O can be provided either by native platform services or by I/O slaves.

With synchronous I/O, it is difficult to identify specific bottlenecks because all synchronous I/O is a bottleneck to the process. Compare the bytes-per-second rate with the device’s maximum throughput rate. If the bytes-per-second rate is lower than that device specifies, consider tuning that part of the backup/restore process.

1. Mb/s = Min(disk Mb/s, tape Mb/s)

2. Backup speed is a tradeoff to restore speed. The faster the backup the slower the restore can be. When not all the datafiles are restored.

3. Disk Mb/s – Read from physical disks during backup until all controllers saturate. The controller saturates as more disks are read: 1 disk 3 Mb/s, 2 disks 5 Mb/s, 3 disks 6 Mb/s, 4 disks 6.2 Mb/s, 5 disks 6.2 Mb/s, 6 disks 6.2 Mb/s; nothing is gained by using more than 5 disks.

Performance Rules

Two ways to spin more than one disk.

a. Multiplex many datafiles into a backup set (FILESPERSET) Worst case scenario: Restoring one datafile out of N. Will need to read all the backup set. Example: A 10Gb backupset is created in an hour (10Gb/hour) with 10 datafiles. If restoring only one datafile, the perceived restore output would be 1Gb/hour.

b. Use logical disks that are interlaced into several physical disks. One read/write from one logical disk, spins many physical drives. If on restore the logical disks do not spin the same number of disks, the disk might be the bottleneck on restore.

4. Tape Mb/s – The more tapes, the faster the backup and restore, so buy more tapes. The faster the tape drives (Redwood?s: 20 Mb/s, DLT7000: 3 Mb/s) the faster the backup. Allocate ONE channel per physical device. If more channels than physical drives are used, then the backup sets will be intermingled (similar to multiplexing datafiles).

Bottlenecks with Async IO

When using synchronous I/O, you can easily determine how much time backup jobs require because devices only perform one I/O task at a time. When using asynchronous I/O, it is more difficult to measure the bytes-per-second rate, for the following reasons:

  • Asynchronous processing implies that more than one task occurs at a time.
  • Oracle I/O uses polling rather than an interrupt mechanism to determine when each I/O request completes.
  • Because the backup or restore process is not immediately notified of I/O completion by the operating system, you cannot determine the duration of each I/O.

If LONG_WAITS+SHORT_WAITS is a significant fraction of IO_COUNT, then the file is probably a bottleneck.

Some platforms implementation of AIO can cause the caller to wait for I/O completion when performing a non-blocking poll for I/O. Waits times should zero to avoid bottlenecks.

1. LONG_WAITS= # of times the backup/restore process told the O/S to wait until an I/O was complete.
2. SHORT_WAITS= # of times the backup/restore process made an O/S call to poll for I/O completion in a non-blocking mode.

Note: If the SHORT_WAIT_TIME_TOTAL is low compared to the total time for this file, then the delay is most likely caused by other factors, such as process swapping. If possible, tune your operating system so the I/O wait time appears in the LONG_WAIT_TIME_TOTAL column.

Using Fixed Views to Monitor Backup Operations

Use V$BACKUP_SYNC_IO and V$BACKUP_ASYNC_IO to determine the source of backup or restore bottlenecks and to determine the progress of backup jobs. V$BACKUP_SYNC_IO contains rows when the I/O is synchronous to the process (or thread, on some platforms) performing the backup. V$BACKUP_ASYNC_IO contains rows when the I/O is asynchronous. Asynchronous I/O is obtained either with I/O processes or because it is supported by the underlying operating system.

Filed Under: oracle, RMAN

Some more articles you might also be interested in …

  1. How To Find When The Spfile Was Created On Linux Server
  2. How to Disable Oracle Net Tracing on a Server Process without Stopping the Process
  3. Script To Find Redolog Switch History And Find Archivelog Size For Each Instances In Oracle RAC
  4. Archived Redo File Conventions in Oracle RAC
  5. ORA-00904: invalid identifier
  6. Difference between SQL and SQL*Plus Statements
  7. How to Optimize a Data Guard Configuration
  8. How To Automate The Opening Of Pluggable Databases After The CDB Starts Up in Oracle 12c
  9. Running RMAN DUPLICATE / RESTORE on a different version than source database version.
  10. Basics of Materialized Views in Oracle

You May Also Like

Primary Sidebar

Recent Posts

  • pw-cat Command Examples in Linux
  • pvs: command not found
  • pulseaudio: command not found
  • pulseaudio Command Examples in Linux

© 2023 · The Geek Diary

  • Archives
  • Contact Us
  • Copyright