This post discusses the new Oracle 11g ASM feature called ASM Fast Mirror Resync and walks through an example showing how it works: we simulate a transient disk failure and recover the disk before the disk repair time expires.
ASM Fast Mirror Resync
ASM fast resync keeps track of pending changes to extents on an OFFLINE disk during an outage. The extents are resynced when the disk is brought back online or replaced.
By default, ASM drops a disk shortly after it is taken offline. You can set the DISK_REPAIR_TIME attribute to prevent this operation by specifying a time interval to repair the disk and bring it back online. The default DISK_REPAIR_TIME attribute value of 3.6h should be adequate for most environments.
The elapsed time (since the disk was set to OFFLINE mode) is incremented only when the disk group containing the offline disks is mounted. The REPAIR_TIMER column of V$ASM_DISK shows the amount of time left (in seconds) before an offline disk is dropped. After the specified time has elapsed, ASM drops the disk.
You can override this attribute for an individual disk with the DROP AFTER clause of an ALTER DISKGROUP ... OFFLINE DISK statement. If ALTER DISKGROUP SET ATTRIBUTE DISK_REPAIR_TIME is issued on a disk group that has disks currently offline, the new attribute value applies only to those disks that are not currently in OFFLINE mode.
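For example, assuming a diskgroup named dg and a disk named dg_0000 (hypothetical names, as are the time values), the drop timer for a particular outage can be set when the disk is taken offline:

alter diskgroup dg offline disk dg_0000 drop after 10h;

The DROP AFTER clause overrides the DISK_REPAIR_TIME attribute for that disk only.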
A disk that is in OFFLINE mode cannot be dropped with an ALTER DISKGROUP DROP DISK statement; an error is returned if attempted. If for some reason the disk needs to be dropped (such as the disk cannot be repaired) before the repair time has expired, a disk can be dropped immediately by issuing a second OFFLINE statement with a DROP AFTER clause specifying 0h or 0m.
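For instance (again using the hypothetical names dg and dg_0000), an offline disk that cannot be repaired can be forced to drop immediately by re-issuing the OFFLINE statement with a zero timer:

alter diskgroup dg offline disk dg_0000 drop after 0m;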
You can use ALTER DISKGROUP to set the DISK_REPAIR_TIME attribute to a specified hour or minute value, such as 4.5 hours or 270 minutes. For example:
alter diskgroup dg set attribute 'disk_repair_time' = '4.5h';
alter diskgroup dg set attribute 'disk_repair_time' = '270m';
After you repair the disk, run the SQL statement ALTER DISKGROUP ... ONLINE DISK. This statement brings the repaired disk back online to enable writes so that no new writes are missed. It also starts a procedure to copy all of the extents that are marked as stale from their redundant copies.
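For example, either a single repaired disk or all offline disks in a diskgroup can be brought back online (dg and dg_0000 are hypothetical names):

alter diskgroup dg online disk dg_0000;
alter diskgroup dg online all;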
If a disk goes offline when the ASM instance is in rolling upgrade mode, the disk remains offline until the rolling upgrade has ended and the timer for dropping the disk is stopped until the ASM cluster is out of rolling upgrade mode.
The example below simulates a transient disk failure and recovers the disk before the disk repair time expires.
1. Create a diskgroup named dgnm11gasm using the raw disks /dev/raw/raw1 and /dev/raw/raw2.
SQL> create diskgroup dgnm11gasm disk '/dev/raw/raw1','/dev/raw/raw2'
     attribute 'compatible.rdbms'='11.1','compatible.asm'='11.1';

Diskgroup created.
2. Verify the diskgroup name and disk group number:
SQL> select group_number,name from v$asm_diskgroup where group_number=1;

GROUP_NUMBER NAME
------------ --------------------
           1 DGNM11GASM
3. Check the current value of the disk_repair_time attribute. As shown below, the default disk repair time is 3.6 hours.
SQL> select name,value from v$asm_attribute where group_number=1;

NAME                 VALUE
-------------------- --------------------
disk_repair_time     3.6h
au_size              1048576
compatible.asm       11.1.0.0.0
compatible.rdbms     11.1.0.0.0
4. Create a test tablespace for our testing purpose using the diskgroup we just created above.
SQL> create tablespace test datafile '+DGNM11GASM' size 20m;

Tablespace created.
5. Shut down the DB instance and dismount the ASM diskgroup:
SQL> alter diskgroup DGNM11GASM dismount;

Diskgroup altered.
6. Change the ownership of /dev/raw/raw1 to simulate the disk loss:
# chown root.root /dev/raw/raw1
# ls -ltr /dev/raw/raw1
crw-rw---- 1 root root 162, 1 Jul  8 01:47 /dev/raw/raw1
When you try to mount the diskgroup, you should get an error as shown below:
SQL> alter diskgroup dgnm11gasm mount;
alter diskgroup dgnm11gasm mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing
7. With Oracle Database 11g, ASM fails to mount a diskgroup if any disks or failgroups are missing at mount time. You need to mount the diskgroup with the FORCE option.
Disk groups mounted with the FORCE option will have one or more disks offline if they were not available at the time of the mount.
SQL> alter diskgroup dgnm11gasm mount force;

Diskgroup altered.
SQL> select path,name,repair_timer from v$asm_disk where group_number=1;

PATH            NAME                 REPAIR_TIMER
--------------- -------------------- ------------
                DGNM11GASM_0000             12960
/dev/raw/raw2   DGNM11GASM_0001                 0
8. You must take corrective action to restore the offline devices before DISK_REPAIR_TIME expires. Connect to the DB instance and add a new datafile to the tablespace.
SQL> alter tablespace test add datafile '+DGNM11GASM' size 20m;

Tablespace altered.
9. Because only one disk is now available in this normal-redundancy diskgroup, new extents are not mirrored until the lost disk is again accessible to the oracle user and is brought back online with ALTER DISKGROUP ... ONLINE DISK (or a new disk is added to the diskgroup). Restore the ownership and online the disk:
# chown oracle.dba /dev/raw/raw1
SQL> alter diskgroup dgnm11gasm online disk DGNM11GASM_0000;

Diskgroup altered.
SQL> select group_number,operation,state,power from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER
------------ ----- ---- ----------
           1 ONLIN RUN           1
10. ASM Fast Mirror Resync has tracked the changes made while the disk was offline and resynced only the stale extents. Verify that both disks are members of the diskgroup and cached again:
SQL> select path,header_status,mount_status from v$asm_disk where group_number=1;

PATH            HEADER_STATU MOUNT_S
--------------- ------------ -------
/dev/raw/raw2   MEMBER       CACHED
/dev/raw/raw1   MEMBER       CACHED