How to replace a failed disk under ZFS
Let us discuss three different cases of disk failure under ZFS:
1. Drive went offline and came back online (no actual hardware failure).
2. Drive actually failed. New drive in place with the same target number.
3. Drive actually failed. New drive in place with a different target number.
Drive went offline and came back online
There may be cases where a drive goes offline and comes back online without any actual hardware failure.
You would see the offline status in the format command output and also in /var/adm/messages.
# format
      12. c1t21d0 <drive not available>
          /pci@1f,0/pci@1/pci@3/SUNW,qlc@5/fp@0,0/ssd@w22023020370705f1,0
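To see what the kernel logged when the drive dropped, you can also search /var/adm/messages. The exact driver messages vary by HBA and disk type, so the patterns below are only examples:

# grep -i offline /var/adm/messages
# grep ssd /var/adm/messages | tail -n 20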
If you do a zpool status, you would see the degraded pool and the unavailable disk.
# zpool status -v geekpool
  pool: geekpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
 scrub: resilver completed with 0 errors on Tue Dec 29 13:05:45 2013
config:

        NAME           STATE     READ WRITE CKSUM
        geekpool       DEGRADED     0     0     0
          mirror       DEGRADED     0     0     0
            c1t12d0s1  ONLINE       0     0     0
            c3t21d0s1  UNAVAIL      0    29     0  cannot open
There are two ways to tackle this type of failure:
1. Method 1
After verifying in the format command output that the drive is available again, bring the disk back online in the ZFS pool:
# zpool online geekpool c3t21d0s1
To make sure the data is not corrupted, run a scrub to check the pool's integrity:
# zpool scrub geekpool
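The scrub runs in the background; you can check its progress and results at any time with zpool status. This is an extra verification step, not strictly required:

# zpool status -v geekpool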
2. Method 2
After the drive is back online, export and re-import the pool:
# zpool export geekpool
# zpool import -f geekpool
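If the pool does not show up by name during the import, running zpool import with no arguments lists all pools the system can discover. This is a quick diagnostic rather than part of the original procedure:

# zpool import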
Again, to verify data integrity, run a scrub on the pool:
# zpool scrub geekpool
Drive actually failed. New drive in place with same target number.
Remove the old drive using the cfgadm or luxadm command. (The exact steps to remove the disk may vary slightly depending on the type of disk: SAS, SCSI, or Fibre Channel.) Refer to the separate post on removing a failed disk for the detailed steps; a rough outline is also sketched below.
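As a rough sketch only (the controller number and device name below, c1 and c1t21d0, are placeholders and will differ on your system), unconfiguring a failed SAS/SCSI disk with cfgadm looks like this:

# cfgadm -al
# cfgadm -c unconfigure c1::dsk/c1t21d0

For a Fibre Channel disk, luxadm can be used instead:

# luxadm remove_device /dev/rdsk/c1t21d0s2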
Once the new drive is in place, replace the failed disk using the original device name:
# zpool replace geekpool c1t21d0
Drive actually failed. New drive in place with different target number.
As in the previous case, remove the old drive using the cfgadm or luxadm command (the exact steps vary slightly with the disk type: SAS, SCSI, or Fibre Channel).
Then replace the failed disk, specifying both the old and the new device names:
# zpool replace geekpool c1t21d0 c1t54d0
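In both replacement cases, ZFS resilvers the data from the surviving mirror onto the new disk. Monitoring the resilver with zpool status is an optional but useful check:

# zpool status geekpool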
Disk failure under rootpool
The only change when replacing a disk in a mirrored root pool is that the boot block must be installed on the new disk. Install the boot block after the resilver is complete.
For SPARC based systems :
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t0d0s0
For x86 based systems :
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t0d0s0
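Before running installboot or installgrub, you can confirm that the resilver on the root pool has completed by checking the pool status. The pool name rpool used below is the common default and is an assumption here:

# zpool status rpool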