The Problem
You notice inconsistencies between ASM metadata views, asmcmd output, and OS output associated with an ACFS file system. RMAN or dbverify reports corruption in a database that uses ACFS. ALTER DISKGROUP CHECK REPAIR comes back clean, but you still see errors, or a corruption check returns a positive result, e.g.:
$ /sbin/acfsutil info fs -o iscorrupt [MOUNT_POINT]
acfsutil info fs: ACFS-03037: not an ACFS file system
$ /sbin/acfsutil info fs -o iscorrupt [MOUNT_POINT]
1
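The two outcomes above can be told apart in a script. The following is a minimal sketch, not an Oracle-supplied tool: the function name report_iscorrupt and the ACFSUTIL override variable are introduced here for illustration, and it assumes that an output of 0 means the file system is not flagged (this article only shows the value 1).

```shell
# Sketch: interpret the output of "acfsutil info fs -o iscorrupt".
# ACFSUTIL is a hypothetical override variable so the binary can be replaced.
ACFSUTIL="${ACFSUTIL:-/sbin/acfsutil}"

# report_iscorrupt MOUNT_POINT
# Returns 0 if not flagged, 1 if flagged corrupt, 2 if the query failed
# or printed something other than 0 or 1 (for example ACFS-03037).
report_iscorrupt() {
    mount_point="$1"
    if ! result=$("$ACFSUTIL" info fs -o iscorrupt "$mount_point" 2>&1); then
        echo "query failed for $mount_point: $result"
        return 2
    fi
    case "$result" in
        1) echo "$mount_point: flagged corrupt, fsck needed"; return 1 ;;
        0) echo "$mount_point: not flagged corrupt"; return 0 ;;   # assumption: 0 means clean
        *) echo "$mount_point: unexpected output: $result"; return 2 ;;
    esac
}
```

Call it as report_iscorrupt [MOUNT_POINT] and branch on the return code.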
The output of “/sbin/acfsutil info fs” also shows Corrupt under flags. For example:
$ /sbin/acfsutil info fs
[MOUNT_POINT]
ACFS Version: 18.3.0.0.0
on-disk version: 47.0
compatible.advm: 18.3.0.0.0
ACFS compatibility: 18.3.0.0.0
flags: MountPoint,Available,Corrupt,AutoResizeEnabled
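To look for the Corrupt flag without reading the output by eye, a small filter can scan the flags line. This is a sketch under the assumption that the flags line has the comma-separated form shown above; has_corrupt_flag is a name chosen here for illustration.

```shell
# has_corrupt_flag: reads "acfsutil info fs" output on stdin and succeeds
# (exit 0) if any "flags:" line lists Corrupt among its comma-separated values.
has_corrupt_flag() {
    awk '
        /^[ \t]*flags:/ {
            sub(/^[ \t]*flags:[ \t]*/, "")          # keep only the value list
            n = split($0, flag, ",")
            for (i = 1; i <= n; i++) {
                gsub(/[ \t]/, "", flag[i])          # trim stray blanks
                if (flag[i] == "Corrupt") found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Typical use would be: /sbin/acfsutil info fs | has_corrupt_flag && echo "corruption flagged".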
The Solution
Use fsck to check the ACFS file system and, if problems are found, to attempt a repair, as described below.
How long fsck takes depends on the file system and on the number of files to check. With the verbose option, fsck has to report on every directory and file. This can slow it down significantly, especially on file systems with many files.
Fsck processing:
Fsck attempts to validate all of a file system’s metadata. Part of this process is validating the metadata block free lists. Each free block has to be checked to ensure that it is the correct type of metadata block, and the free list has to be checked to ensure that there are no missing entries and no loops in the list. Since files can be deleted in any order, these free blocks can be located anywhere on the volume, so processing and validating the lists can produce a lot of random I/O. If fsck takes too long, it is OK to interrupt it provided it is running in check mode.
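The two free-list checks described above (no loops, no missing entries) can be illustrated with a toy model. This sketch is not the ACFS on-disk format or fsck's actual algorithm: the input format (one "block next" pair per line, with next set to 0 at the end of the list) and the name check_free_list are assumptions made for the example.

```shell
# check_free_list HEAD
# stdin: one "block next" pair per line; next is 0 at the end of the list.
# Walks the list from HEAD, reports a loop if a block is visited twice, and
# reports missing entries if some listed blocks are never reached.
check_free_list() {
    awk -v head="$1" '
        { next_of[$1] = $2; total++ }
        END {
            cur = head
            while (cur != "0" && cur != "") {
                if (cur in seen) { print "loop at block " cur; exit 1 }
                if (!(cur in next_of)) { print "dangling pointer to block " cur; exit 1 }
                seen[cur] = 1
                visited++
                cur = next_of[cur]
            }
            if (visited + 0 != total + 0) {
                print "missing entries: " (total - visited) " block(s) unreachable"
                exit 1
            }
            print "free list ok: " (visited + 0) " block(s)"
        }
    '
}
```

Because every pointer has to be followed to the block it names, a list whose blocks are scattered across the volume turns this walk into random reads, which is the I/O pattern the paragraph above describes.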
fsck checks and repairs an existing Oracle ACFS. This command can only be run on a dismounted file system. root privileges are required to run fsck. The Oracle ACFS driver must be loaded for fsck to work. To confirm or rule out a consistency problem on the affected ACFS file system, run fsck as follows:
1. Dismount the ACFS file system on each node as follows (as the root user) and keep it dismounted to avoid corruption. Otherwise you will get the following warning:
fsck.acfs: ACFS-00511: /dev/asm/is mounted on at least one node of the cluster.
# /usr/sbin/umountall -F acfs
Or
# /usr/sbin/umount [MOUNT_POINT]
Where,
MOUNT_POINT – is the mount point of your ACFS file system.
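Before moving on, it can help to confirm that the file system is really dismounted. The sketch below is Linux-only and makes two assumptions: the mount table is readable at /proc/mounts, and ACFS mounts appear there with file system type acfs. The function names are chosen for illustration. It checks only the local node, so it would have to be run on every node of the cluster.

```shell
# list_acfs_mounts [MOUNTS_FILE]
# Prints "device mount_point" for every entry whose file system type is acfs.
list_acfs_mounts() {
    awk '$3 == "acfs" { print $1, $2 }' "${1:-/proc/mounts}"
}

# is_acfs_dismounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds (exit 0) only if MOUNT_POINT is not among the acfs mounts.
is_acfs_dismounted() {
    if list_acfs_mounts "${2:-/proc/mounts}" |
        awk -v mp="$1" '$2 == mp { hit = 1 } END { exit(hit ? 0 : 1) }'; then
        return 1    # still mounted on this node
    fi
    return 0
}
```

For example: is_acfs_dismounted [MOUNT_POINT] || echo "still mounted on this node".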
2. Then execute fsck on the associated ADVM volume as follows (from each node if it is RAC):
On Linux
# script /tmp/fsck_[node name].txt
# /sbin/fsck -a -y -t acfs /dev/asm/[VOLUME_NAME]
# exit
On Solaris
# script /tmp/fsck_[node name].txt
# /usr/sbin/fsck -F acfs -y -o a,v /dev/asm/[VOLUME_NAME]
# exit
On IBM-AIX
# /usr/sbin/fsck -V acfs -y -o a,v /dev/asm/[VOLUME_NAME]
Where,
/dev/asm/[VOLUME_NAME] – is your ADVM volume.
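The per-platform commands above can be selected from the output of uname -s. This is a sketch only: fsck_repair_command is a name introduced here, the command strings are copied from this article, and the function prints the command rather than running it so that it can be reviewed first.

```shell
# fsck_repair_command OS_NAME VOLUME_DEVICE
# Prints the repair-mode fsck command line listed in this article for the
# given platform name (as reported by "uname -s").
fsck_repair_command() {
    os_name="$1"
    volume="$2"
    case "$os_name" in
        Linux) echo "/sbin/fsck -a -y -t acfs $volume" ;;
        SunOS) echo "/usr/sbin/fsck -F acfs -y -o a,v $volume" ;;   # Solaris
        AIX)   echo "/usr/sbin/fsck -V acfs -y -o a,v $volume" ;;
        *)     echo "unsupported platform: $os_name" >&2; return 1 ;;
    esac
}
```

For example: fsck_repair_command "$(uname -s)" /dev/asm/[VOLUME_NAME]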
IMPORTANT: By default, fsck only checks for and reports any errors. In check mode fsck can be cancelled if it is taking a long time. The -a flag must be specified to instruct fsck to repair errors in the file system. If check mode completed in a reasonable amount of time, and if it reported problems, run fsck in repair mode. In repair mode fsck cannot be interrupted without risk of leaving the file system in a worse state (loss of data depending on the nature of the corruption).
# /sbin/fsck -a -y -t acfs /dev/asm/[VOLUME_NAME]
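The check-first, repair-second flow described above can be sketched as a helper that reads a saved check-mode log and suggests the next step. Several things here are assumptions: the check-mode command on Linux is taken to be the repair command without -a, the log is assumed to have been captured to a file (for example with script), the clean marker is the "Checker completed with no errors" line from the sample output in this article, and next_fsck_step is a hypothetical name.

```shell
# next_fsck_step CHECK_LOG VOLUME_DEVICE
# Reads a log saved from a check-mode run, for example (assumed form):
#   /sbin/fsck -y -t acfs /dev/asm/[VOLUME_NAME]
# and prints a suggested next step. It never runs fsck itself.
next_fsck_step() {
    check_log="$1"
    volume="$2"
    if grep -q 'Checker completed with no errors' "$check_log"; then
        echo "clean: no repair needed for $volume"
    else
        # Either problems were reported or the check did not finish.
        echo "problems reported or check incomplete: review $check_log, then consider: /sbin/fsck -a -y -t acfs $volume"
    fi
}
```

The helper only prints a suggestion because a repair run should not be interrupted once started, so the decision to launch it is left to the administrator.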
3. Then remount the ACFS file system (on each node if it is RAC). It is always better to mount manually after fsck repairs.
# /usr/sbin/mount -v acfs /dev/asm/[VOLUME_NAME] [MOUNT_POINT]
Where,
/dev/asm/[VOLUME_NAME] – is your ADVM volume.
MOUNT_POINT – is the mount point of your ACFS file system.
Or
# /usr/sbin/mount -v acfs -o all none none
Example:
– Locate the correct volume with ‘/sbin/acfsutil info fs’ and execute fsck.
[HOSTNAME]:(+ASM2)/opt/oracle > /sbin/acfsutil info fs
/acfs1
ACFS Version: 11.2.0.4.0
flags: MountPoint,Available,Corrupt
mount time: Fri Jun 10 17:29:14 2016
volumes: 1
total size: xxxxxxxxxxxx
total free: xxxxxxxxxxxx
primary volume: /dev/asm/[VOLUME_NAME2]
Example of clean output:
Script command is started on Fri May 18 10:26:18 EDT 2016.
[/dev/asm]# /usr/sbin/fsck -V acfs -y -o a,v /dev/asm/[VOLUME_NAME3]
version = 11.2.0.3.0
fsck: temporary directory '/usr/tmp'
fsck: current directory '/dev/asm'
*****************************
********** Pass: 1 **********
*****************************
fsck: file system check starting for volume: /dev/asm/[VOLUME_NAME3]
Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
Volume indicates the on-disk version is 39.0
Volume indicates the on-disk version is 39.0
ACFS file system created at: Tue Apr 24 22:37:46 2012
checking primary file system
fsck: Volume_Log recovery on node 3 not needed
fsck: checking File_Entry_Table entry: 28 (0x1c) at disk offset: 83968 (0x14800)
fsck: checking File_Entry_Table entry: 29 (0x1d) at disk offset: 84480 (0x14a00)
fsck: checking File_Entry_Table entry: 30 (0x1e) at disk offset: 84992 (0x14c00)
fsck: checking File_Entry_Table entry: 31 (0x1f) at disk offset: 85504 (0x14e00)
fsck: check for unprocessed File_Entry_Table entries complete
Files checked in primary file system: 100%
Checking if any files are orphaned...
Phase 1 Orphan check...
Phase 2 Orphan check...
0 orphans found
Checker completed with no errors.
[/dev/asm]# exit
Script command is complete on Fri May 18 10:27:14 EDT 2012.
Example of output when fsck is fixing corruption:
fsck.acfs: file system check starting for volume: /dev/asm/[VOLUME_NAME4]
Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
Volume indicates the on-disk version is 39.0
Volume indicates the on-disk version is 39.0
ACFS file system created at: Fri Jun 3 15:18:10 2011
checking primary file system
fsck.acfs: Volume_Log recovery on node 1 not needed
fsck.acfs: Volume_Log recovery on node 2 not needed
fsck.acfs: ACFS-07592: metadata structure has incorrect header for:
ACFS Internal Structure: [ACFS Free Block]
file identifier: 1280 (0x500)
disk offset: 24006656 (0x16e5000)
Parent Structure: [ACFS Local Free List]
file identifier: 33 (0x21)
disk offset: 193024 (0x2f200)
                 Value Found                Expected Value
                 ------------------------   ------------------------
File Identifier: 1280 (0x500)               1280 (0x500)
Struct_Type:     0x0f500004                 0x0f500007
                 (OFS_FILE_ENTRY)           (OFS_FETA_FREE_BLOCK)
Endian_Format:   Little Endian              Little Endian
OSCreatedOn:     Linux                      Linux
Struct_Version:  1                          1
FileSystemID:    1315928241 (0x4e6f78b1)    1315928241 (0x4e6f78b1)
CheckSum:        1731346914 (0x673241e2)    1731346914 (0x673241e2)
fsck.acfs: checking directory:
for file:
ACFS Internal File: [ACFS Root Directory]
file identifier: 2 (0x2)
disk offset: 70656 (0x11400)
fsck.acfs: checking file '/.ACFS'
fsck.acfs: checking directory:
for file:
File: '/.ACFS'
file identifier: 7 (0x7)
disk offset: 73216 (0x11e00)
Parent Directory: [ACFS Root Directory]
file identifier: 2 (0x2)
disk offset: 70656 (0x11400)
fsck.acfs: checking file '/.ACFS/.fileid'
fsck.acfs: checking directory:
.............
fsck.acfs: checking file '/home/[PATH]/dirchk/REP_UR.cpr'
fsck.acfs: checking file '/home/[PATH]/dirchk/REP_UR.cps'
fsck.acfs: checking file '/home/[PATH]/dirchk/EXT_AL.cps'
fsck.acfs: checking file '/home/[PATH]/dirdat'
fsck.acfs: checking directory:
for file:
File: '/home/[PATH]/dirdat'
file identifier: 45 (0x2d)
disk offset: 199168 (0x30a00)
Parent Directory: /home/[PATH]
file identifier: 43 (0x2b)
disk offset: 198144 (0x30600)
fsck.acfs: checking file '/home/[PATH]/dirdef'
fsck.acfs: checking directory:
for file:
File: '/home/[PATH]/dirdef'
file identifier: 46 (0x2e)
disk offset: 199680 (0x30c00)
Parent Directory: /home/[PATH]
file identifier: 43 (0x2b)
disk offset: 198144 (0x30600)
fsck.acfs: checking file '/home/[PATH]/dirpcs'
fsck.acfs: checking directory:
for file:
File: '/home/[PATH]/dirpcs'
file identifier: 47 (0x2f)
disk offset: 200192 (0x30e00)
Parent Directory: /home/[PATH]
file identifier: 43 (0x2b)
disk offset: 198144 (0x30600)
fsck.acfs: checking file '/home/[PATH]/dirpcs/MGR.pcm'
fsck.acfs: checking file '/home/[PATH]/macro'
fsck.acfs: checking directory:
...........
fsck.acfs: ACFS-07711: orphan metadata structure (type: 0x0f500007 (OFS_FETA_FREE_BLOCK)) found for file identifier: 2142 (0x85e) at disk offset: 24448000 (0x1750c00)
fsck.acfs: ACFS-07711: orphan metadata structure (type: 0x0f500007 (OFS_FETA_FREE_BLOCK)) found for file identifier: 2143 (0x85f) at disk offset: 24448512 (0x1750e00)
fsck.acfs: ACFS-07711: orphan metadata structure (type: 0x0f500007 (OFS_FETA_FREE_BLOCK)) found for file identifier: 2339 (0x923) at disk offset: 61203968 (0x3a5e600)
159 orphans found
fixing file system
problems processed: 5%
problems processed: 10%
problems processed: 15%
problems processed: 20%
.............
problems processed: 90%
problems processed: 95%
problems processed: 100%
........
fsck.acfs: check for unprocessed File_Entry_Table entries complete
Files checked in primary file system: 55%
Files checked in primary file system: 60%
Files checked in primary file system: 65%
Files checked in primary file system: 70%
Files checked in primary file system: 75%
Files checked in primary file system: 80%
Files checked in primary file system: 85%
Files checked in primary file system: 90%
Files checked in primary file system: 95%
Files checked in primary file system: 100%
Checking if any files are orphaned...
Phase 1 Orphan check...
Phase 2 Orphan check...
0 orphans found
fsck.acfs: Checker/Fixer completed with the following results:
File System Errors: 168
Fixed: 168
Not Fixed: 0
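The figure that matters most in the repair summary is the Not Fixed count. A minimal sketch for pulling it out of a saved log follows. It assumes the summary wording shown in the sample above and tolerates the three counters appearing either on one line or on separate lines; fsck_unfixed_count is a name chosen for illustration.

```shell
# fsck_unfixed_count: reads a repair-mode fsck log on stdin and prints the
# number that follows "Not Fixed:". Fails (exit 1) if no summary is present.
fsck_unfixed_count() {
    awk '
        {
            if (match($0, /Not Fixed:[ \t]*[0-9]+/)) {
                field = substr($0, RSTART, RLENGTH)
                sub(/^Not Fixed:[ \t]*/, "", field)   # keep only the digits
                count = field
                seen = 1
            }
        }
        END { if (seen) print count; exit(seen ? 0 : 1) }
    '
}
```

A non-zero result means some errors remain after the repair run and the log needs closer review before the file system is remounted.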
By default, fsck only checks for and reports any errors. The repair option (-a on Linux, -o a on Solaris and AIX, as in the commands above) must be specified to instruct fsck to fix errors in the file system. In a few cases, fsck asks a question before proceeding to check a file system. These cases include:
- If fsck detects that another fsck is in progress on the file system.
- If fsck detects that the Oracle ACFS driver is not loaded.
- If the file system does not appear to be Oracle ACFS.
In check mode, fsck also prompts if there are transaction logs that have not been processed completely due to an incomplete shutdown. To run in non-interactive mode, include either the -y or the -n option to answer yes or no to any questions.