
The Geek Diary

How To Check a Disk for Bad Blocks or Disk Errors on CentOS / RHEL

by admin

Hard disks can fail unexpectedly, so always keep recent backups of all important data. Keep in mind that even when a current or impending failure is detected, there may not be enough time to back up the data. Below are several methods for identifying bad blocks or disk errors on CentOS/RHEL.

Using smartctl

If /var/log/messages shows several I/O errors, or you simply suspect that a hard disk is failing, smartctl is a helpful tool for checking it. S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. On some systems, S.M.A.R.T. support must be enabled in the BIOS before it can be used.

Next, install the package that provides /usr/sbin/smartctl. On Red Hat Enterprise Linux, this is the smartmontools package.

1. Verify that your hard disk supports S.M.A.R.T.:

# smartctl -i /dev/xxx

Replace /dev/xxx with the hard disk of interest when using the commands outlined in this post.

2. For SATA drives use:

# smartctl -i -d ata /dev/xxx

3. Enable S.M.A.R.T. support with:

# smartctl -s on /dev/xxx            ### For SCSI disks
# smartctl -s on -d ata /dev/xxx     ### For SATA disks

4. Run the following command as root for a quick PASS/FAIL health check. The more thorough self-test discussed below is generally more conclusive:

# smartctl -H /dev/xxx
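
The PASS/FAIL check can also be scripted. The following is a minimal sketch, assuming the verdict lines that smartmontools commonly prints for ATA devices ("SMART overall-health self-assessment test result: PASSED") and for SCSI devices ("SMART Health Status: OK"). The function name smart_health_verdict is hypothetical, and the exact wording may differ between smartmontools versions.

```shell
# smart_health_verdict: read `smartctl -H` output on stdin and print
# PASSED, FAILED, or UNKNOWN. The function name is illustrative only;
# the matched strings are assumptions about smartctl's usual output.
smart_health_verdict() {
    verdict="UNKNOWN"
    while IFS= read -r line; do
        case "$line" in
            *"self-assessment test result: PASSED"*|*"SMART Health Status: OK"*)
                verdict="PASSED" ;;
            *"self-assessment test result: FAILED"*|*"SMART Health Status:"*)
                # Any SCSI health status other than OK is treated as a failure.
                verdict="FAILED" ;;
        esac
    done
    echo "$verdict"
}
```

With the function loaded in a root shell, it could be used like this:

# smartctl -H /dev/xxx | smart_health_verdict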

Running smartctl in the background

To start a long (extended) self-test in the background, run the following as root:

# smartctl -t long /dev/xxx

Once the test has finished, view the results with:

# smartctl -a /dev/xxx
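
To look at the self-test log alone rather than the full -a report, smartctl also accepts -l selftest. The sketch below assumes the ATA log layout, in which the most recent entry starts with "# 1" and a clean run reads "Completed without error". SCSI devices print a different layout, so treat this as an illustration only. The function name latest_selftest_ok is hypothetical.

```shell
# latest_selftest_ok: read `smartctl -l selftest` output on stdin and
# succeed (exit status 0) only if the newest entry, numbered "# 1",
# reports "Completed without error". Assumes the ATA log layout.
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}
```

A possible invocation, again from a root shell:

# smartctl -l selftest /dev/xxx | latest_selftest_ok && echo "last self-test passed"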

To learn more about the options that can be used with smartctl, view the man page of the command:

# man smartctl

Using badblocks

You can also use the “badblocks” command to check for bad blocks on a disk device. The “badblocks” command can be very useful in isolating problems with syncing LVM partitions in Linux. LVM operations can fail because of bad blocks on a disk: bad blocks on either the source or the destination disk of an LVM mirror will cause a synchronization failure.

badblocks can also be used in conjunction with e2fsck and mke2fs to mark blocks as bad. If the output of badblocks is going to be fed to the e2fsck or mke2fs programs, it is important that the block size is properly specified, since the block numbers which are generated are very dependent on the block size in use by the filesystem. For this reason, it is strongly recommended that users not run badblocks directly, but rather use the -c option of the e2fsck and mke2fs programs.

Warning: Misuse of these commands can cause data loss. Additional information on the “badblocks” command is available using the “man badblocks” command.
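
As a sketch of the block-size point above: for an ext2/3/4 filesystem, tune2fs -l prints a "Block size:" line, and that value can be passed to badblocks with -b so the reported block numbers match the filesystem. The helper name fs_block_size is hypothetical, and /dev/sdd1 is only an example partition.

```shell
# fs_block_size: read `tune2fs -l` output on stdin and print the value
# of the "Block size:" line (in bytes). The helper name is illustrative.
fs_block_size() {
    awk -F: '/^Block size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}
```

From a root shell, the value could then be used as follows:

# bs=$(tune2fs -l /dev/sdd1 | fs_block_size)
# badblocks -b "$bs" -v -o badblocks.log /dev/sdd1

Alternatively, running e2fsck -c /dev/sdd1 on the unmounted filesystem lets e2fsck invoke badblocks itself with the correct block size.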

1. Use the disk checking tool badblocks to scan the specified hard disk block by block. For example, to scan /dev/sdd issue the commands:

# mount | grep sdd                  # find all mounted partitions of sdd
# umount /dev/sdd1                  # unmount the partitions (may be more than one)
# badblocks -n -vv /dev/sdd

Here, -n selects the non-destructive read-write mode. Without it, badblocks performs only a non-destructive read-only test by default.

Note: Never use the -w option on a device containing an existing file system. This option erases data! If write-mode testing needs to be performed on an existing file system, use the -n option instead. It is slower, but it will preserve the data.

2. If messages similar to the examples below appear in /var/log/messages or on the console after running badblocks, back up any data on the affected device and replace it:

Apr  4 13:50:40 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr  4 13:50:40 test kernel: sdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=74367249, sector=74367232
Apr  4 13:50:40 test kernel: ide: failed opcode was: unknown
Apr  4 13:50:40 test kernel: end_request: I/O error, dev sdd, sector 74367232
Apr  4 13:50:42 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr  4 13:50:42 test kernel: sdd: dma_intr: error=0x40 { UncorrectableError }, LBAsect=74367249, sector=74367240
Apr  4 13:50:42 test kernel: ide: failed opcode was: unknown
Apr  4 13:50:42 test kernel: end_request: I/O error, dev sdd, sector 74367240
Apr  4 13:50:44 test kernel: sdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }

3. The command below writes the list of bad blocks found to the output file badblocks.log:

# badblocks -v -o badblocks.log /dev/sdd
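
After a scan, two quick follow-up checks can be scripted: counting the entries in the output file, and pulling the failing sector numbers out of the kernel log. The sketch below assumes that the file written by -o holds one block number per line and that kernel messages contain the text "I/O error, dev sdd, sector N" as in the sample log above. The function names count_bad_blocks and io_error_sectors are hypothetical.

```shell
# count_bad_blocks: print the number of non-empty lines in the file
# written by `badblocks -o` (assumed: one bad block number per line).
count_bad_blocks() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# io_error_sectors: read kernel log lines on stdin and print the unique
# sector numbers reported in "I/O error" messages for one device.
# $1 is the device name as it appears in the log, for example sdd.
io_error_sectors() {
    sed -n "s|.*I/O error, dev $1, sector \([0-9][0-9]*\).*|\1|p" | sort -n -u
}
```

Possible usage from a root shell:

# count_bad_blocks badblocks.log
# io_error_sectors sdd < /var/log/messages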

Filed Under: CentOS/RHEL 6, CentOS/RHEL 7, Linux

© 2023 · The Geek Diary
