Ubuntu Hard Disk – Does a Bad Sector Indicate a Failing Disk?

badblocks, fsck, hard-disk, sata, ubuntu

My Ubuntu 13.10 system has been performing very poorly over the last day or so. Judging by the kernel logs, the 3TB SATA disk, which is less than a year old, is having trouble with one particular sector:

Nov  4 20:54:04 mediaserver kernel: [10893.039180] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  4 20:54:04 mediaserver kernel: [10893.039187] ata4.01: BMDMA stat 0x65
Nov  4 20:54:04 mediaserver kernel: [10893.039193] ata4.01: failed command: READ DMA EXT
Nov  4 20:54:04 mediaserver kernel: [10893.039202] ata4.01: cmd 25/00:08:f8:3f:83/00:00:af:00:00/f0 tag 0 dma 4096 in
Nov  4 20:54:04 mediaserver kernel: [10893.039202]          res 51/40:00:f8:3f:83/40:00:af:00:00/10 Emask 0x9 (media error)
Nov  4 20:54:04 mediaserver kernel: [10893.039207] ata4.01: status: { DRDY ERR }
Nov  4 20:54:04 mediaserver kernel: [10893.039211] ata4.01: error: { UNC }
Nov  4 20:54:04 mediaserver kernel: [10893.148527] ata4.00: configured for UDMA/133
Nov  4 20:54:04 mediaserver kernel: [10893.180322] ata4.01: configured for UDMA/133
Nov  4 20:54:04 mediaserver kernel: [10893.180345] sd 3:0:1:0: [sdc] Unhandled sense code
Nov  4 20:54:04 mediaserver kernel: [10893.180349] sd 3:0:1:0: [sdc]
Nov  4 20:54:04 mediaserver kernel: [10893.180353] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov  4 20:54:04 mediaserver kernel: [10893.180356] sd 3:0:1:0: [sdc]
Nov  4 20:54:04 mediaserver kernel: [10893.180359] Sense Key : Medium Error [current] [descriptor]
Nov  4 20:54:04 mediaserver kernel: [10893.180371] Descriptor sense data with sense descriptors (in hex):
Nov  4 20:54:04 mediaserver kernel: [10893.180373]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Nov  4 20:54:04 mediaserver kernel: [10893.180384]         af 83 3f f8
Nov  4 20:54:04 mediaserver kernel: [10893.180389] sd 3:0:1:0: [sdc]
Nov  4 20:54:04 mediaserver kernel: [10893.180393] Add. Sense: Unrecovered read error - auto reallocate failed
Nov  4 20:54:04 mediaserver kernel: [10893.180396] sd 3:0:1:0: [sdc] CDB:
Nov  4 20:54:04 mediaserver kernel: [10893.180398] Read(16): 88 00 00 00 00 00 af 83 3f f8 00 00 00 08 00 00
Nov  4 20:54:04 mediaserver kernel: [10893.180412] end_request: I/O error, dev sdc, sector 2944614392
Nov  4 20:54:04 mediaserver kernel: [10893.180431] ata4: EH complete

The kern.log file is around 33MB, mostly filled with repeats of the above error, and the sector number appears to be the same in every repeated message.
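One way to confirm that every repeat refers to the same sector is to tally the sector numbers in the log. The following is a minimal sketch, not a definitive tool: the function names are made up for illustration, and the default log path and device name are assumptions taken from the excerpt above.

```shell
# Sketch: summarise which sectors the kernel reported as unreadable.
# The default log path and device name are assumptions; adjust for your system.

# Print each distinct failing sector preceded by the number of times it was logged.
count_failed_sectors() {
    cfs_logfile="${1:-/var/log/kern.log}"
    cfs_dev="${2:-sdc}"
    grep -o "I/O error, dev ${cfs_dev}, sector [0-9]*" "$cfs_logfile" \
        | awk '{ print $NF }' \
        | sort -n \
        | uniq -c
}

# Convert the hex LBA bytes from the sense data (e.g. "af 83 3f f8") to decimal.
lba_hex_to_dec() {
    lhd_hex=$(printf '%s' "$*" | tr -d ' ')   # join the arguments, drop spaces
    printf '%d\n' "0x${lhd_hex}"
}
```

Calling `lba_hex_to_dec af 83 3f f8` gives 2944614392, which matches the sector in the `end_request` line, so the sense data and the I/O error describe the same location. If `count_failed_sectors` prints a single line, all the logged errors point at one sector.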

I'm currently running the following command on the now-unmounted disk to test it and attempt to sort out any issues it might have. I'm around 12 hours in and expect it to take another 24 to 48 hours because the disk is so large:

e2fsck -c -c -p -v /dev/sdc1

My question is: is this drive failing, or am I looking at a common issue here? I'm wondering whether there is any point in repairing or ignoring the bad sectors, or whether I should replace the disk under warranty while it's still covered. My knowledge of the above command is somewhat lacking, so I'm sceptical as to whether it will help.

Quick update!

e2fsck finally finished after 2 days, with lots of 'multiply-claimed block(s) in inode' messages. Trying to mount the filesystem resulted in an error, forcing it to drop back to read-only:

Nov 11 08:29:05 mediaserver kernel: [211822.287758] EXT4-fs (sdc1): warning: mounting fs with errors, running e2fsck is recommended
Nov 11 08:29:05 mediaserver kernel: [211822.301699] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: errors=remount-ro

Trying to read the sector manually:

sudo dd count=1 if=/dev/sdc of=/dev/null skip=2944614392
dd: reading ‘/dev/sdc’: Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 5.73077 s, 0.0 kB/s

Trying to write to it:

sudo dd count=1 if=/dev/zero of=/dev/sdc seek=2944614392
dd: writing to ‘/dev/sdc’: Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 2.87869 s, 0.0 kB/s

In both cases, Reallocated_Sector_Ct remained 0.

The drive does go into a sleep state quite often. I'm now wondering whether this could be a filesystem issue instead, but I'm not 100% sure.

Best Answer

Bad sectors are always an indication of a failing HDD. In fact, the moment you see an I/O error such as this, you have probably already lost or corrupted some data. Make a backup if you don't have one already, run a long self-test (smartctl -t long /dev/disk), and check the SMART data (smartctl -a /dev/disk). Get a replacement if you can.
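The SMART check can be narrowed to the counters that matter here. This is a rough sketch that assumes the usual ATA attribute table printed by `smartctl -A`, where the attribute name is the second column and RAW_VALUE is the tenth. The helper name `smart_raw_value` is hypothetical, and `/dev/sdc` is simply the disk from the question.

```shell
# Sketch: extract the raw value of one SMART attribute from `smartctl -A` output.
# Assumes the standard ATA attribute table layout (name in column 2,
# RAW_VALUE in column 10); some drives format the raw field differently.
smart_raw_value() {
    # $1 = attribute name, e.g. Reallocated_Sector_Ct; the table is read from stdin
    awk -v name="$1" '$2 == name { print $10 }'
}

# Typical use (needs root and smartmontools):
#   sudo smartctl -t long /dev/sdc        # start the extended self-test
#   sudo smartctl -l selftest /dev/sdc    # read the self-test log once it finishes
#   sudo smartctl -A /dev/sdc | smart_raw_value Current_Pending_Sector
#   sudo smartctl -A /dev/sdc | smart_raw_value Reallocated_Sector_Ct
```

A sector that cannot be read but has not yet been remapped is normally counted under Current_Pending_Sector rather than Reallocated_Sector_Ct, so a reallocation count of 0 does not by itself mean the disk is healthy.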

Bad sectors can't be repaired, only replaced by reserve sectors. This harms HDD performance, because every access to a remapped sector requires an additional seek to the reserve area. Marking such sectors as bad at the filesystem layer helps, since they will then never be accessed. However, it is hard to determine which sectors the disk has already reallocated, so chances are the filesystem won't know to avoid the affected region.
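To see what the filesystem has stored at a failing location, the whole-disk sector number from the kernel log first has to be translated into a filesystem block number. The sketch below shows the arithmetic only. The partition start sector and the block size used in the example are assumed values, not taken from the question; the real ones come from `fdisk -l /dev/sdc` and `tune2fs -l /dev/sdc1`. The function name is hypothetical.

```shell
# Sketch: translate a whole-disk sector number into a filesystem block number.
# The kernel reports sectors in 512-byte units relative to the start of the disk.
sector_to_fs_block() {
    # $1 = disk sector, $2 = first sector of the partition, $3 = fs block size in bytes
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Example with the sector from the kernel log and ASSUMED partition values
# (start sector 2048, 4096-byte blocks):
#   blk=$(sector_to_fs_block 2944614392 2048 4096)
#   sudo debugfs -R "icheck $blk" /dev/sdc1      # which inode owns that block
#   sudo debugfs -R "ncheck <inode>" /dev/sdc1   # which path that inode has
```

Running `e2fsck -c`, as in the question, does the marking automatically by adding any blocks that badblocks finds to the filesystem's bad block list, so the manual calculation is mainly useful for finding out which file is affected.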
