My `/home` file system is JFS. It has switched to read-only mode several times already, so I had to reboot or remount it. I saw this in `/var/log/messages`:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925711] ata2.00: configured for UDMA/133
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925755] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925759] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925763] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925770] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925778] 0e 5a b2 b8
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925782] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925785] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925815] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925817] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925820] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925825] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925833] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925836] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925839] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925863] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925865] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925868] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925872] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925879] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925882] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925885] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925908] ata2: EH complete
And `smartctl -a /dev/sda` gave me this:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   174   021    Pre-fail  Always       -       2008
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1005
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13675
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       998
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       810861
194 Temperature_Celsius     0x0022   106   091   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
Hard-drive model:
Model Family: Western Digital Scorpio Blue Serial ATA (Adv. Format)
Device Model: WDC WD7500BPVT-24HXZT3
Serial Number: WD-WX91A91R4010
LU WWN Device Id: 5 0014ee 601b831c9
Firmware Version: 03.01A03
Upd: I started another self-test (the first one I did several months ago) and got some updates:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%            13680  229857912
# 2  Extended offline    Completed without error        00%             9661  -
# 3  Extended offline    Completed: read failure        90%             9654  96004576
# 4  Extended offline    Completed: read failure        90%             9653  96004576
Lines #2 to #4 were already there before.
I followed these guides: Badblock HOWTO and Debug the Filesystem. The block no longer seems to be reported as bad, but the reallocated sector count has not increased either. The only value that increased, after I wrote zeros to the bad block, is Raw_Read_Error_Rate.
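For reference, the block calculation from the Badblock HOWTO, b = int((L - S) * 512 / B), can be sketched as below. This is only an illustration: the function name is made up, the partition start of 2048 sectors and the 4096-byte block size are assumptions rather than values read from this disk, and only the LBA comes from the self-test log above.

```python
def fs_block_for_lba(lba, part_start, fs_block_size, sector_size=512):
    """Map an absolute LBA to a block number inside a partition's filesystem.

    Implements b = int((L - S) * 512 / B), where L is the failing LBA,
    S is the partition's starting sector and B is the filesystem block size.
    """
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * sector_size // fs_block_size


# LBA_of_first_error from self-test #1 in the log above.
failing_lba = 229857912
# ASSUMPTION: hypothetical partition start; the real one comes from `fdisk -lu /dev/sda`.
partition_start = 2048
# ASSUMPTION: 4096-byte filesystem blocks.
block_size = 4096

print(fs_block_for_lba(failing_lba, partition_start, block_size))  # prints 28731983
```

With the real partition start and block size substituted, the result is the filesystem block to look up with the filesystem's own debugging tool.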
The question is: should I consider ordering a new hard drive?
Best Answer
From the `smartctl` man page:

So according to the `smartctl` output section you have posted, your drive actually looks to be in good shape. However, that doesn't necessarily mean that there isn't another problem.

Unfortunately, the `Unhandled sense code` message does mean that something went wrong, but the kernel doesn't know what. You could try looking at the rest of the `smartctl` output to see if there is anything wrong. There should be a part that summarises the drive's overall health. You can get it on its own with the `-H` option.

If the drive supports self-testing, you can start one with:
This starts one in the background, so you will have to keep checking for results. If the drive is not mounted, you can add the `-C` option to enable captive mode, which should take less time. A `short` test is also possible, but less thorough.

It is also a good idea to check the physical connectors etc. to make sure nothing has come loose; it's an easy fix if it has.
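When going through the attribute table by hand, the raw values of the sector-related attributes are probably the first thing to look at. The following is a minimal sketch that pulls them out of text in the `smartctl -A` table layout. The `WATCHED` selection and the helper name are illustrative choices, not part of smartmontools, and the sample rows are copied from the question.

```python
# Sector-related attribute IDs; this selection is an assumption for illustration.
WATCHED = {5, 196, 197, 198}

# Rows copied from the attribute table posted in the question.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""


def nonzero_raw_values(text, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is > 0."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


print(nonzero_raw_values(SAMPLE))  # prints {'Current_Pending_Sector': 1}
```

For the table in the question, the only entry this flags is the single pending sector, which matches the read failures in the self-test log.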
Update
Wikipedia has a good reference for SMART attributes. Note that the 'Better' column refers to the raw values in the rightmost column of the output and not the normalised values at the start. Here is the part on 'Current Pending Sector' mentioned by frostschutz: