Linux – Initial Testing of New Hard Drives in Linux

Tags: hard drive, linux, smart

I'm about to get a couple brand new HGST DeskStar 4TB 3.5" SATA hard drives. Are there any recommended practices these days I should perform before I start using them and entrusting my data to them, especially since they're still in the initial warranty period?

Typically, I just stick new drives in, fdisk them, encrypt if I need it, format with ext4, and go, though this time it'll be ZFS (via ZoL). When I get the time, I hook them into smartmontools so smartd can keep an eye on them, but that's about it.

Should I look at any particular SMART values at the beginning? Should I write ones, zeroes, or random data across the full length of the disk and, if so, read it all back? Should I leave it powered on for a full 30 days and keep an eye on anything? Should I verify APM settings are turned off for the drive so there is no premature wear from frequent spin-downs?

Update 7 Oct 2017: I followed the suggestions in @Xen2050's answer and @sawdust's comment.

I got the drives and I'm ready to begin testing them out. I created a script capturing Xen2050's recommendations.

#!/bin/sh

AWK=/usr/bin/awk
CLEAR=/usr/bin/clear
GREP=/bin/grep
SLEEP=/bin/sleep
SMARTCTL=/usr/sbin/smartctl

EXIT_SUCCESS=0
EXIT_INSUFFICIENT_ARGS=1

usage() {
   cat << END_OF_FILE
USAGE

   ${0} interval device

EXAMPLES

   ${0} 60 /dev/sda

END_OF_FILE
}

runIteration() {
   runIteration_device=${1}

   #${HDPARM} -B ${runIteration_device} | ${GREP} 'APM_level'
   #${HDDTEMP} ${runIteration_device}
   #${SMARTCTL} --attributes ${runIteration_device}
   ${SMARTCTL} --attributes ${runIteration_device} | ${GREP} -E '(ATTRIBUTE_NAME|Temperature_Celsius|Current_Pending_Sector|Pre\-fail|Power_On_Hours|Power_Cycle_Count|Load_Cycle_Count)' | ${AWK} '
   {
      for (i = 1; i <= NF; ++i) {
         len=20;
         if ((i != 3) && (i != 7) && (i != 8)) {
            s = substr($i, 0, len-1);
            printf("%-4s", s);
         }

         if (i == 2) {
            printf(sprintf("%s%0" (len-length(s)) "s", "", ""));
         }

         printf(" ");
      }

      print "";
   }'

   ${SMARTCTL} --get=apm ${runIteration_device} | ${GREP} '^APM'
}

exitCode=${EXIT_SUCCESS}

if [ ${#} -eq 2 ]; then

   interval=${1}
   device=${2}

   while [ 1 ]; do
      ${CLEAR}
      runIteration ${device}
      ${SLEEP} ${interval}
   done

else

   exitCode=${EXIT_INSUFFICIENT_ARGS}
   echo ${0}: Insufficient arguments 1>&2
   usage 1>&2

fi

exit ${exitCode}

Test Setup

I have two of my four new drives plugged into two USB docks at a time that are sitting on a desk simply because this computer has no SATA ports available. I'm not sure whether I should expect the temperature to be higher or lower than inside an enclosed chassis with power supply fan running.

Because these are USB docks, I ran into some trouble I hadn't seen before. Although I could see the devices as /dev/sda and /dev/sdb, any smartctl commands resulted in an error last night. lsusb reported the docks are JMicron Technology, and a quick Google search indicated I needed to specify the --device option. After trying a few things out, I gave up on it as it didn't seem to work.

Tonight, I tried again without --device and it's working better for no apparent reason.
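For docks that refuse plain smartctl commands, the --device option selects how commands are passed through the USB bridge. Below is a minimal sketch that tries a few device types in turn and reports the first one that works. The candidate list (auto, sat, usbjmicron) is an assumption about what might suit a JMicron dock and was not verified on this hardware. The function name and the SMARTCTL override are hypothetical.

```shell
# Sketch: find a --device type under which smartctl can talk to a USB dock.
# SMARTCTL can be overridden (for example with a stub) for dry runs.
SMARTCTL=${SMARTCTL:-smartctl}

# Print the first device type for which "smartctl --info" exits successfully.
# The candidate list is an assumption; adjust it for the bridge chip in use.
find_device_type() {
   find_device_type_dev=${1}
   for find_device_type_t in auto sat usbjmicron; do
      if "${SMARTCTL}" --device="${find_device_type_t}" --info "${find_device_type_dev}" >/dev/null 2>&1; then
         echo "${find_device_type_t}"
         return 0
      fi
   done
   return 1
}

# Example (run as root):
#   find_device_type /dev/sda
```

If a type is printed, the same --device value can be added to the other smartctl commands used here.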

Also please keep in mind that I'm running this on a computer that is disconnected from the network (purely because I have no place to plug in an Ethernet cable). As a result, I'm trying to capture my notes here by running the corresponding smartctl commands on this laptop, pasting the output, and massaging values to match what I see on-screen of the test PC. I mention this because I caught myself missing the update of one value after pasting, so I want to apologize in advance in case anyone gets confused reading output below that does not make perfect sense because values look wrong. (FYI the value I missed was a VALUE/WORST for a Temperature_Celsius when I updated a RAW_VALUE.)

This also means I had to hand-type the script above onto the test PC. I believe I typed everything correctly, but there's always a chance I missed a comma or semicolon somewhere.

I performed the steps in the following sections twice: once with the first two drives, then again after powering everything off, replacing the drives with the remaining two, and then powering everything back up. I have annotated any differences from the second run where applicable.

OK. Now onto the fun part…

I booted up the PC with a live CD of System Rescue CD version 5.0.3. After I was at a prompt, I monitored the log:

# tail -F /var/log/messages

I powered on each of the USB docks and watched the messages come up for /dev/sda and /dev/sdb.

Run SMART Attributes Monitoring Script

To run the script, I typed:

# ~/scripts/hdd_init_checks.sh 60 /dev/sdX

for each disk (sda and sdb).

I'm not aware if polling too frequently wears anything out on the drive, but I figured once per minute should be enough for the duration of this.

The initial parameters were identical for the two drives:

ID#  ATTRIBUTE_NAME        VALUE WORST THRESH   WHEN_FAILED RAW_VALUE 
1    Raw_Read_Error_Rate   100  100  016    -
2    Throughput_Performa   100  100  054    -
3    Spin_Up_Time          100  100  024    -
5    Reallocated_Sector_   100  100  005    -
7    Seek_Error_Rate       100  100  067    -
8    Seek_Time_Performan   100  100  020    -
9    Power_On_Hours        100  100  000    -
10   Spin_Retry_Count      100  100  060    -
12   Power_Cycle_Count     100  100  000    -
193  Load_Cycle_Count      100  100  000    -
194  Temperature_Celsius   250  250  000    -    24   (Min/Max
197  Current_Pending_Sec   100  100  000    -
APM level is:     Disabled

Run SMART Tests

I started running the SMART tests; however, smartctl --capabilities reported neither of these supported the conveyance self-test. Oh well.

# smartctl --capabilities /dev/sdX
...
                                    Self-test supported.
                                    No Conveyance Self-test supported.
...
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 571) minutes.

Run Immediate Offline Test

I began with the immediate offline test for each of sda and sdb, but first I checked smartctl --capabilities /dev/sdX for each drive:

Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
...
Total time to complete Offline 
data collection:            (  113) seconds.

Then, I began the immediate offline test:

# smartctl --test=offline /dev/sdX

Testing has begun.
Please wait 113 seconds for test to complete.
Test will complete after Thu Oct  5 03:40:52 2017

During the test, I monitored its progress with smartctl --capabilities:

# watch -n 1 'echo "--- sda"; smartctl --capabilities /dev/sda | head -13 | tail -9; echo "--- sdb"; smartctl --capabilities /dev/sdb | head -13 | tail -9'

Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
...
Total time to complete Offline 
data collection:            (  113) seconds.

and viewed the results when it completed:

Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
...
Total time to complete Offline 
data collection:            (  113) seconds.
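The head -13 | tail -9 window used for monitoring depends on where the status lines happen to fall in the output. A minimal sketch that selects the status blocks by label instead is below. It assumes the labels "Offline data collection status" and "Self-test execution status" shown in the output here, and that continuation lines are indented. The function name is hypothetical.

```shell
# Sketch: print the two status blocks from "smartctl --capabilities" output
# read on stdin, selected by label rather than by line position.
show_status() {
   awk '
      /^(Offline data collection status|Self-test execution status):/ { keep = 1; print; next }
      /^[ \t]/ { if (keep) print; next }   # indented continuation lines
      { keep = 0 }                          # any other line ends the block
   '
}

# Example (run as root):
#   smartctl --capabilities /dev/sda | show_status
```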

The parameters were now a bit different from above:

ID#  ATTRIBUTE_NAME        VALUE WORST THRESH   WHEN_FAILED RAW_VALUE 
1    Raw_Read_Error_Rate   100  100  016    -
2    Throughput_Performa   136  136  054    -
3    Spin_Up_Time          100  100  024    -
5    Reallocated_Sector_   100  100  005    -
7    Seek_Error_Rate       100  100  067    -
8    Seek_Time_Performan   128  128  020    -
9    Power_On_Hours        100  100  000    -
10   Spin_Retry_Count      100  100  060    -
12   Power_Cycle_Count     100  100  000    -
193  Load_Cycle_Count      100  100  000    -
194  Temperature_Celsius   125  125  000    -    48   (Min/Max
197  Current_Pending_Sec   100  100  000    -
APM level is:     Disabled

(2nd Run: With the second pair of hard drives, sda's Throughput_Performance was 137 for VALUE and WORST; sdb's values matched above. Also temperature was cooler for these at 31 and 34, but that's probably because I know what I'm doing this time and breezing through these steps so they didn't heat up yet.)
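Comparing attribute tables by eye is easy to get wrong, so one option is to save snapshots to files and let a script list what changed. A minimal sketch follows, assuming the usual ten-column smartctl --attributes layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function and file names are hypothetical.

```shell
# Sketch: report SMART attributes that differ between two saved snapshots.
# Usage: changed_attributes before.txt after.txt
changed_attributes() {
   awk '
      # Attribute rows start with a numeric ID; header lines are skipped.
      $1 ~ /^[0-9]+$/ && NF >= 10 {
         summary = "VALUE=" $4 " WORST=" $5 " RAW=" $10
         if (NR == FNR) { before[$2] = summary; next }   # first file: remember
         if (($2 in before) && before[$2] != summary)    # second file: compare
            print $2 ": " before[$2] " -> " summary
      }
   ' "${1}" "${2}"
}

# Example (run as root):
#   smartctl --attributes /dev/sda > before.txt
#   ... run a test ...
#   smartctl --attributes /dev/sda > after.txt
#   changed_attributes before.txt after.txt
```

Only the first word of the raw value is compared, so a change confined to the Min/Max part of the temperature line would not be reported.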

It looks like the temperature is going up; it's been a couple minutes since it ended as I'm trying to capture my notes here. It was 46, then 47, now 48. The drives are sitting in the bookcase portion of a standard desk, so they're enclosed on five of six sides, but I would expect it to be warmer inside of a PC case. I turned on the ceiling fan in the room to circulate some air in case it helps.

The error log showed no errors:

# smartctl --log=error /dev/sdX
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
No Errors Logged

Run Short Self-Test

Next, I ran the two-minute short self-test for each of sda and sdb:

# smartctl --test=short /dev/sdX

Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Thu Oct  5 04:10:34 2017

During the test, I monitored its progress with smartctl --capabilities:

# watch -n 1 'echo "--- sda"; smartctl --capabilities /dev/sda | head -13 | tail -9; echo "--- sdb"; smartctl --capabilities /dev/sdb | head -13 | tail -9'

Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
                                 ( 248) 80% of test remaining.
                                 ( 247) 70% of test remaining.
                                 ( 246) 60% of test remaining.
                                 ( 245) 50% of test remaining.
                                 ( 244) 40% of test remaining.
                                 ( 243) 30% of test remaining.
                                 ( 242) 20% of test remaining.
                                 ( 241) 10% of test remaining.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.

(Note: The output didn't really look like this; I combined all the different percentages for easier readability here. The long test below shows more realistic output.)
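Instead of watching the screen, a loop can poll until the "of test remaining" line disappears. A minimal sketch is below, assuming the progress wording shown in the output here. The function names, the SMARTCTL override, and the 60-second default interval are hypothetical.

```shell
# Sketch: extract the "NN% of test remaining" figure from
# "smartctl --capabilities" output on stdin; prints nothing when no
# self-test is in progress.
selftest_remaining() {
   sed -n 's/^.*[^0-9]\([0-9][0-9]*\)% of test remaining.*$/\1/p'
}

# Poll a device until its self-test finishes.
# $1 = device, $2 = seconds between polls (default 60)
# SMARTCTL may be overridden for dry runs.
wait_for_selftest() {
   while :; do
      wait_for_selftest_pct=$("${SMARTCTL:-smartctl}" --capabilities "${1}" | selftest_remaining)
      [ -z "${wait_for_selftest_pct}" ] && break
      echo "${1}: ${wait_for_selftest_pct}% remaining"
      sleep "${2:-60}"
   done
}

# Example (run as root):
#   smartctl --test=short /dev/sda && wait_for_selftest /dev/sda 30
```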

The parameters seem to not have changed at all except the temperature seems to be fluctuating between 47 and 48:

ID#  ATTRIBUTE_NAME        VALUE WORST THRESH   WHEN_FAILED RAW_VALUE 
1    Raw_Read_Error_Rate   100  100  016    -
2    Throughput_Performa   136  136  054    -
3    Spin_Up_Time          100  100  024    -
5    Reallocated_Sector_   100  100  005    -
7    Seek_Error_Rate       100  100  067    -
8    Seek_Time_Performan   128  128  020    -
9    Power_On_Hours        100  100  000    -
10   Spin_Retry_Count      100  100  060    -
12   Power_Cycle_Count     100  100  000    -
193  Load_Cycle_Count      100  100  000    -
194  Temperature_Celsius   127  127  000    -    47   (Min/Max
197  Current_Pending_Sec   100  100  000    -
APM level is:     Disabled

The self-test log showed no errors:

# smartctl --log=selftest /dev/sdX
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         1         -

Note: The LifeTime(hours) column in the self-test log records the drive's power-on hours when each test ran. Subtracting it from the current Power_On_Hours (attribute 9) indicates how long ago the last test was run.

(2nd Run: LifeTime(hours) was 0 this time. Again because I'm moving faster and getting through these steps sooner this time.)
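The subtraction described in the note can be sketched as follows. It assumes the raw Power_On_Hours value is a plain number of hours and that the first entry listed in the self-test log is the most recent one. The function names are hypothetical.

```shell
# Sketch: hours of operation since the most recent self-test.

# Current power-on hours from "smartctl --attributes" output on stdin.
power_on_hours() {
   awk '$2 == "Power_On_Hours" { print $10 }'
}

# LifeTime(hours) of the first entry listed in "smartctl --log=selftest"
# output on stdin (assumed to be the most recent test).
last_test_hours() {
   awk '/^# *[0-9]+ / { print $(NF - 1); exit }'
}

# $1 = current power-on hours, $2 = LifeTime(hours) of the last test
hours_since_last_test() {
   echo $(( ${1} - ${2} ))
}

# Example (run as root):
#   now=$(smartctl --attributes /dev/sda | power_on_hours)
#   last=$(smartctl --log=selftest /dev/sda | last_test_hours)
#   hours_since_last_test "${now}" "${last}"
```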

Run Conveyance Self-Test

I couldn't run this one for these devices. I tried:

# smartctl --test=conveyance /dev/sdX
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Conveyance Self-test functions not supported

Sending command: "Execute SMART Conveyance self-test routine immediately in off-line mode".
Command "Execute SMART Conveyance self-test routine immediately in off-line mode" failed: scsi error aborted command

Too bad. After reading someone else's response somewhere, I confirmed the man page stated, "identify damage incurred during transporting of the device". After receiving these via courier, I would love to have run this type of test.

Run Long/Extended Self-Test

I kicked off the last SMART test to run on these drives late at night, and I won't be around to check it after 10 hours, so it'll have to wait till tomorrow night, almost 24 hours from now.

# smartctl --test=long /dev/sdX

Testing has begun.
Please wait 571 minutes for test to complete.
Test will complete after Thu Oct  5 13:57:44 2017

During the test, I periodically monitored its progress with smartctl --capabilities:

# watch -n 1 'echo ---- /dev/sda; smartctl --capabilities /dev/sda | head -13 | tail -9; echo ---- /dev/sdb; smartctl --capabilities /dev/sdb | head -13 | tail -9'

---- /dev/sda
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 113) seconds.
---- /dev/sdb
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 ( 113) seconds.
...

I came back probably about six hours after starting and observed the output indicated 10% remaining, but I knew I wouldn't be around for the completion. I did notice none of the SMART attributes seemed way out there.

I came back 24 hours after starting and confirmed the test completed:

Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.

Since the drives were idle all day, they seem to have cooled down by now:

ID#  ATTRIBUTE_NAME        VALUE WORST THRESH   WHEN_FAILED RAW_VALUE 
1    Raw_Read_Error_Rate   100  100  016    -
2    Throughput_Performa   136  136  054    -
3    Spin_Up_Time          100  100  024    -
5    Reallocated_Sector_   100  100  005    -
7    Seek_Error_Rate       100  100  067    -
8    Seek_Time_Performan   128  128  020    -
9    Power_On_Hours        100  100  000    -
10   Spin_Retry_Count      100  100  060    -
12   Power_Cycle_Count     100  100  000    -
193  Load_Cycle_Count      100  100  000    -
194  Temperature_Celsius   142  142  000    -    42   (Min/Max 23/50)
197  Current_Pending_Sec   100  100  000    -
APM level is:     Disabled

The Min/Max is cut off for some reason (probably due to a typo in the awk script above), so I manually ran smartctl --attributes and pasted in the values for this last output. It appears the temperature reached 50 at some point.

The self-test log showed no errors:

# smartctl --log=selftest /dev/sdX
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 2  Extended offline    Completed without error       00%        10         -
# 1  Short offline       Completed without error       00%         1         -

Run `badblocks`

As I was running the long self-test above, I was debating whether or not I should perform this section at the same time, or wait until the self-test completes. For these first two disks, I opted to let the long self-test run to completion before formatting the partitions, so I'm writing this section while the test above runs, but I didn't run these steps until 24 hours later after the test above completed.

Note: As @Xen2050 stated, this section performs a write test on the device. I did not mind running this on my hard disk drives; however, I would think twice before running this on any flash memory or SSD due to the limited writes.

If I were going to use an ext2 or ext4 file system, I could run a command such as the following to format the partition(s) after partitioning with fdisk/gdisk:

# mke2fs -c -c /dev/sdX1

According to the man page, the first -c checks for bad blocks before creating the file system, and the second -c performs a slower read-write test.

The man page also has a warning to use the -c option rather than running badblocks directly.

However, I don't plan to put any ext file system on these drives, so I decided to run badblocks directly.

Check SMART Attributes Before Test

The temperature seems to be fluctuating between 42 and 43; otherwise, everything else is static:

ID#  ATTRIBUTE_NAME        VALUE WORST THRESH   WHEN_FAILED RAW_VALUE 
1    Raw_Read_Error_Rate   100  100  016    -
2    Throughput_Performa   136  136  054    -
3    Spin_Up_Time          100  100  024    -
5    Reallocated_Sector_   100  100  005    -
7    Seek_Error_Rate       100  100  067    -
8    Seek_Time_Performan   128  128  020    -
9    Power_On_Hours        100  100  000    -
10   Spin_Retry_Count      100  100  060    -
12   Power_Cycle_Count     100  100  000    -
193  Load_Cycle_Count      100  100  000    -
194  Temperature_Celsius   139  139  000    -    43   (Min/Max
197  Current_Pending_Sec   100  100  000    -
APM level is:     Disabled

We now have a baseline prior to the write test.

Run `badblocks`

Now, I was ready to run badblocks on both sda and sdb:

# time badblocks -s -v -w /dev/sdX
Checking for bad blocks in read-write mode
From block 0 to 3907018583
Testing with pattern 0xaa:   0.00% done, 0:55 elapsed. (0/0/0 errors)

As with the extended test above, I ran this on both drives simultaneously.

I came back 15-20 minutes later; the temperature is now at 46 for both drives and it looks like it's 0.02% complete.

Testing with pattern 0xaa:   0.02% done, 18:33 elapsed. (0/0/0 errors)

If I'm doing my math right, this means the test will take roughly 93,000 minutes, or about 64 days, to complete. I'm afraid I don't have that much time as I have two more drives to check and only a 30-day return/exchange period, so I'll worry about this sometime later.
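The projection is simple proportional arithmetic on the progress line, sketched below. The function name is hypothetical. Because badblocks reports the percentage to only two decimal places, a figure like 0.02% makes the result coarse, and it covers only the 0xaa pattern shown on that line.

```shell
# Sketch: project total run time from a badblocks progress line.
# $1 = percent done (e.g. 0.02), $2 = elapsed minutes, $3 = elapsed seconds
project_total() {
   awk -v pct="${1}" -v min="${2}" -v sec="${3}" 'BEGIN {
      elapsed = min + sec / 60          # elapsed time in minutes
      total = elapsed * 100 / pct       # minutes needed to reach 100%
      printf "%.0f minutes (%.1f days)\n", total, total / 1440
   }'
}

# Values from the progress line above: 0.02% done, 18:33 elapsed.
project_total 0.02 18 33
```

For the progress line above this prints 92750 minutes (64.4 days).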

Check SMART Attributes After Test

I aborted the test after another 15 minutes or so. The SMART attributes were the same as above only with the temperature different.

Additional Tests

As stated, if I had the time or if the drives were smaller, I could let the write test continue to completion.

Alternatively, if I wanted to zero out the drives, I could do so with:

# dd if=/dev/zero of=/dev/sdX bs=1M

While doing so, I could monitor the SMART attributes for any drastic changes.
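To cover the "read it all back" part of the original question, the zeroed disk can be read back and checked for any non-zero byte. A minimal sketch is below; the function name is hypothetical, and it assumes head -c is available, as it is on common Linux systems.

```shell
# Sketch: succeed only if every byte of the given file or device is zero.
# tr deletes NUL bytes, so any byte that survives is a non-zero byte;
# head stops the pipeline at the first such byte.
is_all_zero() {
   [ $(tr -d '\000' < "${1}" | head -c 1 | wc -c) -eq 0 ]
}

# Example (run as root, after the dd above has finished):
#   is_all_zero /dev/sdX && echo "read back all zeroes"
```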

The drives have already been up for 24 hours per @sawdust's recommendation, and I observed the SMART attributes during this time.

(2nd Run: This is what I ended up doing for these two drives for a while.)

Repeat for Additional Drives

At this time, I powered off the drives and replaced with two additional new ones, and ran through all the steps above for them as stated earlier.

Best Answer

Most SMART monitoring tools will raise alerts themselves if they detect something going wrong. I think a couple of attributes to watch are "Current Pending Sectors" and maybe "Reallocated Sector Count", though a few errors are apparently common.

Run all the SMART self-tests too: offline, short, long, and conveyance. The conveyance test should be particularly applicable, since it's "intended to identify damage incurred during transporting of the device."

See smartctl's man page for more info, or Ubuntu's Community Help Wiki on Smartmontools.

When formatting the drive, have it run the write test of badblocks (or run it yourself before formatting; apparently not all mkfs variants support it). It writes 0's, 1's, 01's and 10's and should be a good workout. Check the SMART data afterwards too for any drastically increased numbers.

If it's a flash memory device or an SSD, keep in mind that they have a limited lifetime of writes. That said, I've read about a test in which decent SSDs handled an insane amount of writes before failing, like constant writes for months, far more than normal usage, so don't worry.
Check the drive's spindown timeout. Some drives in the past would spin down every 2 or 3 minutes, wearing themselves out in record time. In Linux it's usually the responsibility of other programs to time and spin down drives, but watch out for the drive itself.
If you can monitor the temperature of the drive, do so; hddtemp should work. One drive being significantly hotter than the others would be a red flag about the drive or just its cooling.

And if the manufacturer has a specific testing/monitoring program, give it a try. They should have more insight into the drive's specifics.
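For the spindown check, smartctl can report the drive's APM setting, as the script in the question does with --get=apm. A minimal sketch that extracts just the level is below, assuming the "APM level is:" label shown in the question's output. The function name and the SMARTCTL override are hypothetical.

```shell
# Sketch: print a drive's APM setting as reported by "smartctl --get=apm".
# SMARTCTL may be overridden for dry runs.
apm_level() {
   "${SMARTCTL:-smartctl}" --get=apm "${1}" | sed -n 's/^APM level is:[[:space:]]*//p'
}

# Example (run as root):
#   apm_level /dev/sda
# If APM turns out to be enabled with an aggressive level, it can be
# switched off with:
#   smartctl --set=apm,off /dev/sda
```

Watching Load_Cycle_Count over a few days is another way to spot a drive that parks its heads too often.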

Related Solutions

Hard drive pending sector count

Your Current Pending Sector Count (2163) is higher than the Reallocation Sector Count (252).
This means that failing sectors can no longer be replaced by the disk firmware.
The disk is failing. Make sure you have backups, and get a replacement.

Linux – Trying to remove/diagnose single Current_Pending_Sector in S.M.A.R.T. data

A sector is marked pending when a read fails. The pending sector will be marked reallocated if a subsequent write fails. If the write succeeds, it is removed from current pending sectors and assumed to be ok. (The exact behavior could differ slightly and I'll go into that later, but this is a close enough approximation for now.)

When you run badblocks -w, each pattern is first written, then read. It's possible that the write to the flaky sector succeeds but the subsequent read fails, which again adds it to the pending sector list. I would try writing zeroes to the entire disk with dd if=/dev/zero of=/dev/sda, checking the SMART status, then reading the entire disk with dd if=/dev/sda of=/dev/null and checking the SMART status again.
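The write-then-read sequence can be sketched as below, with a small helper that pulls the raw pending-sector count out of the attribute table. The helper name is hypothetical, and bs=1M is an added assumption to speed up dd; it is not part of the commands quoted above.

```shell
# Sketch: print the raw Current_Pending_Sector count from
# "smartctl --attributes" output read on stdin.
pending_sectors() {
   awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Example sequence (run as root; DESTROYS all data on the disk):
#   smartctl --attributes /dev/sda | pending_sectors   # before
#   dd if=/dev/zero of=/dev/sda bs=1M                  # rewrite every sector
#   smartctl --attributes /dev/sda | pending_sectors   # after the write
#   dd if=/dev/sda of=/dev/null bs=1M                  # read every sector back
#   smartctl --attributes /dev/sda | pending_sectors   # after the read
```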

Update:

Based on your earlier results with badblocks -w, I would have expected the pending sector to be cleared after writing the entire disk. But since that didn't happen, it's safe to say this disk is not behaving as expected.

Let's review the description of Current Pending Sector Count:

Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.[29] However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors.

Now let's review the important points:

...the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written.[29] However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good.

In other words, the pending sector should have either been remapped immediately, or the drive should have attempted to write to the sector and one of two things should have happened:

The write failed, in which case the pending sector should have been remapped.
The write succeeded, in which case the pending sector should have been cleared ("marked good").

I hinted at this earlier, but Wikipedia's description of Current Pending Sector suggests that the current pending sector count should always be zero after a full disk write. Since that is not the case here, we can conclude that either (a) Wikipedia is wrong (or at least incorrect for your drive), or (b) the drive's firmware cannot properly handle this error state (which I would consider a firmware bug).

If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased.

Since the current pending sector count is still unchanged after reading the entire drive, we can assert that either (a) the sector could not be successfully read or (b) the sector was successfully read and marked good, but there was an error reading a different sector. But since the reallocated sector count is still 0 after the read, we can exclude (b) as a possibility and can conclude that the pending sector was still unreadable.

At this point, it would be helpful to know if the drive has logged any new SMART errors. My next suggestion was going to be to check whether Seagate has a firmware update for your drive, but it looks like they don't.

Although I would recommend against continuing to use this drive, it sounds like you might be willing to accept the risks involved (namely, that it could continue to act erratically and/or could further degrade or fail catastrophically). In that case, you can try to install Linux, boot from a rescue CD, then (with the filesystems unmounted) use e2fsck -l filename to manually mark the appropriate block as bad. (Just make sure you maintain good backups!)

e2fsck -l filename

Add the block numbers listed in the file specified by filename to the list of bad blocks. The format of this file is the same as the one generated by the badblocks(8) program. Note that the block numbers are based on the blocksize of the filesystem. Hence, badblocks(8) must be given the blocksize of the filesystem in order to obtain correct results. As a result, it is much simpler and safer to use the -c option to e2fsck, since it will assure that the correct parameters are passed to the badblocks program.

(Note that e2fsck -c is preferred to e2fsck -l filename, and you might even want to try it, but based on your results thus far, I highly doubt e2fsck -c will find any bad blocks.)

Of course, you'll have to do some arithmetic to convert the LBA of the faulty sector (as provided by SMART) into a filesystem block number. The Bad Blocks HowTo provides a handy formula:

  b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.

The HowTo also contains a complete example using this formula. After the OS is installed, you can confirm whether a file is occupying the flaky sector using debugfs (see the HowTo for detailed instructions).
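The formula maps directly onto integer shell arithmetic, as in the sketch below. The function name is hypothetical. The sample call uses sector 167095, the sector this answer names when it suggests partitioning around the error. The starting sector S=2048 and block size B=4096 are illustrative assumptions, not values taken from the drive in question.

```shell
# Sketch of b = (int)((L-S)*512/B) using integer shell arithmetic.
# $1 = L, LBA of the bad sector
# $2 = S, starting sector of the partition (from fdisk -lu)
# $3 = B, file system block size in bytes
lba_to_fs_block() {
   echo $(( (${1} - ${2}) * 512 / ${3} ))
}

# Illustrative values only: S=2048 and B=4096 are assumptions.
lba_to_fs_block 167095 2048 4096
```

With these assumed values the call prints 20630. Integer division discards the fractional part, which is what the (int) in the formula denotes.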

Another option: partition around the suspected bad block

When you install your OS, you could also try to partition around the error. If I did my arithmetic right, the error is at around 81.589 MB, so you can either make /boot a little small and start your next partition after sector 167095, or skip the first 82 MB or so completely.
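The sector-to-offset arithmetic can be checked with a one-line conversion, sketched below. It assumes 512-byte sectors and treats 1 MB as 1048576 bytes. The function name is hypothetical.

```shell
# Sketch: convert a sector number to an offset in MB (1 MB = 1048576 bytes),
# assuming 512-byte sectors.
sector_to_mb() {
   awk -v s="${1}" 'BEGIN { printf "%.3f\n", s * 512 / 1048576 }'
}

sector_to_mb 167095
```

For sector 167095 this prints 81.589, matching the figure above.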

ABRT 235018779

Unfortunately, as for the ABRT error at sector 235018779, we can only speculate, but the ATA8-ACS spec gives us some clues.

From Working Draft AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS):

6.2.1 Abort (ABRT) Error bit 2. Abort shall be set to one if the command is not supported. Abort may be set to one if the device is not able to complete the action requested by the command. Abort shall also be set to one if an address outside of the range of user-accessible addresses is requested if IDNF is not set to one.

Looking at the commands leading up to the ABRT (several READ SECTOR(S) followed by recalibration and reinitialization)...

Abort shall be set to one if the command is not supported. - This seems unlikely.

Abort may be set to one if the device is not able to complete the action requested by the command. - Maybe the P-list of reallocated sectors shifts the user-accessible addresses far enough that a user-accessible address translated to sector 235018779, and the read operation was not able to complete (for what reason, we don't know...but there wasn't a CRC error, so I don't think we can conclude that sector 235018779 is bad).

Abort shall also be set to one if an address outside of the range of user-accessible addresses is requested if IDNF is not set to one. - To me this seems most likely, and I would probably interpret it as the result of a software bug (either your OS or some program you were running). In that case, it is not a sign of impending doom for the hard drive.

Just in case you're not tired of running diagnostics yet...

You could try smartctl -t long /dev/sda again to see if it produces any more errors in the SMART log, or you could leave this one as an unsolved X-file ;) and check the SMART log periodically to see whether it happens again. In any case, if you continue to use the drive without getting it to either reallocate or clear the pending sector, you're already taking a risk.

Use a checksumming filesystem

For a little more safety, you may want to consider using a checksumming filesystem such as ZFS or btrfs to help protect against low-level data corruption. And don't forget to perform frequent backups if you have anything that cannot be easily reproduced.
