SSD freezing with event ID 153 (SRB_STATUS_ERROR)

Tags: freeze, hard drive, hardware-failure, performance, ssd

My 4-year-old OCZ Agility 4 has been hiccuping since I moved it to a different system. This SSD was my main drive until a month ago, and I first noticed the "Event ID 153 – The IO Operation at logical block address xxxx for Disk y was retried" warning a year ago.

Back then I'd just get a sporadic warning, about one every couple of hours, with no associated lag. The event detail specified either the 00 09 28 or the 00 12 28 code. Based on these articles (1, 2), they mean:

0x00 = SCSISTAT_GOOD

0x09 = SRB_STATUS_TIMEOUT    or    0x12 = SRB_STATUS_DATA_OVERRUN

0x28 = SCSIOP_READ

Nothing alarming, and the SMART data was good. This might have nothing to do with my current issue, as I also get the very occasional data_overrun on my brand new SSD.

After I upgraded the computer I moved the OCZ to a laptop. I formatted it and performed a clean install. I've been experiencing freezes since then.

The issue:
Whatever program I'm using becomes unresponsive for a couple of seconds, although I can still move the cursor around and even alt+tab. Whichever program becomes active then becomes unresponsive as well, and that lasts until the SSD responds, 2 to 10 seconds later. I don't get a single event 153 for hours; instead I get ~10 of them at once during each freeze, and that happens multiple times an hour. If I happen to be running an SSD benchmark at the time, it will either report 0 MB/s or fail.

[Screenshot: OCZ benchmark write failure]

When the drive doesn't hang the benchmark runs smoothly.

[Screenshot: SSD benchmark success]

I don't get event 153 if I use an external adapter to plug the SSD into a USB port. I wasn't able to determine whether the issue doesn't happen at all over USB, or whether Windows just doesn't report the event for USB-attached drives (event 153 was only introduced in Windows 8, and it might not cover USB storage).

The SMART data doesn't look too bad to my untrained eyes:

[Screenshot: SSD info]

The event 153 detail codes for these incidents are 00 04 2A and 00 04 28:

0x00 = SCSISTAT_GOOD

0x04 = SRB_STATUS_ERROR

0x28 = SCSIOP_READ    or    0x2A = SCSIOP_WRITE

The SRB error happens when the HBA returns a nonspecific bus error.
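For anyone decoding their own event 153 details, the lookups quoted above can be sketched in Python as follows. This is only an illustration: the tables contain just the codes mentioned in this post, the function name `decode_event_153` is made up for the example, and the masking of the two upper SRB flag bits is an assumption based on how the Windows driver headers define the `SRB_STATUS()` macro.

```python
# Lookup tables limited to the codes quoted in this post; the complete
# lists are defined in the Windows driver headers.
SCSI_STATUS = {0x00: "SCSISTAT_GOOD"}
SRB_STATUS = {
    0x04: "SRB_STATUS_ERROR",
    0x09: "SRB_STATUS_TIMEOUT",
    0x12: "SRB_STATUS_DATA_OVERRUN",
}
SCSI_OPCODE = {0x28: "SCSIOP_READ", 0x2A: "SCSIOP_WRITE"}


def _name(table, value):
    """Return the symbolic name, or a marker for codes not in the table."""
    return table.get(value, f"unknown (0x{value:02X})")


def decode_event_153(triple):
    """Decode a 'SS RR OO' hex triple: SCSI status, SRB status, SCSI opcode."""
    parts = triple.split()
    if len(parts) != 3:
        raise ValueError(f"expected three hex bytes, got {triple!r}")
    scsi, srb, opcode = (int(p, 16) for p in parts)
    # Assumption: the top two SRB bits are flags (autosense valid, queue
    # frozen), so they are masked off before the lookup.
    srb &= 0x3F
    return (
        _name(SCSI_STATUS, scsi),
        _name(SRB_STATUS, srb),
        _name(SCSI_OPCODE, opcode),
    )


if __name__ == "__main__":
    for triple in ("00 09 28", "00 12 28", "00 04 2A", "00 04 28"):
        print(triple, "->", ", ".join(decode_event_153(triple)))
```

Codes that are not in the tables come back as "unknown" rather than raising, since the tables here are deliberately incomplete.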

Could that be caused by a bad SATA adapter? I haven't tried a different adapter yet because the model is difficult to find. If it's not the adapter, what can I do to mitigate the issue?

Already tried:

  • Updated the SSD firmware. It was already up to date.
  • Increased the over-provisioning.
  • Clean OS install. Updated drivers where possible. It's an old laptop (LGP430) and I'm having trouble finding the best drivers for it.
  • Made sure it's using AHCI mode.
  • Double-checked TRIM. It's enabled.
  • Installed Intel Rapid Storage, based on reports that it was able to stabilize SSDs with similar issues. Only older versions support this laptop's specs, and installing it was a bad call. Windows crashed so badly that I had to run ntfsfix and mount the partition on Ubuntu so I could run chkdsk and then perform a system restore.

Best Answer

You might be a victim of the Intel Cougar Point (6-series) chipset SATA bug.

As far as I can tell, your system uses the HM65 chipset. This and other 6-series chipsets were affected by a problem in the SATA 3 Gb/s ports (ports 2 through 5), where a transistor is driven with excessive voltage, resulting in premature failure. The 6 Gb/s ports (0 and 1) are unaffected. See also: How to avoid purchasing Faulty "Cougar Point" Chipset motherboard?

If your SSD is connected to one of the 3 Gb/s ports (and I can't quite tell because CrystalDiskInfo indicates 6 Gb/s while benchmarks suggest 3 Gb/s), then you might be affected by this bug. This would probably be the case if the SSD is installed as a secondary drive on the system. Unfortunately, this means that the 3 Gb/s ports are unusable, leaving you with just the first two SATA ports to work with.

If the drive is in fact connected to a 6 Gb/s port, the issue may be elsewhere on the motherboard, though chipset failure cannot be ruled out.
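The port rule above can be written down as a tiny Python sketch. The function name `cougar_point_port_info` is hypothetical, and the sketch assumes you already know the chipset-level port index (0 through 5), which may not match the numbering your OS or BIOS displays.

```python
def cougar_point_port_info(port):
    """Return (link_speed_gbps, affected_by_bug) for a 6-series SATA port.

    Per the description above: ports 0 and 1 are 6 Gb/s and unaffected,
    ports 2 through 5 are 3 Gb/s and affected.
    """
    if port not in range(6):
        raise ValueError(f"6-series chipsets expose ports 0-5, got {port}")
    if port in (0, 1):
        return (6, False)
    return (3, True)


if __name__ == "__main__":
    for port in range(6):
        speed, affected = cougar_point_port_info(port)
        status = "affected" if affected else "unaffected"
        print(f"port {port}: {speed} Gb/s, {status}")
```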
