Hard Drive – Extreme Drops in Hard Disk Performance

Tags: hard-drive, hardware-failure, performance

Basic hardware information:
The hard drive in question is a Seagate BarraCuda 4TB (model number: ST4000DM004). For more details, see the output of hdparm -I in the appendices at the end.

Description of the issue and tests:
On the surface, the problem looks like ordinary write caching: data to be written is buffered in RAM faster than the disk can absorb it, so the write speed appears to drop once the buffer fills. However, things do not appear to be that simple in this case.

Copying files (on an NTFS file system):
When a reasonably large amount of data is written, the drive's performance drops suddenly and sharply. Normally this would simply mean that the files were cached in RAM and the disk keeps working through the backlog for a while afterward. Here, however, monitoring the /proc/meminfo file (under Ubuntu) does not support that explanation. Even after the data has been written (either large files or several smaller ones) and sync has been called, the amount of “dirty” memory keeps decreasing for a while, then grinds to a near-complete halt. It then decreases very slowly, until it sometimes speeds up again. This can repeat, depending on how much data is left. Reading from the device is also extremely sluggish while the write speed is low, and it stays that way for a while even after sync completes, if sync finished during this “slow mode”.
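
To put numbers on the “dirty” memory behavior instead of re-reading /proc/meminfo by hand, a small polling script can print the `Dirty` and `Writeback` counters at a fixed interval. This is a minimal sketch assuming a Linux system; the function names are hypothetical, and the “drained” figure is only the net change between two samples, so new data being written at the same time will hide part of the actual flush rate.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def pending_write_kb(fields):
    """Data not yet on the disk: dirty pages plus pages under writeback, in kB."""
    return fields.get("Dirty", 0) + fields.get("Writeback", 0)


def watch(samples=10, interval=1.0, path="/proc/meminfo"):
    """Print Dirty/Writeback every `interval` seconds, `samples` times."""
    previous = None
    for i in range(samples):
        with open(path) as f:
            fields = parse_meminfo(f.read())
        pending = pending_write_kb(fields)
        line = f"Dirty={fields.get('Dirty', 0)} kB  Writeback={fields.get('Writeback', 0)} kB"
        if previous is not None:
            # Net change since the last sample; positive means the backlog shrank.
            line += f"  drained={(previous - pending) / interval:.0f} kB/s"
        print(line)
        previous = pending
        if i < samples - 1:
            time.sleep(interval)


# Take a single snapshot when run directly (only where /proc/meminfo exists).
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    watch(samples=1)
```

For example, running `watch(samples=600)` in one terminal while copying files and calling sync in another would show whether the drain rate collapses at the same moment the copy slows down.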

These initial tests were performed on both Ubuntu 21.10 and Windows 10, with similar behavior.

Additional remark for Windows:
When the disk stayed slow after the copy operation had completed and I tried reading files from it (e.g. playing a video, which kept lagging), Resource Monitor and Task Manager both showed a high percentage of disk usage on the device (100% or close to it), while the actual speed shown was <1 MB/s. (The OS also froze altogether at one point, but that may or may not be strictly related.)

Disk benchmarks:
To determine whether this is caused by the file system or by the hardware itself, I benchmarked the device with the gnome-disks utility. The result of one such benchmark, shown here, illustrates what I described above: the read and write speeds drop sharply to almost nothing after a certain point, then recover later. (Blue and red are the read and write speeds, respectively, for each individual sample, taken at 1000 locations going from the outside of the disk toward the inside; the green dots and lines correspond to the access-time benchmark, which is separate from the others.)

Read/write benchmark

Note that, as I understand it, the benchmarking tool rules out factors such as write caching. In any case, /proc/meminfo showed little to no data waiting in the cache to be written during the slow periods; its complete contents are included in the appendices.
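
To check the write-caching question from the OS side as well, one can time writes that are explicitly flushed with fsync, so the page cache cannot absorb them. The following is a sketch assuming Linux or another POSIX system; `timed_sync_writes` and its parameters are hypothetical, and it writes to an ordinary file on the affected file system rather than to the raw device, so it is non-destructive but needs free space. fsync only guarantees that the data has left RAM and been handed to the drive; where the drive places it internally is up to its firmware.

```python
import os
import time


def timed_sync_writes(path, chunk_mib=64, chunks=16):
    """Write `chunks` chunks of `chunk_mib` MiB to `path`, fsync-ing after each.

    Returns one MiB/s figure per chunk. Because every chunk is forced out of
    the page cache before the clock stops, RAM caching cannot hide a slow disk.
    """
    data = os.urandom(chunk_mib * 1024 * 1024)  # incompressible payload
    rates = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(chunks):
            start = time.perf_counter()
            view = memoryview(data)
            while view:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # block until the data has been written out to the drive
            elapsed = max(time.perf_counter() - start, 1e-9)
            rates.append(chunk_mib / elapsed)
    finally:
        os.close(fd)
    return rates
```

For example, `timed_sync_writes("/media/disk/test.bin", chunk_mib=256, chunks=80)` (a hypothetical mount path) writes 20 GiB; if the per-chunk rate stays high for a while and then collapses, that suggests the slowdown is happening below the OS.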

With writes disabled in the benchmark, the phenomenon does not appear, although there is an anomalous, sudden drop in speed in the inner sections of the disk:

Read-only benchmark

(The drop depends on the physical location on the disk rather than on elapsed time: other benchmarks with a different number of samples show the cutoff at the same spot.)
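
The read-only benchmark can be approximated with a short script that reads a fixed-size block at evenly spaced offsets and reports MB/s per position, which makes it easy to check whether the slow region always starts at the same byte offset. This is a sketch assuming Linux, read permission on the device (typically root), and that low LBAs map to the outer tracks, which is the usual convention but not guaranteed. `read_profile`, `sample_offsets`, and the 8 MiB block size are illustrative choices, not what gnome-disks does internally. It only reads, so it is safe to point at the device, and it asks the kernel (best effort) to drop cached copies so repeat runs are not served from RAM.

```python
import os
import time

PHYSICAL_SECTOR = 4096  # physical sector size reported by hdparm -I


def sample_offsets(device_bytes, samples, block_bytes):
    """Evenly spaced, sector-aligned start offsets from LBA 0 to the end of the device."""
    last_start = device_bytes - block_bytes
    step = last_start / (samples - 1) if samples > 1 else 0
    return [int(i * step) // PHYSICAL_SECTOR * PHYSICAL_SECTOR for i in range(samples)]


def read_profile(device, samples=100, block_bytes=8 * 1024 * 1024):
    """Read `block_bytes` at each sample offset; return [(offset, MB/s), ...]. Read-only."""
    fd = os.open(device, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for block devices and regular files
        results = []
        for offset in sample_offsets(size, samples, block_bytes):
            if hasattr(os, "posix_fadvise"):
                # Best effort: drop any cached copy of this range so the read hits the disk.
                os.posix_fadvise(fd, offset, block_bytes, os.POSIX_FADV_DONTNEED)
            start = time.perf_counter()
            done = 0
            while done < block_bytes:
                chunk = os.pread(fd, min(block_bytes - done, 1024 * 1024), offset + done)
                if not chunk:
                    break
                done += len(chunk)
            elapsed = max(time.perf_counter() - start, 1e-9)
            results.append((offset, done / elapsed / 1e6))
    finally:
        os.close(fd)
    return results
```

For example, `read_profile("/dev/sdb", samples=1000)` run as root gives 1000 points comparable to the benchmark; if the drop lands at the same offset on every run, that supports the observation that it depends on position rather than time.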

Equivalent benchmarks on other, presumably healthy hard disks in the system yield the expected, regular results like this:

Read/write benchmark on healthy disk

Conclusion / Question:
From this I gather that the issue is likely caused by some hardware or firmware failure, but there may be any number of things I have overlooked.

What are the likely causes of this phenomenon? What steps should I take next to diagnose the issue further? Any help is greatly appreciated.

Appendices:
Detailed hardware information (as output by hdparm -I):

/dev/sdb:

ATA device, with non-removable media
        Model Number:       ST4000DM004-2CV104
        Serial Number:      ZFN3J8RH
        Firmware Revision:  0001
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x006d)
        Supported: 10 9 8 7 6 5
        Likely used: 10
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:    16514064
        LBA    user addressable sectors:   268435455
        LBA48  user addressable sectors:  7814037168
        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:     3815447 MBytes
        device size with M = 1000*1000:     4000787 MBytes (4000 GB)
        cache/buffer size  = unknown
        Form Factor: 3.5 inch
        Nominal Media Rotation Rate: 5425
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 208, current value: 208
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    48-bit Address feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    unknown 119[6]
           *    unknown 119[7]
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
           *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
           *    DMA Setup Auto-Activate optimization
                Device-initiated interface power management
           *    Software settings preservation
                unknown 78[7]
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Data Tables (AC5)
                unknown 206[7]
                unknown 206[12] (vendor specific)
                unknown 206[13] (vendor specific)
           *    DOWNLOAD MICROCODE DMA command
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
                supported: enhanced erase
        490min for SECURITY ERASE UNIT. 490min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000c500c6a79fae
        NAA             : 5
        IEEE OUI        : 000c50
        Unique ID       : 0c6a79fae
Checksum: correct

/proc/meminfo during the first benchmark, at the time when the read/write speed was slow:

MemTotal:       16323712 kB
MemFree:         9894056 kB
MemAvailable:   12815716 kB
Buffers:          138380 kB
Cached:          3038420 kB
SwapCached:            0 kB
Active:          1533040 kB
Inactive:        4396560 kB
Active(anon):       2960 kB
Inactive(anon):  2817480 kB
Active(file):    1530080 kB
Inactive(file):  1579080 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:      17577980 kB
SwapFree:       17577980 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:       2752844 kB
Mapped:           694816 kB
Shmem:             73200 kB
KReclaimable:     137092 kB
Slab:             260112 kB
SReclaimable:     137092 kB
SUnreclaim:       123020 kB
KernelStack:       13872 kB
PageTables:        33292 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    25739836 kB
Committed_AS:    9749696 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       76616 kB
VmallocChunk:          0 kB
Percpu:             8128 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      512904 kB
DirectMap2M:     7813120 kB
DirectMap1G:     8388608 kB

Best Answer

The Seagate ST4000DM004 uses SMR to write data to the disk surface. This means that, in order to write a single byte, it might have to rewrite multiple gigabytes.
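
The reason a small write can be this expensive is the overlap itself: each shingled track is partly covered by the next one, so rewriting a track also destroys the following track, and so on to the end of the band, which all has to be read out and written back in order. The toy calculation below illustrates that. The band and track sizes are made-up placeholders (this sketch assumes nothing about the real layout of the ST4000DM004), and `smr_rewrite_bytes` and `write_amplification` are hypothetical helpers.

```python
def smr_rewrite_bytes(band_tracks, track_bytes, target_track):
    """Bytes rewritten to update data on `target_track` (0-based) of one SMR band.

    The write head is wider than the final track pitch, so rewriting a track
    also clobbers the next track, and so on to the end of the band. Tracks
    before the target are unaffected.
    """
    if not 0 <= target_track < band_tracks:
        raise ValueError("target_track must lie inside the band")
    return (band_tracks - target_track) * track_bytes


def write_amplification(band_tracks, track_bytes, target_track, request_bytes):
    """How many bytes hit the platter per byte the host asked to write."""
    return smr_rewrite_bytes(band_tracks, track_bytes, target_track) / request_bytes


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Hypothetical band: 128 tracks of 2 MiB each (256 MiB per band).
    worst = smr_rewrite_bytes(128, 2 * MiB, 0)
    print(f"rewrite for a change on track 0: {worst // MiB} MiB")
    print(f"amplification for a 4 KiB write: {write_amplification(128, 2 * MiB, 0, 4096):,.0f}x")
```

The worst case is a change near the start of a band; a drive-managed SMR disk avoids paying this on every write by staging writes elsewhere first.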

In "normal usage patterns" (as defined by HDD vendors, not by users!) this does not cause much of a problem: the data is written to a CMR cache on the outer rim of the disk. Later, when disk activity goes down, the firmware moves the data to its final place in an SMR band.

When larger quantities of data are written at once, this CMR cache is exhausted and the I/O has to go to the SMR bands directly, which is slower by orders of magnitude.
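
This can be put into rough numbers with a two-rate model: data goes into the media cache at full speed until the cache is full, and everything after that is limited by the band-rewrite speed. All figures below are hypothetical placeholders (this sketch does not know the ST4000DM004's actual cache size or post-cache speed), and the model ignores the firmware draining the cache in the background, which the answer describes happening when activity drops.

```python
def transfer_seconds(total_mb, cache_mb, cache_mb_s, band_mb_s):
    """Time to write `total_mb` when the first `cache_mb` go to the CMR media cache
    at `cache_mb_s` and the rest go straight into SMR bands at `band_mb_s`.

    Ignores any draining of the cache during the transfer.
    """
    cached = min(total_mb, cache_mb)
    direct = total_mb - cached
    return cached / cache_mb_s + direct / band_mb_s


def effective_mb_s(total_mb, cache_mb, cache_mb_s, band_mb_s):
    """Average throughput over the whole transfer."""
    return total_mb / transfer_seconds(total_mb, cache_mb, cache_mb_s, band_mb_s)


if __name__ == "__main__":
    # Hypothetical figures: 20 GB media cache, 200 MB/s into the cache, 2 MB/s after it fills.
    for total in (10_000, 20_000, 40_000):
        t = transfer_seconds(total, 20_000, 200, 2)
        print(f"{total / 1000:>4.0f} GB: {t / 60:7.1f} min, "
              f"{effective_mb_s(total, 20_000, 200, 2):6.1f} MB/s average")
```

Even when the cache absorbs half of a 40 GB copy, the average speed in this model is set almost entirely by the slow second half.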

Nota bene: This is not a RAM cache. It is a small part of the disk surface that is written in CMR (i.e., without overlapping tracks) to make the SMR horror less visible to users.
