Windows – Files on HDD are getting corrupted

file-corruption, hard drive, memory, memory-timings, windows 8.1

tl;dr

On my new PC (Windows 8.1 x64), some files on the local SATA HDD are getting corrupted for no visible reason (after some idle time).

Not a virus/malware! (Tested with AVG antivirus installed, and also on a clean, brand-new 8.1 install with no third-party software or drivers.)

No HW failures detected by various test utilities.

Long version

I noticed that some files in my archives are getting corrupted after some idle time.

It seems the same files always get corrupted: in my latest tests on a set of >33000 JPEG files, I get the same list of 30 files that are corrupted every time. It looks like these 30 files contain some specific byte sequence which, under certain conditions, 'activates' the corruption.

(Since I realized there is a problem, I periodically restore the files from backup and then compare them against the backup with WinMerge/BeyondCompare.)

The corruption pattern is mostly the same: in most cases the last 10-20 bytes are filled with random data. But not always: I have also seen files with random data at the beginning or in the middle of the file.

I ran some tests for HW issues, but didn't find any:
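To pin down where in a file the damage sits (tail, beginning, or middle), the good and the corrupted copy can be compared byte by byte. This is only a minimal sketch; `diff_ranges` is a hypothetical helper name, not something WinMerge or BeyondCompare provides.

```python
def diff_ranges(good: bytes, bad: bytes):
    """Return (start, end) byte ranges, end exclusive, where the buffers differ."""
    ranges = []
    start = None
    common = min(len(good), len(bad))
    for i in range(common):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    # A length mismatch is reported as one extra trailing range.
    if len(good) != len(bad):
        ranges.append((common, max(len(good), len(bad))))
    return ranges


# Example with in-memory data; for real files, read both copies with
# open(path, "rb").read() and pass the bytes in.
print(diff_ranges(b"abcdefgh", b"abcdefXY"))  # [(6, 8)]
```

Collecting these ranges for all 30 affected files would show whether the damage really clusters in the last 10-20 bytes.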

  • tested RAM (with MemTest86+ and some other tools; ran overnight with different fill patterns; no issues detected)
  • tested HDD (S.M.A.R.T. reported issues on attribute 0x05 'Reallocated Sectors Count', so I exchanged the HDD under warranty for the same model. Now there are no S.M.A.R.T. issues and no bad sectors on surface scans.)

I also ran various experiments, such as:

  • Reinstalled Windows
  • Tried a clean Windows install (even without the motherboard manufacturer's drivers, only the defaults provided by Microsoft)
  • Tried with all proper drivers installed (downloaded from the manufacturer's homepage)
  • Deleted all partitions and repartitioned/formatted the HDD
  • Tried with AVG Antivirus installed and without any antivirus

One test gave (probably) positive results: I used PartedMagic Linux booted from a USB stick and got no corruption after several weeks of Linux usage. But I'm still not sure whether this Linux distribution was using the same HW access modes (memory usage, SATA connection mode, etc.), or whether the corruption simply didn't happen by chance.

At first I thought it was something in the Windows driver/cache configuration. I raised the same question on Microsoft Community, but got no solution. ( answers.microsoft.com/en-us/windows/forum/windows8_1-files/files-on-hdd-getting-corrupted/e2b04d4f-d3ea-492d-a181-c1d437ab1507 )

The problem is still under analysis: I still don't have a stable, predictable sequence to reproduce the issue. Currently I'm using a quasi-stable sequence (which still takes several days to reproduce the issue):

  1. Modify config (HW or SW)
  2. Restore files from backup
  3. Start WinMerge to compare the archive on the HDD with the backup copy on the NAS (over the local network)
  4. If no corruption is detected, go to step 3.

Step 3 takes several hours (4-6), and corruption may only be detected after several iterations. It probably happens if I try to use the computer while it's comparing, but I'm not sure.
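The comparison in step 3 could also be scripted so that each pass produces a list of damaged files to compare between runs. The sketch below is an assumption-laden alternative to WinMerge, not what was actually used: `sha256_of` and `find_corrupted` are hypothetical names, and the two directory arguments stand for the NAS backup and the archive on the HDD.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_corrupted(backup_root, working_root):
    """Return relative paths whose working copy is missing or differs from the backup."""
    backup_root, working_root = Path(backup_root), Path(working_root)
    bad = []
    for src in sorted(backup_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(backup_root)
        dst = working_root / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(rel.as_posix())
    return bad
```

Saving the returned list after every pass makes it easy to check whether the same 30 files show up each time.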

My current theory: it might be related to RAM (even though the corrupted files are never opened for writing. Maybe Windows does some transparent reallocation of compressed NTFS content during some internal file-indexing procedure… I don't know).

  • Removed a single DDR module: the issue wasn't reproduced after 3 days of continuous testing.
  • Replaced the 'good' module with the previously removed, potentially 'bad' module: the issue was reproduced within 1 day. (Though MemTest86+, run immediately after the issue, didn't detect any problems with the RAM; I did 6 passes of extended tests.)
  • Kept the 'bad' module installed, but changed the RAM frequency in the BIOS from 1600MHz to 1300MHz: comparison tests have been running for 3 days with no issue reproduced so far.

Hardware

Software

  • Windows 8.1 64-bit (with all updates installed)
  • Filesystem: NTFS compressed

Questions

Taking all of the above into account, could anybody advise or confirm my assumptions:

  1. Does anybody have any idea what the reason could be? Or what else can I do to find it? Are there any other test tools that can do deeper tests (like a memory test during intensive video memory usage, etc.)?

  2. If my current assumption is right (my KINGSTON RAM model is probably not fully compatible with the motherboard, or one RAM module is somewhat defective and doesn't work properly at 1600MHz), which test tools can I use to prove that? (MemTest86+ and a couple of others didn't detect any problems.)

  3. Today I also noticed: when I switch memory timings in the BIOS from AUTO to MANUAL, the default values differ from those recommended by the KINGSTON specifications: tRAS should be >33.75 (the BIOS default is 27), and tRFC should be >260 (the BIOS default is 208 and the maximum is 255, which is still less than the recommended 260ns). Could that theoretically be the reason? (I will test manual timings as well, but it will take some time.)
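One thing worth checking for question 3 is the unit. The sketch below assumes, without confirmation from the BIOS documentation, that the BIOS shows timings in memory clock cycles while the KINGSTON figures are in nanoseconds, and that the module runs at DDR3-1600 (an 800 MHz memory clock, two transfers per clock). `ns_to_cycles` is a hypothetical helper.

```python
import math


def ns_to_cycles(t_ns, data_rate_mts):
    """Convert a timing in nanoseconds to whole memory clock cycles."""
    clock_mhz = data_rate_mts / 2          # DDR: two transfers per clock
    cycles = t_ns * clock_mhz / 1000       # ns * MHz / 1000 = cycles
    return math.ceil(round(cycles, 6))     # round first to absorb float noise


for name, t_ns in (("tRAS", 33.75), ("tRFC", 260.0)):
    print(name, ns_to_cycles(t_ns, 1600))
# tRAS 27
# tRFC 208
```

If that unit assumption holds, the BIOS defaults of 27 and 208 are the same values as the recommended 33.75ns and 260ns, just expressed in cycles, and the apparent mismatch would not explain the corruption.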

Best Answer

So, after two months and some more experiments. :-)

tl;dr;

The problem has been solved by disabling NTFS compression.

The root cause is still unknown: I believe it could be caused by the HDD, the memory, or the motherboard, or by the implementation of NTFS compression.

Long version

I played with the RAM timings; it didn't help.

I contacted the manufacturers' support with questions about known hardware issues. The RAM and motherboard manufacturers don't have any information on known issues. The HDD manufacturer (Toshiba) didn't answer :-)

Anyhow, after I disabled compression, the issue was not reproduced during almost 2 months of normal computer usage, while another sample copy, stored in a compressed folder, was corrupted and restored many times.

There might be a bug in the implementation of the compression algorithm used in Windows 8.1.

I've also tested with the Windows 10 release: compressed files get corrupted within one day of idle time.
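To confirm which files in a folder are still stored compressed after changing the setting, the NTFS compressed attribute can be read per file. This is a rough sketch: it relies on `st_file_attributes`, which Python's `os.stat` only provides on Windows, and the function names are hypothetical.

```python
import os
import stat

# 0x800 is the Windows FILE_ATTRIBUTE_COMPRESSED bit; the fallback is used
# only if this Python build does not expose the constant.
FILE_ATTRIBUTE_COMPRESSED = getattr(stat, "FILE_ATTRIBUTE_COMPRESSED", 0x800)


def is_ntfs_compressed(path):
    """True if the file carries the NTFS compressed attribute (Windows only)."""
    attrs = getattr(os.stat(path), "st_file_attributes", 0)  # 0 on non-Windows
    return bool(attrs & FILE_ATTRIBUTE_COMPRESSED)


def compressed_files(root):
    """Walk a directory tree and list files that are stored compressed."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if is_ntfs_compressed(full):
                hits.append(full)
    return hits
```

An empty result for the archive folder would indicate that no file there is compressed any more.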
