MacOS – corrupted files fail checksum, but disk and FS pass checks with Disk Utility

Tags: command line, disk-utility, hard drive, macos, ssd

A couple of files appear corrupted on the SSD drive on my son's MBP, currently on macOS Catalina.

They are WAV files that refuse to play, and if I run the following from a shell:

$ sum -r <filename>

I get "Input/output error" on the two offending files, instead of a checksum.

$ sum -r *
23188 45843 01 Bombtrack.wav
58127 58913 02 Killing In The Name.wav
40298 63213 03 Take The Power Back.wav
64550 54096 04 Settle For Nothing.wav
47065 58063 05 Bullet In The Head.wav
38280 55418 06 Know Your Enemy.wav
11798 68313 07 Wake Up.wav
sum: 08 Fistful Of Steel.wav: Input/output error
sum: 09 Township Rebellion.wav: Input/output error
17779 68693 10 Freedom.wav

My concern is that macOS can't find anything wrong with it, and the SMART data checks out. This SSD was a replacement for the original HDD and was fitted at the Apple Store.

I have run Disk Utility First Aid on both the logical and physical volumes.
Disk Utility doesn't find anything wrong with the filesystem or the underlying disk.
I'm assuming these two files sat on a corrupted part of the SSD.

In the end, is there anything else I can do to repair the corrupted parts of the SSD, or to prevent them from being used?

Best Answer

I treat all Input/output (IO) errors as five-alarm situations. When I see an IO error in the console log, I save all work, quit all apps and then get a full backup. The filesystem is designed to keep itself intact, which means that when a file has a problem, the file gets truncated and deleted. Your data loses; the filesystem gets healed. Seeing an IO error bubble up to the application layer is either:

no big deal - you have some corrupt files
a huge deal - you have limited time to back up files that aren’t already backed up

Once I have a backup, I watch for IO errors for a day or so and delete the files that are affected. If I see the IO errors spread, I do an erase install and keep monitoring.

SSDs are a bit different from HDDs: I’ve only ever seen one SSD throw an actual IO error, since the controller almost always intercepts and corrects these with checksums. In my experience, 100% of issues are just bit rot, crashes and app failures, not signs that the SSD is starting to fail. I’ve never had warning of an SSD failing - they just go. Also, the SSDs Apple delivers are way, way, way more reliable than the HDDs Apple delivered. Erase install has basically been a cure-all, get-out-of-jail-free card for me in the last 10 years of managing Macs. Only when a system can’t install and run a blank OS do I think hardware needs diagnosis and repair.

Back to you: if you don’t have a full backup you trust, please make one now, with haste. Next, read up on how to erase. All the signs you have indicate your hardware is fine, and you might not even find any IO errors in the Console app (or using log stream). Since you know exactly how to summon that error, watch the log as you poke at these broken files by trying to read / open / checksum them.
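As a minimal sketch of that poke-and-watch step, the function below reads every file in one directory and prints only the ones that cannot be read. The name `list_unreadable` is made up for illustration, it assumes a POSIX shell, and it treats any read failure as a hit, not only "Input/output error".

```shell
# Hypothetical helper: print each file in a directory that cannot be read in full.
# Any read failure counts, not only "Input/output error".
list_unreadable() {
    dir=${1:-.}
    for f in "$dir"/*; do
        # Skip the literal pattern left behind when the directory is empty.
        [ -e "$f" ] || [ -L "$f" ] || continue
        # Directories are not file data; skip them.
        [ -d "$f" ] && continue
        # Read the whole file and discard it; a failed read makes cat exit non-zero.
        cat -- "$f" > /dev/null 2>&1 || printf '%s\n' "$f"
    done
}
```

Running `list_unreadable .` in the album folder while log stream runs in a second terminal should show whether each failed read comes with a kernel message.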

Your instincts to test are perfect. The disk and hardware are almost certainly OK; you may only need to wipe the filesystem and restore good files onto a clean OS if the system can’t heal itself. The SSD controller maps data across multiple chained storage cells, so TRIM and bad-block handling are more about keeping a substantial portion of the space free, so that “bad blocks” don’t have to be hard mapped out the way they were on hard drives. My understanding is that perhaps 10% of the drive can go bad and you won’t lose a block or any capacity as far as the operating system is concerned.

Related Solutions

MacOS – How to identify and fix files with corrupted / inaccessible disk blocks

If your file system is healthy at the level of its structure and you want to find files that sit on faulty disk blocks, here is how I would proceed:

Make a full backup of your disk with Time Machine or Carbon Copy Cloner

Check this backup.
Run the following heavy and risky (in case you do have bad blocks outside of your filesystem structure) command (make sure the {} is quoted so filenames containing spaces work):
```
find / -type f -print -exec dd if="{}" of=/dev/null bs=1m \;
```

For every plain file, this heavy find command prints the file's name (which only reads its directory entry, not the file itself) and then does a full, fast read of all its data blocks.
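A variation on the same idea, sketched under a few assumptions: instead of relying on the last name printed, record every file whose read fails in a report file. The function name `scan_tree`, the report file and the use of `-xdev` to stay on one volume are my own choices, not part of the original command, and `bs=1048576` is simply `bs=1m` spelled out so that it also works with a non-BSD `dd`. This only helps when a bad read returns an error rather than hanging the machine.

```shell
# Hypothetical helper: read every plain file under a root directory and
# append the names of files that fail to read to a report file.
scan_tree() {
    root=$1
    report=$2
    : > "$report"    # start with an empty report
    # -xdev keeps find on the starting volume; sh receives the files in batches.
    find "$root" -xdev -type f -exec sh -c '
        report=$1
        shift
        for f in "$@"; do
            # Full sequential read, 1 MiB at a time; the data is discarded.
            dd if="$f" of=/dev/null bs=1048576 2>/dev/null ||
                printf "%s\n" "$f" >> "$report"
        done
    ' sh "$report" {} +
}
```

For example, `scan_tree ~/Music /tmp/unreadable.txt` would limit the scan to one folder, which is a gentler first step than starting from `/`.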

Upon hitting the first file containing bad blocks, this find will cause the kernel to log a read error to /var/log/system.log, and it will either slow down or bring your system to a total halt. Which of the two happens mostly depends on the hard drive's ability to relocate the bad blocks it finds to the internal spare pool dedicated to this routine fix. The file containing bad blocks will be the last name printed by find.

Write down this file name on a piece of paper! Let's say that this file name is:

/.DocumentRevisions-V100/.cs/ChunkStorage/0/0/0/9

At this point you may be able to kill find quickly by hitting Ctrl+C. If killing it nicely fails, just crash your Mac.

Upon rebooting your Mac, directly check the file containing bad blocks:

dd if='/.DocumentRevisions-V100/.cs/ChunkStorage/0/0/0/9' of=/dev/null bs=1m

If the command terminates correctly, then the error was light enough for your disk to be able to read this file and reallocate the bad blocks.

If the command doesn't terminate, you won't be able to kill it normally, the data in that file is totally lost, and you will have to crash your Mac once more.

In this last case, you have to consider replacing your disk and working from your last backups. Some other files might also contain bad blocks, and they may have gone undetected for a long time simply because you never read them.

The kernel won't fire a read error on a block you never read.

MacOS – Can the data on a MacBook Pro SSD be recovered after formatting using Disk Utility

Since I was the author of the answer linked, I'll have a go at explaining what's going on....

First off, it's important to note that the FAST study referenced by the SE Information Security community, "Reliably Erasing Data From Flash-Based Solid State Drives", is from 2011 and is now 6 years old¹.

The conference where this paper was presented took place in February 2011. Keep in mind that the SATA 3.1 spec, which addressed the TRIM issues, wasn't released until July 2011.

On TRIM as included in SATA 3.1:

A drawback of the original ATA TRIM command is that it was defined as a non-queueable command and therefore could not easily be mixed with a normal workload of queued read and write operations. SATA 3.1 introduced a queued TRIM command to remedy this.

There are different types of TRIM defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:

Non-deterministic TRIM: Each read command to the Logical block address (LBA) after a TRIM may return different data.

Deterministic TRIM (DRAT): All read commands to the LBA after a TRIM shall return the same data, or become determinate.

Deterministic Read Zero after TRIM (RZAT): All read commands to the LBA after a TRIM shall return zero.
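To make the three categories concrete, here is a small sketch that classifies a drive from the two IDENTIFY DEVICE words. The bit positions are an assumption on my part, based on my reading of the ATA spec (word 169 bit 0 for TRIM support, word 69 bit 14 for deterministic reads, word 69 bit 5 for read-zero), so check them against the spec before relying on this. The function name `trim_type` is made up, and getting the raw words out of the drive is not covered here.

```shell
# Hypothetical helper: classify TRIM behaviour from IDENTIFY DEVICE words 69 and 169.
# Assumed bit layout (verify against the ATA spec):
#   word 169 bit 0  -> TRIM supported
#   word 69  bit 14 -> deterministic read after TRIM (DRAT)
#   word 69  bit 5  -> read zero after TRIM (RZAT)
trim_type() {
    w69=$(($1))     # accepts decimal or 0x-prefixed hex
    w169=$(($2))
    if [ $(( w169 & 1 )) -eq 0 ]; then
        echo "no TRIM"
    elif [ $(( (w69 >> 5) & 1 )) -eq 1 ]; then
        # Checked first: a read-zero drive is also deterministic.
        echo "RZAT"
    elif [ $(( (w69 >> 14) & 1 )) -eq 1 ]; then
        echo "DRAT"
    else
        echo "non-deterministic TRIM"
    fi
}
```

For example, `trim_type 0x4000 0x0001` describes a drive with TRIM support and the deterministic bit set but not the read-zero bit.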

What all of this comes down to is which type of TRIM command your hardware is capable of executing. From the research I have done thus far, modern SSDs use either DRAT or RZAT depending on the "quality" of the drive (higher quality generally meaning RZAT is implemented).

In drives that implement DRAT, the deleted space is marked "unused" and the data "can" be recovered, to a point. With drives that use RZAT, once the TRIM command is executed, the drive will return "zeros" regardless of whatever data is in that space.²

The 2016 MBP uses the SanDisk SDRQKBDC4 064G 64 GB³ NAND, which is OEM'd specifically for Apple. Unfortunately, I haven't found detailed specs concerning TRIM support for these particular drives (the white paper they reference returns a 404). While I can't get my hands on any detailed tech specs for this drive, I would wager that it uses the RZAT TRIM command and not DRAT.

Long story short, if the SSD has only DRAT capability, there is a possibility of recovering data, but usually only in a situation where you were able to stop the deletion process and immediately start data recovery. If it implements RZAT, you can be pretty sure the recovery will only get zeros back unless they take it (along with a court order) back to the manufacturer to get a low-level recovery.²

TL;DR

With new and modern Macs and macOS, if you wipe your drive and reinstall macOS on top of it, it is highly unlikely that someone will be able to recover your files.

If you are concerned that the user will take the drive to a forensics lab to recover data, (that's a whole different issue), you should replace the drive. Anything less than that and you will be fine.

¹ USENIX Conference on File and Storage Technologies, 2011.

² Recovering Evidence from SSD Drives in 2014: Understanding TRIM, Garbage Collection and Exclusions. Sept. 23, 2014

³ MacBook Pro 13" Touch Bar Teardown Ifixit.com, ~December 2016.
