Linux – Is it necessary to explicitly flush the HDD on-disk write caches

cache, command line, linux, unmounting, usb

Abstract

Sometimes, the Linux kernel is not aware of the on-drive write caches of external USB storage devices. Is it necessary in such situations to explicitly flush those caches prior to detaching these devices?

Example

I use a WD Elements external USB HDD of which hdparm -I says

...
Commands/features:
    Enabled Supported:
       ...
       *    Write cache
...

and hdparm -W:

...
 write-caching =  1 (on)

On the other hand, when I plug in the drive, I get the following kernel messages for it:

... No Caching mode page found
... Assuming drive cache: write through
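The kernel's assumption can also be read back from the log after the fact. The following is a minimal sketch, assuming the message wording shown above; the function name assumed_cache_mode is hypothetical, and reading the log with dmesg may require root privileges on some systems.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the cache
# mode that the kernel says it assumes (one output line per matching message).
assumed_cache_mode() {
    # Keep only the text that follows "Assuming drive cache:".
    sed -n 's/.*Assuming drive cache: *//p'
}

# Possible usage (not run here):
#   dmesg | assumed_cache_mode
```

If several drives were plugged in, the output contains one line per drive, so the surrounding log lines are still needed to tell them apart.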

According to this answer by Kyle Jones, these kernel messages indicate that the kernel assumes that its write operations will go directly to the platter.

The section "write_cache (RW)" of the file Documentation/block/queue-sysfs.txt in the Linux Kernel Documentation describes an implication of the kernel assuming a write through cache mode (thanks to Wayne Conrad):

… "write through", … will also eliminate cache flushes issued by
the kernel.
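The write_cache attribute mentioned in that documentation can be inspected directly. This is a minimal sketch, assuming the attribute is exposed as /sys/block/<name>/queue/write_cache on the running kernel; the function name kernel_cache_mode and the SYSFS_ROOT override are hypothetical and exist only to make the snippet testable.

```shell
# Hypothetical helper: print the cache mode the kernel currently assumes
# for a block device ("write back" or "write through").
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
kernel_cache_mode() {
    kcm_dev=$1    # kernel name such as "sdb", not "/dev/sdb"
    kcm_attr="${SYSFS_ROOT:-/sys}/block/$kcm_dev/queue/write_cache"
    if [ -r "$kcm_attr" ]; then
        cat "$kcm_attr"
    else
        echo "unknown"
        return 1
    fi
}

# Possible usage (not run here):
#   kernel_cache_mode sdb
```

This only reports what the kernel believes. It says nothing about whether the drive itself caches writes, which is what hdparm -I and hdparm -W report in the example above.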

Questions

Up to now, my standard method to detach external USB storage devices from a Linux system was to unmount all mounted partitions on it, to wait until the drive's LED stops flashing, to physically unplug the USB connector, and if this does not power down the device (some have a separate power supply), to explicitly power it down.

Is this method safe, or does it imply the risk of losing unflushed data in the on-drive write cache, in particular if the kernel is unaware of that cache?

In the latter case, it seems advisable to explicitly flush the on-drive write cache after the unmount by sending a SCSI sync command. This can be done, for example, using sg_sync, which comes with the sg3-utils:

sg_sync <device>

Would this solve the problem? Or should this be supplemented by a SCSI stop command to the device?

The latter could be issued with (again sg3-utils):

sg_start 0 -r <device>

Are sg_sync and sg_start the right tools for this purpose or is it better to use one of the tools and methods that I mention below in section "Related methods" or to do something else to handle the problem?
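Put together, the sequence asked about here could look like the following. It is only a sketch of the proposed procedure, not a confirmed solution: the function names run and flush_and_stop and the DRY_RUN switch are hypothetical, and the sg_sync and sg_start invocations are the ones quoted above. By default the commands are only printed.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is 1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Sketch: unmount one partition, then ask the drive itself to flush and stop.
flush_and_stop() {
    fas_part=$1    # mounted partition, e.g. /dev/sdb1
    fas_disk=$2    # whole device, e.g. /dev/sdb
    run umount "$fas_part"  || return 1   # write out filesystem data
    run sg_sync "$fas_disk" || return 1   # ask the drive to flush its cache
    run sg_start 0 -r "$fas_disk"         # ask the drive to stop (optional)
}

# Possible usage (not run here; needs root and the sg3-utils):
#   DRY_RUN=0 flush_and_stop /dev/sdb1 /dev/sdb
```

A drive with several mounted partitions would need one umount per partition before the sync.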

To point out the relevance of this question, see this comment by ack.

Please note: I am not primarily looking for a method to spin down or power off an HDD by software; unplugging the drive and using power switches have proven to be rather reliable in this respect. Instead, I am looking for a method that guarantees that all written data, including data that was cached on-drive, has made it to non-volatile storage before the drive is shut down.

Related methods

In the following, I give some comments on methods that I found to be related to flushing on-drive caches or, more generally, to "safe removal" of removable storage devices. One aspect I consider is the availability of the involved tools on existing Linux systems. This matters because it is not always possible to simply install missing software.

Desktop environment: Some desktop environments offer widgets to "safely remove" external USB storage devices. See, for example, the questions and related posts on this topic.

Depending on the implementation, these widgets seem to power down the devices, but I did not find any reference that clarifies whether they cause the devices to flush their caches beforehand, especially in cases in which the kernel is not aware of external on-drive caches.

In addition, many Linux systems (for example pure servers) do not have any desktop environment installed. So this method is not always available.

udisks: According to this answer by jimmij, udisks by Freedesktop.org can be used from the command line to "safely remove" an external USB storage device:

udisks --unmount /dev/sda1
udisks --detach /dev/sda

This excellent answer by Totor describes what the udisks --detach command does:

  • sends SCSI sync-cache command,
  • sends SCSI stop command,
  • unbinds the usb-storage kernel driver,
  • suspends the USB device (power),
  • logically disables/removes it from its USB port.

So it explicitly takes care of the on-drive caches. However, udisks is only available on about half of the Linux systems I have to deal with.

udisksctl: Recent versions of udisks provide the program udisksctl, which can be used as a substitute for the udisks commands shown above. This is again according to jimmij's answer:

udisksctl unmount -b <partition>
udisksctl power-off -b <device>

The same answer also cites the description of the power-off command in the udisksctl(1) man page:

power-off

Arranges for the drive to be safely removed and powered off. On the OS
side this includes ensuring that no process is using the drive, then
requesting that in-flight buffers and caches are committed to stable
storage.

Unfortunately, this does not specify whether the "in-flight buffers and caches" include external on-drive caches that the kernel is not aware of. But it is likely that udisksctl follows its predecessor udisks in this respect.

What udisksctl definitely shares with its predecessor is its rather low availability on existing systems (compared to the availability of umount, for example).

eject: According to its man page, the command line tool eject can be used to eject removable media under software control. Affected partitions will be unmounted beforehand as needed.

Although this tool shows up in several discussions on "safe removal" of removable media (see for example this question by LGenzelis and this question by k.Cyborg), nothing indicates that it does more than an unmount in this respect.

In addition, I suspect that this tool focuses more on media and media trays than on devices. This might be the reason why it simply dies with the error message

eject: unable to eject

when it is applied to the WD Elements USB drive that I mentioned as the introductory example above. However, it succeeds on some USB memory sticks.

Nevertheless, as a part of util-linux, this tool is widely available.

sg3-utils: The programs sg_sync and sg_start were already discussed above.

This comment by quirks to an Ubuntu bug report indicates that udisks internally uses the sg3-utils to send its SCSI sync and stop commands to the device.

It seems to me that the sg3-utils have a wider availability than udisks. But this is only a vague and personal impression.

sdparm: This web page by Yan Li discusses a procedure to "Safely remove an USB hard drive in Linux". For that purpose, it recommends a script that in principle uses the following sdparm commands to flush on-drive caches and to stop (spin down?) the USB HDD:

sdparm --command=sync <device>
sdparm --command=stop <device>

These seem to be comparable to the sg_sync and sg_start commands discussed above, and might be used as a substitute for the latter.
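Because the tools discussed in this section are not installed everywhere, a script could probe for them before choosing one. This is a minimal sketch; the function name first_available and the order of candidates are assumptions, and it only tests whether a command exists, not whether it works on a given drive.

```shell
# Hypothetical helper: print the first command from the argument list
# that is installed; return non-zero if none of them is.
first_available() {
    for fa_tool in "$@"; do
        if command -v "$fa_tool" >/dev/null 2>&1; then
            echo "$fa_tool"
            return 0
        fi
    done
    return 1
}

# Possible usage (not run here):
#   tool=$(first_available udisksctl sg_sync sdparm) || echo "no flush tool found"
```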

hdparm: Being basically a tool to manage ATA drives, hdparm is in some sense a foreign object when it comes to USB drives, because the latter are primarily addressed as SCSI devices in Linux. In cases like our WD Elements example, there is an ATA HDD sitting behind a SCSI-to-ATA translation layer (SAT layer). See this answer by Mikko Rantalainen for more details. Depending on the implementation of the SAT layer, hdparm can be used with limited functionality to manipulate these drives.

If supported, one can use the commands

hdparm -F <device>
hdparm -Y <device>

to flush the on-drive caches and to stop the drives. If this is not possible, one might think of using

hdparm -W0 <device>

as a workaround to flush the caches. But beware: This command is actually intended to switch the on-drive write caching off. Therefore, one should make sure that it actually flushes, rather than simply drops, the cache contents that have accumulated up to that point.

For the WD Elements drive, none of these commands works; instead, they report bad/missing sense data.

sysfs: Scattered over the web, there are recipes to unbind an external USB drive from its driver, to unregister the drive from the system, or to power it down by manipulating device attributes that the Linux kernel exposes in its sysfs.

Here is an example from this post by bash in the Debian forum:

echo "auto" > "/sys/bus/usb/devices/usb1/1-5/power/level"
echo "1-5:1.0" > /sys/bus/usb/devices/1-5\:1.0/driver/unbind

and

echo "1" > "/sys/bus/usb/devices/usb1/1-5/remove"

Or from this answer by Tony George:

echo 'offline' > /sys/block/sdb/device/state
echo '1' > /sys/block/sdb/device/delete

I have too little knowledge about the concepts that the kernel developers had in mind regarding the usage of these attributes to be able to judge these code snippets. But I have some doubt that they are helpful to flush on-drive write caches that the kernel is not aware of.

In addition, it seems that at least some of these recipes are outdated. In this respect, see the first paragraph of the "Sysfs Rules" in the kernel documentation as of 2019-01-25:

The kernel-exported sysfs exports internal kernel implementation
details and depends on internal kernel structures and layout. It is
agreed upon by the kernel developers that the Linux kernel does not
provide a stable internal API. Therefore, there are aspects of the
sysfs interface that may not be stable across kernel releases.

Umount and wait: This is the method that I referred to in my question above. It comprises an unmount and then waiting until the (presumed) write activity on the drive has finally ceased. The next and final step is to hope that the on-drive write caches were flushed in the course of that.

It is the main point of this question to clarify whether this is a safe method (for the data).

In any case, this method has the big advantage that it is easy, that it is available on every Linux system I know, and that the syntax of the umount command has not changed in decades.
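Watching the LED could in principle be supplemented by polling the kernel's own request counters. The following is a sketch under the assumption that the kernel exposes /sys/block/<name>/inflight with two counters (reads and writes in flight); the function name wait_for_idle and the SYSFS_ROOT override are hypothetical. It only shows that the kernel has no outstanding requests and therefore cannot prove that an on-drive cache the kernel does not know about has been flushed.

```shell
# Sketch: poll until the kernel reports no in-flight requests for a disk.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
wait_for_idle() {
    wfi_dev=$1            # kernel name such as "sdb"
    wfi_tries=${2:-30}    # give up after this many polls, one second apart
    wfi_file="${SYSFS_ROOT:-/sys}/block/$wfi_dev/inflight"
    [ -r "$wfi_file" ] || return 2
    while [ "$wfi_tries" -gt 0 ]; do
        # The file holds two numbers: reads and writes currently in flight.
        read -r wfi_reads wfi_writes < "$wfi_file" || return 2
        if [ "$wfi_reads" -eq 0 ] && [ "$wfi_writes" -eq 0 ]; then
            return 0
        fi
        wfi_tries=$((wfi_tries - 1))
        sleep 1
    done
    return 1
}

# Possible usage (not run here):
#   umount /dev/sdb1 && wait_for_idle sdb && echo "kernel queue is empty"
```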

Further reading

Some general discussions about "ejecting" and "safe removal" of USB or other external storage devices can be found in the context of these questions:

In addition, Luis Alvarado describes in this answer the differences between the options "Unmount", "Eject", and "Safely Remove Drive" that are offered by Ubuntu's desktop environment, in particular that there is more to "Safely Remove Drive" than just an unmount. With respect to the latter see also

Many contributions take the view that it is "safe to remove" an external storage device as soon as it is unmounted. Here are some examples:

Some contributions focus on spinning down external HDDs in addition to an unmount. See for example this question by winchendonsprings. In this comment, sourcejedi points out that spinning down a USB drive reduces its vulnerability to mechanical disturbances.

Others aim at powering off USB drives:

In the following contributions, on-drive caches are explicitly mentioned with respect to "safe removal" of storage devices:

The latter is the only contribution I have found that points out the damage that can arise due to wrong assumptions of the kernel with respect to external on-drive write caching. This is exactly the focus of this question. According to that comment, simply unmounting the drive is not enough to prevent that damage. Further details and possible ways to do it right would be highly appreciated.

Best Answer

I have been scouring the internet for answers about a boot error concerning the write cache on external drives connected via USB.

a) To answer the first question: yes, it is important to flush the buffers and caches before pulling out the USB drive. Like it or not, you need to umount the external USB drive so that the OS can ensure that everything is written out to the hard drive.

b) After reading everything I could find, I realized that the answer was right in front of me. I believe the message about the caching mode page not being found and write through being assumed (which I found using the journalctl -b command) is inherent in booting the system with an external hard drive attached via USB. This is a matter of good design: USB devices can be pulled out at any time, and if that happens without the OS being aware of it or being given enough time to prepare, the data becomes corrupted. So I will ignore the messages recorded by journalctl, because they are part of the design. The only way for me to avoid them would be not to mount the drives via /etc/fstab and to let them automount when plugged in.
