Block Device Cache vs Filesystem – Understanding the Differences


Block devices provide buffering. This means that write() on a block device can return success before the kernel has written the data to the device. A program can wait for all the buffered writes by calling fsync().
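The pattern described above can be sketched in C as follows, assuming a POSIX system. write() may return once the data is only in the cache, so the program calls fsync() to wait for it to reach the device, and checks fsync()'s result because that is where an error from the delayed write would be reported. The helper name write_then_fsync() and its return codes are hypothetical, chosen for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path (e.g. a block device),
 * then wait for them to reach the device with fsync().
 * Returns 0 on success, -1 if open/write/close failed, -2 if fsync failed
 * (for buffered writes, fsync is where a delayed I/O error surfaces). */
int write_then_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            close(fd);
            return -1;
        }
        /* Success here only means the data has been accepted into the cache. */
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {          /* wait for all buffered writes */
        close(fd);
        return -2;
    }
    return close(fd) == 0 ? 0 : -1;
}
```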

I have used dd (or cat) to write a filesystem image to a device. These commands do not call fsync() by default.

Next, suppose that I want to mount the written block device as a filesystem.

I suppose it is safest to run e.g. the sync command before mounting it. But what if I do not sync the block device? Is it possible that the filesystem might try to read blocks that have not yet been written to the device? Could it then read the old contents of the device, instead of the correct data from the filesystem image?
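For reference, a narrower alternative to the global sync command might look like the sketch below. It assumes Linux behaviour in which fsync() on any descriptor for a block device writes back all of that device's cached data, including data written earlier through other descriptors (for example by dd or cat). The helper name flush_device() is hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush the cached writes of one block device
 * before mounting it, instead of running the system-wide `sync`.
 * Assumption (Linux): fsync() on any fd for the device writes back the
 * device's whole cache, not just data written through this fd. */
int flush_device(const char *dev_path)
{
    int fd = open(dev_path, O_RDONLY);   /* Linux allows fsync on a read-only fd */
    if (fd < 0) {
        perror(dev_path);
        return -1;
    }
    int rc = fsync(fd);
    if (rc != 0)
        perror("fsync");
    close(fd);
    return rc;
}
```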

My primary interest is in Linux behaviour. (And StackExchange encourages me to ask one specific question. I can upvote any alternative or historical behaviour as well though :-).

Best Answer

When the program closes the block device file, Linux flushes the associated cache, forcing the program to wait. However, this only applies to the last close(): it will not happen if anything else still has the block device open, including when any partition of the same block device is still open.
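To make the "last close()" rule concrete, here is a toy user-space model, not kernel code: struct toy_disk and the toy_* functions are hypothetical. A process holding a partition open is modelled as one more opener of the whole disk, so dd closing the disk does not trigger the flush while the partition is still open.

```c
/* Toy model (hypothetical, not kernel code) of "only the last close()
 * flushes". A partition held open counts as an opener of the whole disk. */
struct toy_disk {
    int openers;   /* like bd_openers of the whole-disk device */
    int dirty;     /* cached writes not yet on the device */
    int flushes;   /* how many times the cache was written back */
};

void toy_open(struct toy_disk *d)
{
    d->openers++;
}

void toy_write(struct toy_disk *d, int blocks)
{
    d->dirty += blocks;            /* write() returns once data is cached */
}

void toy_close(struct toy_disk *d)
{
    if (--d->openers == 0) {       /* same test as !--bdev->bd_openers */
        d->dirty = 0;              /* stands in for sync_blockdev() */
        d->flushes++;
    }
}
```

In this model, with a partition held open, dd's close() leaves its writes cached; only the final close() flushes them.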

So in the general case, it is best to sync the device explicitly.

To be safe, the way to sync the device is to run your dd command with the option conv=fsync. Without it, errors that occur when the cached data is later written to the device are not returned to dd, because close() on a block device does not report them. So you would only notice an error if you looked in the kernel log (dmesg).
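A rough C equivalent of what conv=fsync adds over a plain copy is sketched below, assuming a POSIX system. copy_image() and its do_fsync flag are hypothetical names, not dd's actual implementation. With do_fsync set, the output is fsync()ed before close(), so an error during writeback is returned to the caller; with it cleared, the copy behaves like cat.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: copy an image file to a device, optionally
 * calling fsync() on the output before close(), like dd conv=fsync. */
int copy_image(const char *src, const char *dst, int do_fsync)
{
    char buf[65536];
    int in = open(src, O_RDONLY);
    if (in < 0) {
        fprintf(stderr, "%s: %s\n", src, strerror(errno));
        return -1;
    }
    int out = open(dst, O_WRONLY);
    if (out < 0) {
        fprintf(stderr, "%s: %s\n", dst, strerror(errno));
        close(in);
        return -1;
    }

    int rc = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            break;                          /* end of image */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        for (ssize_t off = 0; off < n; ) {  /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            off += w;
        }
        if (rc != 0)
            break;
    }

    /* Without this step, errors during later writeback of the cache are
     * not reported to the program. */
    if (rc == 0 && do_fsync && fsync(out) != 0) {
        fprintf(stderr, "fsync %s: %s\n", dst, strerror(errno));
        rc = -1;
    }
    close(in);
    close(out);
    return rc;
}
```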

As well as waiting for all the cached writes, the last close() also drops all of the cache (kill_bdev()). I have verified this myself by watching the output of the free command.
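One way to watch the same effect from a program, assuming Linux, is to read /proc/meminfo directly. Its Buffers field counts the page cache of block devices, which free includes in its buff/cache column (or shows separately with free -w). The helper meminfo_kb() is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return one /proc/meminfo field in kB, or -1 if it
 * cannot be read. "Buffers" is the block-device page cache that
 * kill_bdev() drops on the last close(). */
long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    char line[256];
    size_t flen = strlen(field);
    long value = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* lines look like "Buffers:          123456 kB" */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            if (sscanf(line + flen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Calling meminfo_kb("Buffers") before and after the last close() of a device whose data is cached would be expected to show the value drop, matching the observation above with free.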

linux-4.20/fs/block_dev.c:1778

    static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
    {
        struct gendisk *disk = bdev->bd_disk;
        struct block_device *victim = NULL;

        mutex_lock_nested(&bdev->bd_mutex, for_part);
        if (for_part)
            bdev->bd_part_count--;

        if (!--bdev->bd_openers) {
            WARN_ON_ONCE(bdev->bd_holders);
            sync_blockdev(bdev);
            kill_bdev(bdev);
            /* ... */
        }
        /* ... */
    }

In case you are not familiar with C code, the last few lines above are equivalent to this:

    bdev->bd_openers = bdev->bd_openers - 1;
    if (bdev->bd_openers == 0) {
        WARN_ON_ONCE(bdev->bd_holders);
        sync_blockdev(bdev);
        kill_bdev(bdev);
        /* ... */
    }