Linux – Create a write-cache loop device for a much larger block device

block-device linux linux-kernel

So, long story short.
* I had a 5.5TB RAID5 array with a HP controller.
* HP uses a really, really painful RAID5 algorithm (ref: Delayed Parity)
* I wrote a new block driver that creates a new block device and translates requests made to it into requests to the old disks of the HP array. The translation is one-way (read-only).

So I've got a 5.5TB "array" which seems to mount okay. The ext4 filesystem module complains a bit about the journal not being 100%, and a couple of directories are not readable due to errors, but the rest of the data seems intact enough to believe my block driver is doing the right thing.

But I'm very cautious. I don't want to just turn off read-only mode and run fsck on the filesystem, only to find it absolutely destroys it because of something I hadn't considered.

What I'd like to do is map the block device to a loop device at the block level, so I can run fsck on it. Writes would be cached in the loop device, and reads would be unioned: if no write has been recorded for a sector, read the block device; if one has, read the loop device.

The problem is that I don't have enough disk space to create a 5.5TB file (even thin-provisioned) to act as a write cache, and it seems a waste since there are only likely to be a few GB of writes at most as fsck does its job.

The question: Is there a way to loopback/union the block device into a file, essentially zero in size to begin with, that will grow with the writes I make?

Thanks in advance.

Best Answer

You can use either dm-snapshot or NBD in copy-on-write mode.

The dm-snapshot solution is provided here (sorry for not repeating it):

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
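
For reference, here is a minimal sketch of the overlay approach that page describes, assuming /dev/sda1 is the read-only source device and /mnt/storage has some free space. The function names, the mapping name "overlay", the overlay file path and its size are placeholders of mine, not part of the linked page or this answer. The overlay file is sparse, so it starts out using almost no disk space and only grows as writes land in it.

```shell
#!/bin/sh
# Sketch only: copy-on-write overlay on top of a read-only device via dm-snapshot.
# All names below (overlay, function names, paths passed in) are illustrative.

# Print a device-mapper "snapshot" table line.
#   $1 = origin device, $2 = COW device, $3 = origin size in 512-byte sectors
snapshot_table() {
    # P = persistent COW store, 8 = chunk size in sectors (4 KiB)
    printf '0 %s snapshot %s %s P 8\n' "$3" "$1" "$2"
}

# Create /dev/mapper/overlay backed by a sparse file. Needs root.
#   $1 = origin device, $2 = overlay file path, $3 = overlay size (e.g. 10G)
create_overlay() {
    origin=$1
    cow_file=$2
    cow_size=$3

    # Sparse file: the size is only an upper bound, blocks are allocated on write.
    truncate -s "$cow_size" "$cow_file" || return 1

    # Attach the sparse file to the first free loop device and remember its name.
    cow_dev=$(losetup --find --show "$cow_file") || return 1

    # Size of the origin in 512-byte sectors, as device-mapper expects.
    sectors=$(blockdev --getsz "$origin") || return 1

    # Reads fall through to the origin; writes go to the COW device.
    dmsetup create overlay --table "$(snapshot_table "$origin" "$cow_dev" "$sectors")"
}
```

Called as root with something like `create_overlay /dev/sda1 /mnt/storage/overlay.img 10G`, this should leave a writable /dev/mapper/overlay to run fsck against, while /dev/sda1 itself is never written. `dmsetup status overlay` reports how much of the overlay is in use; if it fills up completely the snapshot becomes invalid, so the size is worth choosing generously.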

As for NBD, you can install nbd-server and nbd-client, and then use it like this:

mount /mnt/storage # something with some free space
losetup --read-only /dev/loop0 /dev/sda1 # loop device first, then the source; ensures it's read-only
ln -s /dev/loop0 /mnt/storage/loop0
nbd-server 127.0.0.1@4242 /mnt/storage/loop0 -c

The symlink is necessary because nbd-server insists on storing the temporary write cache file in the same location as the file it is serving. Without the link, the cache file would end up in /dev/, which is not useful at all.

Finally connect to it with the client:

nbd-client 127.0.0.1 4242 /dev/nbd0

The only problem with this NBD solution is that it uses quite a lot of RAM (depending on your device size), regardless of temporary storage being available. Since fsck itself is also quite RAM hungry at times, it's possible to run out if you don't have a lot of RAM installed.
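
As a rough precaution, you could check how much memory is available before starting fsck on the NBD device. This is only a sketch: the helper names and the threshold you pass in are my own assumptions, and how much RAM is actually needed depends on your device and filesystem.

```shell
#!/bin/sh
# Sketch only: helper names and the threshold are illustrative.

# Print MemAvailable in KiB from a meminfo-style file (default: /proc/meminfo).
mem_available_kib() {
    awk '$1 == "MemAvailable:" { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed only if at least $1 KiB are currently available.
#   $1 = required KiB, $2 = optional meminfo-style file
has_memory() {
    [ "$(mem_available_kib "$2")" -ge "$1" ]
}
```

For example, `has_memory 4194304 && fsck /dev/nbd0` would only start fsck if about 4 GiB are reported as available; that figure is a guess, not a measured requirement.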
