I have six Linux logical volumes that together back a virtual machine. The VM is currently shut down, so it's easy to take consistent images of them.
I'd like to pack all six images together in an archive. Trivially, I could do something like this:
cp /dev/Zia/vm_lvraid_* /tmp/somedir
tar c /tmp/somedir | whatever
But that, of course, creates an extra copy, which I'd like to avoid.
The obvious approach:
tar c /dev/Zia/vm_lvraid_* | whatever
does not work, as tar recognizes the files as special (symlinks, in this case) and basically stores the ln -s in the archive. Or, with --dereference or directly pointed at /dev/dm-X, it recognizes them as special (device files) and basically stores the mknod in the archive.
I've searched for command-line options to tar to override this behavior, and couldn't find any. I also tried cpio: same problem, and I couldn't find any options to override it there, either. I also tried 7z (ditto), and the same goes for pax. I even tried zip, which just got itself confused.
edit: Looking at the source code of GNU tar and GNU cpio, it appears neither of them can do this, at least not without serious trickery (the special handling of device files can't be disabled). So suggestions of serious trickery, or of alternative utilities, would be appreciated.
TL;DR: Is there some archiver that will pack multiple disk images (taken from raw devices) together and stream that output, without making extra on-disk copies? My preference would be output in a common format, like POSIX or GNU tar.
Best Answer
So recently I wanted to do this with tar. Some investigation indicated to me that it was more than a little nonsensical that I couldn't. I did come up with this weird split --filter="cat >file; tar -r ..." thing, but, well, it was terribly slow. And the more I read about tar, the more nonsensical it seemed.

You see, tar is just a concatenated list of records. The constituent files are not altered in any way; they're whole within the archive. But they are blocked off on 512-byte block boundaries, and preceding every file there is a header. That's it. The header format is really very simple as well.

So, I wrote my own tar. I call it... shitar.

That's the meat and potatoes, really. It writes the headers and computes the chksum, which, relatively speaking, is the only hard part. It does the ustar header format... maybe. At least, it emulates what GNU tar seems to think is the ustar header format to the point that it does not complain. And there's more to it; it's just that I haven't really coagulated it yet. Here, I'll show you:

That's tar. Everything's padded with \0 nulls, so I just turn 'em into \n newlines for readability. And shitar:
I say "kind of" up there because that isn't shitar's purpose; tar already does that beautifully. I just wanted to show how it works, which means I need to touch on the chksum. If it weren't for that, I would just be dd-ing off the head of a tar file and be done with it. That might even work sometimes, but it gets messy when there are multiple members in the archive. Still, the chksum is really easy. First, fill the chksum field with 7 spaces (which is a weird GNU thing, I think, as the spec says 8, but whatever; a hack is a hack). Then add up the values of every byte in the header and write the total as an octal number. That's your chksum. So you need the file metadata before you write the header, or you don't have a chksum. And that's a ustar archive, mostly.

Ok. Now, what it is meant to do:
That makes three 500M disk images, formats and mounts each, and writes a file to each.
Note: apparently block devices will just always block correctly (their sizes are whole multiples of 512 bytes), so they never need padding. Pretty handy.
That tars the contents of the disk device files in-stream and pipes the output to xz. Now, the moment of truth...
Hooray! Extraction...
Comparison...
And the mount...
And so, in this case, shitar performs OK, I guess. I'd rather not go into all of the things it won't do well. But I will say: at the least, don't put newlines in the filenames.

You can also do this (and maybe should, considering the alternatives I've offered) with squashfs. Not only do you get a single archive built from the stream, but it's mountable and built into the kernel's VFS. From pseudo-file.example:
You might also use btrfs (send|receive) to stream out a subvolume into whatever stdin-capable compressor you like. This subvolume need not exist before you decide to use it as a compression container, of course.

Still, about squashfs... I don't believe I'm doing this justice. Here's a very simple example:
That's only the inline -p argument for mksquashfs. You can source a file with -pf containing as many of those definitions as you like. The format is simple: you define a target file's name/path in the new archive's filesystem, you give it a mode and an owner, and then you tell it which process to execute and read stdout from. You can create as many as you like, and you can use whichever compression format you like: LZMA, GZIP, LZ4, XZ... hmm, there are more. And the end result is an archive you can cd into. More on the format, though:
This is, of course, not just an archive; it is a compressed, mountable Linux filesystem image. Its format is the Linux kernel's: squashfs is a filesystem supported by the vanilla kernel. In this way it is as common as the vanilla Linux kernel. So if you told me you were running a vanilla Linux system on which the tar program was not installed, I would be dubious, but I would probably believe you. But if you told me you were running a vanilla Linux system on which the squashfs filesystem was not supported, I would not believe you.
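To make the -pf pseudo-file format concrete (target name, type, mode, owner, group, then the command whose stdout becomes the file's contents), here is a minimal Python sketch that writes one dynamic pseudo-file definition per device, in the form name f mode uid gid command. The .img suffix, the 444/root/root defaults, and the names mkpseudo.py and vm.pseudo are illustrative assumptions. The sketch also assumes mksquashfs hands the command to a shell, which is why the device path goes through shlex.quote, and it refuses names containing whitespace rather than guessing how they would be parsed.

```python
import os
import shlex
import sys
from typing import Iterable


def pseudo_line(device: str, mode: str = "444",
                owner: str = "root", group: str = "root") -> str:
    """One dynamic pseudo-file definition: <name> f <mode> <uid> <gid> <command>."""
    name = os.path.basename(device) + ".img"  # hypothetical naming scheme
    if any(ch.isspace() for ch in name):
        raise ValueError(f"whitespace in pseudo-file name: {name!r}")
    # mksquashfs runs this command and stores its stdout as the file's contents
    # (assumed here to go through a shell, hence the quoting).
    return f"{name} f {mode} {owner} {group} cat {shlex.quote(device)}"


def pseudo_file(devices: Iterable[str]) -> str:
    """A whole -pf file: one definition per line."""
    return "".join(pseudo_line(dev) + "\n" for dev in devices)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 mkpseudo.py /dev/Zia/vm_lvraid_* > vm.pseudo
    sys.stdout.write(pseudo_file(sys.argv[1:]))
```

The resulting file would then be handed to mksquashfs with -pf, along the lines of mksquashfs <empty-dir> vm.sqsh -pf vm.pseudo -comp xz. That exact invocation is an assumption; check the mksquashfs documentation for your squashfs-tools version.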