So recently I wanted to do this with tar. Some investigation indicated to me that it was more than a little nonsensical that I couldn't. I did come up with this weird split --filter="cat >file; tar -r ..." thing, but, well, it was terribly slow. And the more I read about tar the more nonsensical it seemed.
You see, a tar archive is just a concatenated list of records. The constituent files are not altered in any way - they're whole within the archive. But they are blocked off on 512-byte block boundaries, and preceding every file there is a header. That's it. The header format is really very simple as well.
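The layout arithmetic can be sketched with a couple of shell functions. This is only an illustration, and the names tar_padding and tar_member_span are made up for it: each member takes one 512-byte header plus its data rounded up to the next 512-byte boundary.

```shell
# Hypothetical helpers, not part of any tar implementation.

# tar_padding SIZE: NUL bytes needed to round SIZE up to a 512-byte boundary.
tar_padding() {
    echo $(( (512 - $1 % 512) % 512 ))
}

# tar_member_span SIZE: bytes one member occupies, header included.
tar_member_span() {
    echo $(( 512 + $1 + $(tar_padding "$1") ))
}

tar_padding 4        # a 4-byte file needs 508 bytes of padding
tar_member_span 4    # 512 header + 4 data + 508 padding = 1024
```

A 4-byte member therefore spans 1024 bytes, so the next header would start at offset 1024.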
So, I wrote my own tar. I call it... shitar.
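#z N [STR]: print N minus ${#STR} copies of the literal escape \0, for use in a later printf format
z() (IFS=0; printf '%.s\\0' $(printf "%.$(($1-${#2}))d"))
#chk MODE UID GID SIZE MTIME: sum the byte values of the header fields, chksum field blank, and print it as 6 octal digits
chk() (IFS=${IFS#??}; set -f; set -- $(
printf "$(fmt)" "$n" "$@" '' "$un" "$gn"
); IFS=; a="$*"; printf %06o "$(($(
while printf %d+ "'${a:?}"; do a=${a#?}; done 2>/dev/null
)0))")
#fmt [0]: print the printf format for one header - NUL-padded fields with 0, newline-separated fields (for chk) without
fmt() { printf '%s\\'"${1:-n}" %s "${1:+$(z 99 "$n")}%07d" \
%07o %07o %011o %011o "%-${1:-7}s" ' 0' "${1:+$(z 99)}ustar " %s \
"${1:+$(z 31 "$un")}%s"
}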
That's the meat and potatoes, really. It writes the headers and computes the chksum - which, relatively speaking, is the only hard part. It does the ustar header format... maybe. At least, it emulates what GNU tar seems to think is the ustar header format to the point that it does not complain. And there's more to it, it's just that I haven't really coagulated it yet. Here, I'll show you:
for f in 1 2; do echo hey > file$f; done
{ tar -cf - file[123]; echo .; } | tr \\0 \\n | grep -b .
0:file1 #filename - first 100 bytes
100:0000644 #octal mode - next 8
108:0001750 #octal uid,
116:0001750 #gid - next 16
124:00000000004 #octal filesize - next 12
136:12401536267 #octal epoch mod time - next 12
148:012235 #chksum - more on this
155: 0 #file type - gnu is weird here - so is shitar
257:ustar #magic string - header type
265:mikeserv #owner
297:mikeserv #group - link name... others shitar doesnt do
512:hey #512-bytes - start of file
1024:file2 #512 more - start of header 2
1124:0000644
1132:0001750
1140:0001750
1148:00000000004
1160:12401536267
1172:012236
1179: 0
1281:ustar
1289:mikeserv
1321:mikeserv
1536:hey
10240:. #default blocking factor 20 * 512
That's tar. Everything's padded with \0 nulls, so I just turn them into \n newlines for readability. And shitar:
#the rest, kind of, calls z(), fmt(), chk() + gets $mdata and blocks w/ dd
for n in file[123]
do d=$n; un=$USER; gn=$(id --group --name)
set -- $(stat --printf "%a\n%u\n%g\n%s\n%Y" "$n")
printf "$(fmt 0)" "$n" "$@" "$(chk "$@")" "$un" "$gn"
printf "$(z $((512-298)) "$gn")"; cat "$d"
#pad the file data out to the next 512-byte boundary - nothing if it is already aligned
[ "$(($4%512))" -eq 0 ] || printf "$(z $((512-$4%512)))"
done |
{ dd iflag=fullblock conv=sync bs=10240 2>/dev/null; echo .; } |
tr \\0 \\n | grep -b .
OUTPUT
0:file1 #it's the same. I shortened it.
100:0000644 #but the whole first file is here
108:0001750
116:0001750
124:00000000004
136:12401536267
148:012235 #including its checksum
155: 0
257:ustar
265:mikeserv
297:mikeserv
512:hey
1024:file2
...
1172:012236 #and file2s checksum
...
1536:hey
10240:.
I say kind of up there because that isn't shitar's purpose - tar already does that beautifully. I just wanted to show how it works - which means I need to touch on the chksum. If it wasn't for that I would just be dd-ing off the head of a tar file and done with it. That might even work sometimes, but it gets messy when there are multiple members in the archive. Still, the chksum is really easy.
First, make it 7 spaces (which is a weird gnu thing, I think, as the spec says 8, but whatever - a hack is a hack). Then add up the values of every byte in the header and write the sum in octal. That's your chksum. So you need the file metadata before you do the header, or you don't have a chksum. And that's a ustar archive, mostly.
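To double-check a header's chksum, something like the following should do. It is a sketch, not shitar's method: the function name tar_chksum is made up, and it follows the POSIX ustar description, in which the 8-byte chksum field at offset 148 counts as 8 spaces while the sum is taken. It only looks at the first header in the file.

```shell
# tar_chksum FILE: recompute the chksum of the first 512-byte header in FILE.
# Hypothetical helper; assumes dd, od and awk are available.
tar_chksum() {
    # Feed od the header with the chksum field swapped for spaces, then let
    # awk add up one decimal number per byte and print the total in octal.
    {
        dd if="$1" bs=1 count=148 2>/dev/null            # header bytes 0-147
        printf '%8s' ''                                   # chksum field as 8 spaces
        dd if="$1" bs=1 skip=156 count=356 2>/dev/null   # header bytes 156-511
    } | od -An -v -tu1 | awk '
        { for (i = 1; i <= NF; i++) sum += $i }
        END { printf "%06o\n", sum }'
}
```

For an archive made with tar -cf demo.tar file1, running tar_chksum demo.tar should print the same digits that sit at offset 148 of the archive.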
Ok. Now, what it is meant to do:
cd /tmp; mkdir -p mnt
for d in 1 2 3
do fallocate -l $((1024*1024*500)) disk$d
lp=$(sudo losetup -f --show disk$d)
sync
sudo mkfs.vfat -n disk$d "$lp"
sudo mount "$lp" mnt
echo disk$d file$d | sudo tee mnt/file$d
sudo umount mnt
sudo losetup -d "$lp"
done
That makes three 500M disk images, formats and mounts each, and writes a file to each.
for n in disk[123]
do d=$(sudo losetup -f --show "$n")
un=$USER; gn=$(id --group --name)
set -- $(stat --printf "%a\n%u\n%g\n$(lsblk -bno SIZE "$d")\n%Y" "$n")
printf "$(fmt 0)" "$n" "$@" "$(chk "$@")" "$un" "$gn"
printf "$(z $((512-298)) "$gn")"
sudo cat "$d"
sudo losetup -d "$d"
done |
dd iflag=fullblock conv=sync bs=10240 2>/dev/null |
xz >disks.tar.xz
Note - apparently block devices always come out as a whole number of 512-byte blocks, so the per-file padding step isn't needed here. Pretty handy.
That tars the contents of the disk device files in-stream and pipes the output to xz.
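The sizes involved can be checked with a little shell arithmetic. This is only a sketch with made-up variable names; the one outside fact it leans on is that a tar archive is supposed to end with two all-zero 512-byte blocks, which here come out of the padding that dd conv=sync adds to the last record.

```shell
size=$((1024 * 1024 * 500))          # one disk image, as made by fallocate
echo $(( size % 512 ))               # 0: no per-member padding is needed
total=$(( 3 * (512 + size) ))        # three headers plus three images
pad=$(( (10240 - total % 10240) % 10240 ))
echo "$pad"                          # 8704 NUL bytes added by conv=sync
echo $(( pad / 512 ))                # 17 trailing zero blocks
```

If the total ever landed exactly on a 10240-byte record boundary there would be no such padding, and the end-of-archive blocks would have to be written some other way.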
ls -l disk*
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk1
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk2
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk3
-rw-r--r-- 1 mikeserv mikeserv 229796 Sep 3 01:05 disks.tar.xz
Now, the moment of truth...
xz -d <./disks.tar.xz| tar -tvf -
-rw-r--r-- mikeserv/mikeserv 524288000 2014-09-03 01:01 disk1
-rw-r--r-- mikeserv/mikeserv 524288000 2014-09-03 01:01 disk2
-rw-r--r-- mikeserv/mikeserv 524288000 2014-09-03 01:01 disk3
Hooray! Extraction...
xz -d <./disks.tar.xz| tar -xf - --xform='s/[123]/1&/'
ls -l disk*
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk1
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk11
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk12
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk13
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk2
-rw-r--r-- 1 mikeserv mikeserv 524288000 Sep 3 01:01 disk3
-rw-r--r-- 1 mikeserv mikeserv 229796 Sep 3 01:05 disks.tar.xz
Comparison...
cmp disk1 disk11 && echo yay || echo shite
yay
And the mount...
sudo mount disk13 mnt
cat mnt/*
disk3 file3
And so, in this case, shitar performs ok, I guess. I'd rather not go into all of the things which it won't do well. But I will say - don't do newlines in the filenames, at the least.
You can also do this with squashfs - and maybe should, considering the alternatives I've offered. Not only do you get the single archive built from the stream, but it's mountable and built in to the kernel's vfs:
From pseudo-file.example:
# Copy 10K from the device /dev/sda1 into the file input. Ordinarily
# Mksquashfs given a device, fifo, or named socket will place that special file
# within the Squashfs filesystem, this allows input from these special
# files to be captured and placed in the Squashfs filesystem.
input f 444 root root dd if=/dev/sda1 bs=1024 count=10
# Creating a block or character device examples
# Create a character device "chr_dev" with major:minor 100:1 and
# a block device "blk_dev" with major:minor 200:200, both with root
# uid/gid and a mode of rw-rw-rw.
chr_dev c 666 root root 100 1
blk_dev b 666 0 0 200 200
You might also use btrfs (send|receive) to stream out a subvolume into whatever stdin-capable compressor you like. This subvolume need not exist before you decide to use it as a compression container, of course.
Still, about squashfs...
I don't believe I'm doing this justice. Here's a very simple example:
cd /tmp; mkdir ./emptydir ./imgmnt
mksquashfs ./emptydir /tmp/tmp.sfs -p \
'file f 644 mikeserv mikeserv echo "this is the contents of file"'
Parallel mksquashfs: Using 6 processors
Creating 4.0 filesystem on /tmp/tmp.sfs, block size 131072.
[==================================================================================|] 1/1 100%
Exportable Squashfs 4.0 filesystem, gzip compressed, data block size 131072
compressed data, compressed metadata, compressed fragments,...
###...
###AND SO ON
###...
echo '/tmp/tmp.sfs /tmp/imgmnt squashfs loop,defaults,user 0 0'|
sudo tee -a /etc/fstab >/dev/null
mount ./tmp.sfs
cd ./imgmnt
ls
total 1
-rw-r--r-- 1 mikeserv mikeserv 29 Aug 20 11:34 file
cat file
this is the contents of file
cd ..
umount ./imgmnt
That's only the inline -p argument for mksquashfs. You can source a file with -pf containing as many of those as you like. The format is simple - you define a target file's name/path in the new archive's filesystem, you give it a mode and an owner, and then you tell it which process to execute and read stdout from. You can create as many as you like - and you can use LZMA, GZIP, LZ4, XZ... hmm there are more... compression formats as you like. And the end result is an archive into which you cd.
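A definitions file for -pf might be put together like this. It is a sketch: the file name pseudo.defs and the entry names are made up, the two entries just reuse the forms already shown, and the mksquashfs call is left commented out because the dd entry reads a real device.

```shell
# Write two pseudo-file definitions, one per line:
#   NAME f MODE OWNER GROUP COMMAND...
defs=${TMPDIR:-/tmp}/pseudo.defs     # hypothetical location
cat > "$defs" <<'EOF'
hello f 644 root root echo "this is the contents of file"
first10k f 444 root root dd if=/dev/sda1 bs=1024 count=10
EOF

# Then, assuming ./emptydir exists and /tmp/tmp.sfs does not:
# mksquashfs ./emptydir /tmp/tmp.sfs -pf "$defs"
```

Each line becomes one regular file in the image, filled with whatever its command writes to stdout.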
More on the format though:
This is, of course, not just an archive - it is a compressed, mountable Linux file-system image. Its format is the Linux kernel's - it is a vanilla kernel supported filesystem. In this way it is as common as the vanilla Linux kernel. So if you told me you were running a vanilla Linux system on which the tar program was not installed I would be dubious - but I would probably believe you. But if you told me you were running a vanilla Linux system on which the squashfs filesystem was not supported I would not believe you.
Best Answer
Your file is either truncated or corrupted, so xz can't get to the end of the data. tar complains because the archive stops in the middle, which is logical since xz didn't manage to read the whole data.
Run the following commands to check where the problem is:
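The commands themselves are missing from this copy of the answer. Going by the explanation that follows, a check along these lines would fit. It is a sketch with a made-up function name, and the archive path is whatever file you are testing.

```shell
# check_archive FILE: report whether the OS or xz fails to read FILE.
# Hypothetical helper; assumes cat and xz are installed.
check_archive() {
    # Step 1: read every byte, so the OS reports any I/O error.
    if ! cat -- "$1" >/dev/null; then
        echo "cat failed"
        return 2
    fi
    # Step 2: decompress to nowhere, so xz reports damaged or truncated data.
    if ! xz -dc -- "$1" >/dev/null; then
        echo "xz failed"
        return 1
    fi
    echo "ok"
}
```

Called as check_archive ./disks.tar.xz, it prints which of the two reads, if either, went wrong.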
If cat complains then the file is corrupted on the disk and the operating system detected the corruption. Check the kernel logs for more information; usually the disk needs to be replaced at this point. If only xz complains then the OS didn't detect any corruption but the file is nevertheless not valid (either corrupted or truncated). Either way, you aren't going to be able to recover this file. You'll need to get it back from your offline backups.