Shell – Self extracting scripts: tar -xO and dd


I am working with a self-extracting script that installs packages on a QNAP NAS.

It has some scripting at the start which extracts the rest of the file. Here it goes:

script_len=102
/bin/dd if="${0}" bs=$script_len skip=1 | /bin/tar -xO | /bin/tar -xzv

This uses dd to copy the bytes from byte 102 onwards into tar, where it is extracted.
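The skipping behaviour is easy to check on a small file. This is a minimal sketch with a made-up 10-byte header (the file name and contents are hypothetical): with bs set to the header length, skip=1 drops exactly one block of that size and dd copies everything after it.

```shell
#!/bin/sh
# Build a hypothetical two-part file: a 10-byte "header" followed by a payload.
tmp=$(mktemp -d)
printf 'HEADER____' > "$tmp/demo.bin"       # exactly 10 bytes
printf 'payload-bytes' >> "$tmp/demo.bin"   # everything after the header

header_len=10
# bs=10 skip=1 skips one 10-byte input block, then copies the rest of the file.
payload=$(dd if="$tmp/demo.bin" bs="$header_len" skip=1 2>/dev/null)
echo "$payload"
rm -rf "$tmp"
```

Nothing in this command tells dd where the payload ends; it simply runs to the end of the file.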

What does -xO do? And why is it extracted "twice" (two invocations of tar with -x) ? ~~I couldn't find much discussion of this online – the man page seems to suggest it's something to do with "drives".~~ (Looks like I got my 0s and Os mixed up!)

Subsequently, the script does:

offset=$(/usr/bin/expr $script_len + 2042)
/bin/dd if="${0}" bs=$offset skip=1 | /bin/cat | /bin/dd bs=1024 count=7 of=$_EXTRACT_DIR/data.tar.gz

This appears to skip further into the file, and copies the bytes there into a new zipped TAR. Presumably those bytes are already structured and encoded that way.
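The same skip-then-copy-a-fixed-amount idea can be tried on a toy file. This is only a sketch with invented sizes and contents, not the real qpkg layout; it uses bs=1 so that skip and count are plain byte counts, which is slow on large files but easy to reason about.

```shell
#!/bin/sh
# Hypothetical container: 5-byte header, 8-byte first section,
# 6-byte second section, then more data that should not be copied.
tmp=$(mktemp -d)
printf 'HEAD\n'    > "$tmp/container"   # 5 bytes
printf 'SECTION1' >> "$tmp/container"   # 8 bytes
printf 'second'   >> "$tmp/container"   # 6 bytes (the slice we want)
printf 'trailing' >> "$tmp/container"

header_len=5
offset=$(expr "$header_len" + 8)        # byte offset of the second section

# With bs=1, skip=$offset skips that many bytes and count=6 copies exactly 6.
slice=$(dd if="$tmp/container" bs=1 skip="$offset" count=6 2>/dev/null)
echo "$slice"
rm -rf "$tmp"
```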

But didn't we already read those bytes through tar in the first command? I see no way in which dd was told to stop reading the file.

Best Answer

Let's take a look at a QNAP package, e.g. http://www.twonkyforum.com/downloads/8.3/TwonkyServerEU_8.3_arm-x41.qpkg

....
script_len=2467
/bin/dd if="${0}" bs=$script_len skip=1 | /bin/tar -xO | /bin/tar -xzv -C $_EXTRACT_DIR
....

Now let's copy the data with dd, and look at what's inside:

%dd if=TwonkyServerEU_8.3_arm-x41.qpkg bs=2467 skip=1 > first

That's a raw TAR archive, with a single tar.gz file inside it:

%file first 
first: POSIX tar archive (GNU)

%tar -tvf first 
-rw-r--r-- admin/administrators 7175 2017-01-06 17:49 control.tar.gz

The next pipeline step is /bin/tar -xO, and here is what the tar manual says about it:

To write the extracted files to the standard output, instead of creating the files on the file system, use '--to-stdout' ('-O') in conjunction with '--extract' ('--get', '-x').

This option is useful if you are extracting files to send them through a pipe, and do not need to preserve them in the file system. If you extract multiple members, they appear on standard output concatenated, in the order they are found in the archive.

As there is just one file control.tar.gz inside the archive, it will get extracted to STDOUT, to be processed by the next pipeline step, which will invoke TAR again to extract the inner content from it.

So, basically, there is a 'tar.gz' archive inside the 'tar' archive, which is why two sequential tar commands are necessary to extract it.
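The nesting can be reproduced locally. This is a minimal sketch with hypothetical file names (control, outer.tar, and the temporary directories are made up for the demo); unlike the qpkg script it passes -f explicitly instead of relying on tar's default input.

```shell
#!/bin/sh
# Recreate the layout described above: an outer plain tar
# holding a single gzip-compressed inner tar.
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/out"
echo 'hello' > "$tmp/src/control"

tar -czf "$tmp/control.tar.gz" -C "$tmp/src" control   # inner archive, compressed
tar -cf "$tmp/outer.tar" -C "$tmp" control.tar.gz      # outer archive, uncompressed

# The first tar writes the inner archive's bytes to stdout (-O);
# the second tar reads them from stdin, decompresses, and extracts.
tar -xOf "$tmp/outer.tar" | tar -xzf - -C "$tmp/out"

result=$(cat "$tmp/out/control")
echo "$result"
rm -rf "$tmp"
```

The intermediate control.tar.gz never touches the disk in the extraction step; it only exists inside the pipe.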

Note that tar is inherently designed to operate on stream data, so it can reliably detect the end of archive, even if it is followed by more data:

Physically, an archive consists of a series of file entries terminated by an end-of-archive entry, which consists of two 512 blocks of zero bytes.

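So tar -xO stops at the end-of-archive marker of the first archive and ignores whatever follows, which I guess was the rationale for using this storage format in qpkg. That also covers the last part of the question: dd is never told to stop, tar simply does not use the extra bytes, and the second dd reads the same file again from a different offset.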
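A quick way to see this is to append unrelated bytes to an archive and extract it anyway. This is a sketch with made-up file names, assuming a tar that supports -O (GNU tar and bsdtar do):

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo 'inner data' > "$tmp/member.txt"
tar -cf "$tmp/a.tar" -C "$tmp" member.txt

# Append bytes that are not part of the archive,
# standing in for the later sections of a package file.
cp "$tmp/a.tar" "$tmp/combined"
printf 'NOT PART OF THE ARCHIVE' >> "$tmp/combined"

# tar prints the member and stops at the end-of-archive marker;
# the appended bytes never show up in the output.
extracted=$(tar -xOf "$tmp/combined")
echo "$extracted"
rm -rf "$tmp"
```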

Related Solutions

Tar – Transform a Tar Archive’s Paths Without Extracting It

You could mount the archive with archivemount or mountavfs and recreate it again

archivemount tarfile.tar /mnt
cd /mnt
tar cf /tmp/tarfile.tar --transform 's/foo/bar/' .

Write operations on the archive filesystem will perform a full rewrite on umount, so this doesn't seem a good option for large files.

EDIT

I don't know the implementation details, but it seems this saves the step of writing the files to the filesystem.

Just a test to settle doubts (over a tar of my /usr):

#!/bin/bash

# try to avoid slab cache issues 
cat /tmp/usr.tar > /dev/null

T="$(date +%s)"
tar xf /tmp/usr.tar
tar cf usr.tar usr --transform 's/usr/foo/'
T="$(($(date +%s)-T))"
echo "Tar/Untar seconds: ${T}"

T="$(date +%s)"
archivemount -o readonly -o nobackup /tmp/usr.tar /mnt
tar cf usr.tar /mnt  --transform 's/usr/foo/'
umount /mnt
T="$(($(date +%s)-T))"
echo "Archivemount seconds: ${T}"

T="$(date +%s)"
mountavfs
cd '/root/.avfs/tmp/usr.tar#'
tar cf /tmp/test/usr.tar   --transform 's/usr/foo/' .
T="$(($(date +%s)-T))"
echo "Avfs seconds: ${T}"

Output:

Tar/Untar seconds: 480
Archivemount seconds:  failure, a lot of read errors.
Avfs seconds: 217

So Avfs wins!

Shell – View a file in a tar archive without extracting it

It's probably a GNU-specific option, but you could use -O or --to-stdout to extract files to standard output:

$ tar -axf file.tgz foo/bar -O
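To try this without a real archive at hand, here is a small self-contained sketch. The archive and member names are invented for the demo, and it spells out -z rather than relying on automatic compression detection.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/foo"
echo 'bar contents' > "$tmp/foo/bar"
echo 'other file'   > "$tmp/foo/baz"
tar -czf "$tmp/file.tgz" -C "$tmp" foo

# Naming a member after the archive limits extraction to that member;
# -O sends its contents to stdout instead of creating foo/bar on disk.
shown=$(tar -xzOf "$tmp/file.tgz" foo/bar)
echo "$shown"
rm -rf "$tmp"
```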
