I am working with a self-extracting script that installs packages on a QNAP NAS.
It has some shell code at the start which extracts the rest of the file. Here it goes:
```shell
script_len=102
/bin/dd if="${0}" bs=$script_len skip=1 | /bin/tar -xO | /bin/tar -xzv
```
This uses `dd` to copy the bytes from byte 102 onwards into `tar`, where they are extracted.
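Because `if="${0}"` names the running script itself, `bs=$script_len skip=1` means "skip one input block of `$script_len` bytes", i.e. drop the first 102 bytes (offsets 0 to 101) and copy everything from offset 102 onwards. A minimal sketch on a made-up file (`demo.bin` and its `PAYLOAD` contents are hypothetical, not part of the real package):

```shell
#!/bin/sh
# `dd bs=N skip=1` skips exactly one N-byte input block (the first N bytes)
# and copies everything after it. The file below is a made-up stand-in.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

script_len=102

# Stand-in file: a 102-byte "header" followed by a short payload.
dd if=/dev/zero of="$work/demo.bin" bs="$script_len" count=1 2>/dev/null
printf 'PAYLOAD' >> "$work/demo.bin"

# Same dd options as the installer, writing to a file instead of piping to tar.
dd if="$work/demo.bin" bs="$script_len" skip=1 of="$work/rest.bin" 2>/dev/null

cat "$work/rest.bin"; echo   # prints PAYLOAD
```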
What does `-xO` do? And why is it extracted "twice" (two invocations of `tar` with `-x`)? I couldn't find much discussion of this online; the man page seems to suggest it's something to do with "drives". (Looks like I got my 0s and Os mixed up!)
Subsequently, the script does:

```shell
offset=$(/usr/bin/expr $script_len + 2042)
/bin/dd if="${0}" bs=$offset skip=1 | /bin/cat | /bin/dd bs=1024 count=7 of=$_EXTRACT_DIR/data.tar.gz
```
This appears to skip further into the file and copy the bytes there into a new gzipped TAR. Presumably those bytes are already structured and encoded that way.
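The second pipeline cuts a fixed-size slice out of the file: skip `$offset` bytes, then keep `7 * 1024 = 7168` bytes. The sketch below reproduces that on a made-up file (the layout and the names `pkg.bin` and `data.bin` are hypothetical). One assumption to flag: it adds GNU `dd`'s `iflag=fullblock`, because when `dd` reads from a pipe a short read still counts toward `count=`, which could otherwise make the copied slice shorter than 7 KiB.

```shell
#!/bin/sh
# Cut a fixed-size slice out of a file, mirroring the installer's second
# pipeline. File layout and names are made up for the demo.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

script_len=102
offset=$((script_len + 2042))   # same sum as `expr $script_len + 2042`

# Test file: $offset filler bytes, 7168 bytes of 'D', then unrelated extra bytes.
dd if=/dev/zero of="$work/pkg.bin" bs="$offset" count=1 2>/dev/null
dd if=/dev/zero bs=1024 count=7 2>/dev/null | tr '\000' 'D' >> "$work/pkg.bin"
printf 'EXTRA' >> "$work/pkg.bin"

# Skip one block of $offset bytes, then copy exactly 7 blocks of 1024 bytes.
# iflag=fullblock (GNU dd) keeps reading until each 1024-byte block is full.
dd if="$work/pkg.bin" bs="$offset" skip=1 2>/dev/null \
  | dd bs=1024 count=7 iflag=fullblock of="$work/data.bin" 2>/dev/null

wc -c < "$work/data.bin"
```

The `count=7` is what stops the copy: the first `dd` would happily keep going to the end of the file, but the second one exits after 7 blocks.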
But didn't we already read those bytes through `tar` in the first command? I see no way in which `dd` was told to stop reading the file.
Best Answer
Let's take a look at a QNAP package, e.g. http://www.twonkyforum.com/downloads/8.3/TwonkyServerEU_8.3_arm-x41.qpkg
Now let's copy the data with `dd` and look at what's inside. It's a raw TAR archive with a single tar.gz file inside it.
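The Twonky package's listing isn't reproduced here, so as a hedged stand-in the sketch below builds a mock file with the layout described above (a 102-byte header, then a plain TAR holding one `control.tar.gz`) and lists it with `tar -t` instead of extracting. The inner file `example.conf` is a hypothetical placeholder, and a real package's header length may differ.

```shell
#!/bin/sh
# Build a mock "qpkg-like" file and list the TAR that follows the header.
# Everything except the name control.tar.gz is a made-up placeholder.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Inner gzipped tarball with one placeholder file.
mkdir control
echo 'demo=1' > control/example.conf
tar -czf control.tar.gz -C control example.conf

# Outer, uncompressed tar holding only control.tar.gz.
tar -cf outer.tar control.tar.gz

# Mock package = 102-byte header + outer tar.
dd if=/dev/zero of=mock.qpkg bs=102 count=1 2>/dev/null
cat outer.tar >> mock.qpkg

# Same dd as the installer, but list (-t) the archive read from stdin (-f -).
dd if=mock.qpkg bs=102 skip=1 2>/dev/null | tar -tf - > listing.txt
cat listing.txt   # prints control.tar.gz
```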
The next pipeline step is `/bin/tar -xO`. `-x` means extract, and here is what the TAR manual says about `-O`:

> -O, --to-stdout: extract files to standard output

As there is just one file, `control.tar.gz`, inside the archive, it is extracted to STDOUT to be processed by the next pipeline step, `tar -xzv`, which invokes TAR again to gunzip (`-z`) and extract the inner content, listing each file as it goes (`-v`).

So, basically, there is a `tar.gz` archive inside the `tar` archive, which is why two sequential `tar` commands are necessary to extract it.
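A minimal sketch of the two-stage extraction on a mock package of the same shape (again with the hypothetical placeholder `example.conf`). The only deliberate change from the installer is `-f -`, which makes "read the archive from stdin" explicit instead of relying on `tar`'s compiled-in default, plus `-C extracted` to keep the output in one directory.

```shell
#!/bin/sh
# Two-stage extraction: outer plain tar -> stdout -> inner tar.gz -> files.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Mock package: 102-byte header + plain tar containing control.tar.gz.
mkdir control
echo 'demo=1' > control/example.conf
tar -czf control.tar.gz -C control example.conf
tar -cf outer.tar control.tar.gz
dd if=/dev/zero of=mock.qpkg bs=102 count=1 2>/dev/null
cat outer.tar >> mock.qpkg

mkdir extracted
# Stage 1: -xO writes the bytes of control.tar.gz to stdout.
# Stage 2: -xz gunzips that stream and extracts its files into ./extracted.
dd if=mock.qpkg bs=102 skip=1 2>/dev/null \
  | tar -xOf - \
  | tar -xzvf - -C extracted

cat extracted/example.conf   # prints demo=1
```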
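`tar` is inherently designed to operate on stream data, so it can reliably detect the end of an archive (marked by two consecutive 512-byte blocks of zeros), even if more data follows it. So `tar -xO` stops at that end-of-archive marker after reading the only member and ignores the rest, which I guess was the rationale for using this storage format in `qpkg`. The second `dd` opens `${0}` again and reads it independently from the start, so it doesn't matter that the first pipeline has already passed over those bytes.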
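To see that trailing bytes don't disturb `tar`, the sketch below appends unrelated data to a finished archive (standing in for the package's data section) and still extracts the member cleanly. The file names and the trailing string are made up; this assumes a `tar` that honors the standard end-of-archive marker, as GNU tar does.

```shell
#!/bin/sh
# tar stops at the end-of-archive marker and ignores bytes appended after it.
set -e
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo 'hello' > first.txt
tar -cf archive.tar first.txt

# Append bytes that are not part of the tar, like the qpkg's data section.
printf 'TRAILING-DATA-NOT-PART-OF-THE-TAR' >> archive.tar

# Stream the whole file into tar; -O prints the member, the tail is ignored.
out=$(cat archive.tar | tar -xOf -)
echo "$out"   # prints hello
```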