You posted in a comment that you are working on a Mac OS X system. This is an important clue to the purpose of these ._* files.
These ._* archive entries are chunks of AppleDouble data that contain the extra information associated with the corresponding file (the one without the ._ prefix). They are generated by the Mac OS X–specific copyfile(3) family of functions. The AppleDouble blobs store access control data (ACLs) and extended attributes (commonly, Finder flags and “resource forks”, but xattrs can be used to store any kind of data).
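If you want to tell a genuine AppleDouble blob from an ordinary file that merely has a ._ name, one rough check is to look at its first four bytes. This is only a sketch: it assumes the AppleDouble magic number 0x00051607 at the start of the file, and the function name is made up for illustration.

```shell
# Hypothetical helper: succeed if the file starts with the AppleDouble
# magic number 0x00051607 (the first four bytes, big-endian).
is_appledouble() {
    # Dump the first four bytes as hex and strip spaces and newlines.
    magic=$(head -c 4 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "00051607" ]
}
```

For example, `is_appledouble ._somefile && echo "AppleDouble data"` (the file name is a placeholder).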
The system-supplied Mac OS X archive tools (bsdtar (also symlinked as tar), gnutar, and pax) will generate a ._* archive member for any file that has any extended information associated with it; in “unarchive” mode, they will also decode those archive members and apply the resulting extended information to the associated file. This creates a “full fidelity” archive for use on Mac OS X systems by preserving and later extracting all the information that the HFS+ filesystem can store.
The corresponding archive tools on other systems do not know to give special handling to these ._* files, so they are unpacked as normal files. Since such files are fairly useless on other systems, they are often seen as “junk files”. Correspondingly, if a non–Mac OS X system generates an archive that includes normal files whose names start with ._, the Mac OS X unarchiving tools will try to decode those files as extended information.
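To see whether an archive carries such members before unpacking it, you could filter the member listing for names whose last path component starts with ._ (a minimal sketch; the function name is hypothetical, and it assumes your tar detects compression automatically when reading):

```shell
# Hypothetical helper: list archive members whose final path component
# starts with "._" (the names used for AppleDouble entries).
list_appledouble() {
    tar -tf "$1" | grep -E '(^|/)\._[^/]*$'
}
```

Run it as `list_appledouble archive.tar.gz`, where the archive name is a placeholder. It prints nothing and returns a non-zero status when no such members are found.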
There is, however, an undocumented(?) way to make the system-supplied Mac OS X archivers behave like they do on other Unixy systems: the COPYFILE_DISABLE environment variable. Setting this variable (to any value, even the empty string) will prevent the archivers from generating ._* archive members to represent any extended information associated with the archived files. Its presence will also prevent the archivers from trying to interpret such archive members as extended information.
COPYFILE_DISABLE=1 tar czf new.tar.gz …
COPYFILE_DISABLE=1 tar xzf unixy.tar.gz …
You might set this variable in your shell’s initialization file if you want to work this way more often than not.
# disable special creation/extraction of ._* files by tar, etc. on Mac OS X
COPYFILE_DISABLE=1; export COPYFILE_DISABLE
Then, when you need to re-enable the feature (to preserve/restore the extended information), you can “unset” the variable for individual commands:
(unset COPYFILE_DISABLE; tar czf new-osx.tar.gz …)
The archivers on Mac OS X 10.4 also do something similar, though they use a different environment variable: COPY_EXTENDED_ATTRIBUTES_DISABLE.
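If the same script has to run on both 10.4 and later releases, one option is to set both variables for the tar invocation. This is only a sketch, and the wrapper name plain_tar is made up for illustration:

```shell
# Hypothetical wrapper: run tar with both variables set, so that neither
# Mac OS X 10.4 nor later releases create or interpret ._* members.
plain_tar() {
    COPYFILE_DISABLE=1 COPY_EXTENDED_ATTRIBUTES_DISABLE=1 tar "$@"
}
```

You would then call it exactly like tar, e.g. `plain_tar czf new.tar.gz …`. On systems that don't know these variables they are simply ignored.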
It won't be fast, especially for a large tarball with lots of files, but in bash you can do this:
tar -tzf tarball.tgz | while IFS= read -r file; do
    tar --no-recursion -xzf tarball.tgz -- "$file"
    # compress regular files only; gzip cannot compress a directory
    if [ -f "$file" ]; then gzip -- "$file"; fi
done
The first tar command extracts the names of the files in the tarball, and passes those names to a while read ... loop. The file name is then passed to a second tar command that extracts just that file, which is then compressed before the next file is extracted. The --no-recursion flag is used so trying to extract a directory doesn't extract all the files under that directory, which is what tar would normally do.
You'll still need enough free space to store somewhat more than the original size of the compressed tarball.
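If you'd rather not unpack into the current directory, the same loop can be wrapped in a function that takes a destination directory. This is a sketch under the same assumptions as the loop above (a gzipped tarball and a tar that accepts --no-recursion when extracting); the function name and its arguments are hypothetical.

```shell
# Hypothetical wrapper: extract each member of a gzipped tarball into a
# destination directory, gzipping it before the next member is extracted.
extract_and_gzip() {
    tarball=$1
    dest=$2
    mkdir -p -- "$dest" || return 1
    tar -tzf "$tarball" | while IFS= read -r member; do
        # Extract just this one member below the destination directory.
        tar --no-recursion -xzf "$tarball" -C "$dest" -- "$member" || continue
        # Compress regular files only; directories and symlinks stay as they are.
        if [ -f "$dest/$member" ] && [ ! -L "$dest/$member" ]; then
            gzip -- "$dest/$member"
        fi
    done
}
```

Usage would be something like `extract_and_gzip tarball.tgz out`. It is just as slow as the plain loop, since the tarball is still re-read once per member.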
Best Answer
You don't need the paranoia at all. GNU tar (and in fact any well-written tar program produced in the past 30 years or so) will not, by default, extract files in the tarball to paths that begin with a slash or that contain .. elements: GNU tar, for instance, strips the leading slash and skips members whose names contain .. elements.
You have to go out of your way to force modern tar programs to extract such potentially-malicious tarballs: both GNU and BSD tar need the -P option to make them disable this protection. See the section Absolute File Names in the GNU tar manual.
The -P flag isn't specified by POSIX,¹ though, so other tar programs may have different ways of coping with this. For example, the Schily Tools' star program uses -/ and -.. to disable these protections.
The only thing you might consider adding to a naïve tar command is a -C flag to force it to extract things in a safe temporary directory, so you don't have to cd there first.
Asides:
Technically, tar isn't specified by POSIX any more at all. They tried to tell the Unix computing world that we should be using pax now instead of tar and cpio, but the computing world largely ignored them.
It's relevant here to note that the POSIX specification for pax doesn't say how it should handle leading slashes or embedded .. elements. There's a nonstandard --insecure flag for BSD pax to suppress protections against embedded .. path elements, but there is apparently no default protection against leading slashes; the BSD pax man page indirectly recommends writing -s substitution rules to deal with the absolute path risk.
That's the sort of thing that happens when a de facto standard remains in active use while the de jure standard is largely ignored.
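Coming back to the -C suggestion in the main answer, here is one way it might look in practice. It's only a sketch: the function name is made up, and it assumes your mktemp accepts -d without a template.

```shell
# Hypothetical helper: unpack an archive into a fresh temporary directory
# and print that directory's path, without changing the caller's directory.
extract_to_tmp() {
    dest=$(mktemp -d) || return 1
    # No -P here, so tar keeps its default handling of absolute and ".." names.
    tar -xf "$1" -C "$dest" || return 1
    printf '%s\n' "$dest"
}
```

You could then inspect the result with something like `dir=$(extract_to_tmp untrusted.tar.gz) && ls -R "$dir"` (the archive name is a placeholder) before moving anything to its final location.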