How to securely extract an untrusted tar file

Tags: security, tar

I would like to extract a tar file such that all extracted files are placed under a certain prefix directory. Any attempt by the tar file to write outside that directory should cause the extraction to fail.

As you might imagine, this is so that I can securely extract an untrusted tar file.

How can I do this with GNU tar?

I came up with:

tar --exclude='/*' --exclude='*/../*' --exclude='../*' -xvf untrusted_file.tar

but I am not sure that this is paranoid enough.

Best Answer

You don't need the paranoia at all. By default, GNU tar (and in fact any well-written tar program produced in the past 30 years or so) will not write a member to an absolute path or through a .. element: GNU tar strips the leading slash from member names and refuses to extract members whose names contain .. elements.

You have to go out of your way to force modern tar programs to extract such potentially-malicious tarballs: both GNU and BSD tar need the -P option to make them disable this protection. See the section Absolute File Names in the GNU tar manual.
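The default behaviour is easy to check with a throwaway archive. The following is only a sketch: the function name check_default_protection and the file names are made up for illustration, and it assumes a GNU or BSD tar plus mktemp -d. It stores a member under an absolute name with -P, then extracts it without -P and prints where the file ended up.

```shell
#!/bin/sh
# Sketch: build an archive whose member has an absolute name, then extract it
# without -P. check_default_protection is a hypothetical name for this demo.
check_default_protection() {
    work=$(mktemp -d) || return 1
    mkdir "$work/src" "$work/dest" || return 1
    echo "payload" > "$work/src/file.txt"

    # -P at creation time keeps the leading slash in the stored member name.
    tar -P -cf "$work/abs.tar" "$work/src/file.txt" || return 1

    # Remove the original so we can tell whether extraction recreates it.
    rm "$work/src/file.txt"

    # Extraction WITHOUT -P: tar may print a warning about the leading slash,
    # which is silenced here.
    tar -C "$work/dest" -xf "$work/abs.tar" 2>/dev/null

    # Print the path of every file that was actually extracted.
    find "$work/dest" -type f
}
```

If the protection is active, the printed path lies below the dest directory and the original absolute location stays empty.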

The -P flag isn't specified by POSIX,¹ though, so other tar programs may have different ways of coping with this. For example, the Schily Tools' star program uses -/ and -.. to disable these protections.

The only thing you might consider adding to a naïve tar command is a -C flag to force it to extract things in a safe temporary directory, so you don't have to cd there first.
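As a minimal sketch of that suggestion, the helper below extracts into a freshly created temporary directory via -C. The name extract_untrusted is hypothetical, and the use of mktemp -d is an assumption about the environment rather than anything tar requires.

```shell
#!/bin/sh
# Sketch: extract an archive into a new temporary directory using -C.
# extract_untrusted is a hypothetical helper name, not part of tar.
extract_untrusted() {
    archive=$1

    # mktemp -d creates a new, empty directory and prints its path.
    dest=$(mktemp -d) || return 1

    # No -P here, so tar keeps its default handling of absolute member
    # names and ".." elements; -C makes it change into dest before extracting.
    tar -C "$dest" -xf "$archive" || return 1

    # Report where the files went so the caller can inspect them.
    printf '%s\n' "$dest"
}
```

A call such as dir=$(extract_untrusted untrusted_file.tar) leaves the extraction directory in $dir; a non-zero return status means tar reported an error.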


Asides:

  1. Technically, tar isn't specified by POSIX any more at all. They tried to tell the Unix computing world that we should be using pax now instead of tar and cpio, but the computing world largely ignored them.

    It's relevant here to note that the POSIX specification for pax doesn't say how it should handle leading slashes or embedded .. elements. There's a nonstandard --insecure flag for BSD pax to suppress protections against embedded .. path elements, but there is apparently no default protection against leading slashes; the BSD pax man page indirectly recommends writing -s substitution rules to deal with the absolute path risk.

    That's the sort of thing that happens when a de facto standard remains in active use while the de jure standard is largely ignored.
