How to filter the contents of a tar file, producing another tar file in the pipe

Tags: pipe, tar

Consider a single tar file from an external system. It contains some directories whose attributes (permissions, mtimes, etc.) I want to retain. How can I easily take a subset of these files as a regular user (not root)?

Looking for something like:

tar -f some.tar.gz --subset subdir/ | ssh remote@system tar xvz

It is also essential that the main attributes (ownership, group, mode, mtime) in this tar archive are retained. What about other attributes in a tar file such as extended header keywords?

Bonus points for a solution that avoids use of a temporary directory in case this subdir contains huge files.

Best Answer

bsdtar (based on libarchive) can filter tar (and some other archives) from stdin to stdout. It can for example pass through only filenames matching a pattern, and can do s/old/new/ renaming. It's already packaged for most distros, for example as bsdtar in Ubuntu.

sudo apt-get install bsdtar   # or aptitude, if you have it; on newer Debian/Ubuntu releases the package is libarchive-tools

# example from the man page:
bsdtar -c -f new.tar --include='*foo*' @old.tgz
# creates new.tar containing only those entries from old.tgz whose names contain the string 'foo'
bsdtar -czf - --include='*foo*' @-  # filter stdin to stdout, with gzip compression of output.

Note that bsdtar supports a wide choice of compression formats for both input and output, so you don't have to pipe through gunzip, lz4, etc. yourself. You can use - for stdin with the @tarfile syntax, and/or - for stdout as usual.
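Applied to the question, a minimal sketch could look like the following. It assumes the wanted entries are stored under subdir/ in the archive and that bsdtar is installed. The sample archive, the temporary directory and the filter_subdir helper are made up purely so the demo can run locally. The filter itself is a pure stdin-to-stdout pipe and needs no temporary directory.

```shell
# Build a small sample archive to filter (hypothetical file names, demo only).
workdir=$(mktemp -d)
mkdir -p "$workdir/src/subdir" "$workdir/src/other"
echo keep > "$workdir/src/subdir/a.txt"
echo drop > "$workdir/src/other/b.txt"
tar -czf "$workdir/some.tar.gz" -C "$workdir/src" subdir other

# Hypothetical helper: read an archive on stdin and write a gzip-compressed
# tar containing only the entries under subdir/ to stdout.
filter_subdir() {
    bsdtar -czf - --include='subdir/*' @-
}

listing=""
if command -v bsdtar >/dev/null 2>&1; then
    filter_subdir < "$workdir/some.tar.gz" > "$workdir/subset.tar.gz"
    # List the filtered archive to check what survived the filter.
    listing=$(tar -tzf "$workdir/subset.tar.gz")
    printf '%s\n' "$listing"
fi
rm -rf "$workdir"

# The real pipeline from the question would then be something like
# (remote@system is the placeholder host from the question):
#   bsdtar -czf - --include='subdir/*' @some.tar.gz | ssh remote@system tar xzvf -
```

Since bsdtar copies the matching entries from one archive to the other rather than extracting them, the mode, mtime, owner and group stored in the headers should carry over into the new archive. Whether ownership is actually restored on the remote side still depends on who runs the extraction there: a regular user normally cannot create files owned by someone else.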


My searching also found this streaming tar modification tool, which appears to want you to define the archive changes in JavaScript. (I think the whole thing is written in JS.)

https://github.com/mafintosh/tar-stream
