This will list all the PDFs:
$ find dir/ -name '*.pdf'
dir/subdir2/subsubdir1/document.pdf
dir/subdir3/another-document.pdf
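You can pipe that to xargs, which appends the file names to the tar command line as arguments, to create the archive (names containing whitespace get split, and a very long list can make xargs run tar more than once, each run overwriting the previous archive):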
$ find dir/ -name '*.pdf' | xargs tar czf dir.tar.gz
(This approach omits empty directories.)
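The xargs pipeline above can be made more robust against odd file names. The following is a minimal sketch, assuming GNU tar and a find that supports -print0; the sample tree and file names are made up for illustration:

```shell
# Build a throwaway sample tree (hypothetical names) so the sketch is self-contained.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p "dir/sub dir/deep" dir/empty
touch "dir/sub dir/deep/my document.pdf" dir/other.pdf dir/notes.txt

# -print0 and --null pass NUL-separated names, so spaces in names are safe.
# -T - makes tar read the member list from stdin, in a single tar invocation.
find dir/ -type f -name '*.pdf' -print0 | tar czf dir.tar.gz --null -T -

# List what ended up in the archive.
tar tzf dir.tar.gz
```

Because tar is started once and reads the whole list itself, the archive cannot be overwritten by a second invocation the way it can with xargs.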
Do you want to create the tar file somewhere other than where the files to be archived reside?
There are many ways to do this.
If it is to be created locally (on the same machine):
tar czvf /path/to/destination/newfile.tar.gz ./SOURCEDIR_OR_FILES
You can add additional files or directories to the archive at the end of that command.
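As an illustration of the local case, here is a small sketch. The directories are temporary, hypothetical stand-ins for /path/to/destination and SOURCEDIR_OR_FILES, and the second variant uses tar's -C option as an alternative to changing directory first:

```shell
# Hypothetical source and destination directories, created for the demo.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/project/docs"
echo hello > "$src/project/docs/readme.txt"

# Variant 1: cd to the parent of the source, write the archive elsewhere.
(cd "$src" && tar czf "$dest/newfile.tar.gz" ./project)

# Variant 2: let tar change directory itself with -C; no cd needed.
tar czf "$dest/newfile2.tar.gz" -C "$src" ./project

# Both archives store the same relative member names.
tar tzf "$dest/newfile.tar.gz"
```

Either way the archive lands in the destination directory while the stored paths stay relative to the source's parent.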
If it is to be created remotely (i.e., you want to create the tar file on a different host from the one containing the data to be archived):
tar czvf - ./SOURCEDIR_OR_FILES | ssh user@host 'cat > newfile.tar.gz'
The latter version is very versatile. For example, you can also "duplicate" a directory + subdirs using the same technique:
Duplicate a directory+subdirs to another local directory:
tar cf - ./SOURCEDIR_OR_FILES | ( cd LOCAL_DEST_DIR && tar xvf - )
Duplicate a directory+subdirs to another remote directory:
tar cvf - ./SOURCEDIR_OR_FILES | ssh user@host 'cd REMOTE_DEST_DIR && tar xf - '
Drop the 'v' if you don't need tar to list files as they are archived (or extracted): it then usually runs faster, but prints nothing unless there is an error.
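The tar-to-tar pipe can be tried without a second machine. In this sketch, all directories are temporary and hypothetical, and the remote function is only a local stand-in for "ssh user@host" (it runs the command string in a local shell), so treat it as an illustration of the data flow, not of ssh itself:

```shell
# Hypothetical directories standing in for the source and both destinations.
src=$(mktemp -d)
local_dest=$(mktemp -d)
remote_dest=$(mktemp -d)
mkdir -p "$src/data/sub"
echo payload > "$src/data/sub/file.txt"

# Stand-in for "ssh user@host": runs the command string in a local shell.
remote() { sh -c "$1"; }

cd "$src" || exit 1

# Local duplicate: one tar writes to stdout, the other reads from stdin.
tar cf - ./data | (cd "$local_dest" && tar xf -)

# "Remote" duplicate: same stream, unpacked by the command on the far side.
tar cf - ./data | remote "cd '$remote_dest' && tar xf -"

# Both copies should match the source.
diff -r "$src/data" "$local_dest/data" &&
    diff -r "$src/data" "$remote_dest/data" &&
    echo "copies match"
```

With a real host, replacing the remote call with ssh user@host '...' gives the command shown earlier; the stream on the pipe is identical.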
I use "./..." for the source to force tar to store it as a RELATIVE path. In some cases you'll want to add additional path information:
For example to tar the crontab files, including the one in /etc, you could do:
cd / ; tar czf all_crons.tgz ./etc/*cron* ./var/spool/cron
I use relative paths on purpose: some OLD versions of tar can be dangerous because they extract files to their original ABSOLUTE path. That means you could run:
cd /safedir ; tar xvf sometar
and have files stored with absolute names overwrite the files at their original locations, OUTSIDE of /safedir rather than underneath it! This is very dangerous, and still possible, as there are old production servers out there. It is better to get used to relative paths all the time, even if you use a more recent tar.
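One way to act on this advice is to list an archive and look for members stored with a leading "/" before extracting it. A minimal sketch, using a hypothetical sample tree in a temporary directory in place of the real /etc and /var:

```shell
# Hypothetical sample tree mimicking the cron example; nothing touches the real /etc.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p etc/cron.d var/spool/cron safedir
echo '# sample job' > etc/cron.d/job

# Archive with relative ("./...") member names.
tar czf all_crons.tgz ./etc/cron.d ./var/spool/cron

# Inspect before extracting: refuse if any member name starts with "/".
if tar tzf all_crons.tgz | grep -q '^/'; then
    echo "archive contains absolute paths; extract with care" >&2
else
    (cd safedir && tar xzf ../all_crons.tgz)
fi

# The files land underneath safedir, not at their original location.
ls -R safedir
```

The check is cheap, and it works the same way on archives received from someone else.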
Best Answer
The --xform argument takes any number of sed substitute expressions, which are very powerful. In your case, use a pattern that matches everything up to the last "/" and replaces it with nothing.
Add --show-transformed-names to see the new names.
Note that this substitution applies to all filenames, not just those given on the command line. For example, if you have a file /a/b/c and your list just specifies /a, then the final filename is just c, not b/c. You can always be more explicit and provide an exact list of substitutions.
Note that the initial "/" will be removed by tar (unless you use -P), so the substitution expressions must omit it. Also, the list of directories has to be sorted so that the longest match is done first; otherwise tmp/path2/ won't match, as tmp/ has already been removed. You can also automate the creation of this list.
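To make the two approaches concrete, here is a sketch assuming GNU tar. The sample tree, the pattern s:^.*/:: (one pattern that fits the description of "everything up to the last /"), and the pipeline that builds the expression list are all illustrative assumptions, not the only way to do it:

```shell
# Hypothetical sample tree; relative paths sidestep the leading-"/" issue.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p a/b x/y/z
echo one > a/b/c
echo two > x/y/z/d

# 1) Flatten: drop everything up to the last "/" in each member name.
#    -v with --show-transformed-names prints the names as stored.
tar -cvf flat.tar --xform='s:^.*/::' --show-transformed-names a/b/c x/y/z/d

# 2) Explicit prefixes: one expression per directory, longest first,
#    joined with ";" into a single --xform argument.
#    Assumes directory names contain no commas or regex metacharacters.
exprs=$(printf '%s\n' a a/b x/y |
    awk '{ print length($0), $0 }' | sort -rn | cut -d' ' -f2- |
    sed 's:.*:s,^&/,,:' | paste -sd';' -)
echo "$exprs"

tar -cf explicit.tar --xform="$exprs" a/b/c x/y/z/d
tar -tf explicit.tar
```

In the second archive a/b/c is stored as c, because a/b/ is stripped before a/ gets a chance to match, while x/y/z/d keeps the part below its listed prefix and is stored as z/d.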