How to convert all files from gzip to xz on the fly (and recursively)

Tags: conversion, gzip, pipe, recursive, xz

I have a directory tree with gzipped files like this:

basedir/a/file.dat.gz
basedir/b/file.dat.gz
basedir/c/file.dat.gz
etc.

How can I convert all of these from gzip to xz with a single command and without decompressing each file to disk?

The trivial two-liner, decompressing to disk, looks like this:

find basedir/ -type f -name '*.dat.gz' -exec gzip -d {} \;
find basedir/ -type f -name '*.dat' -exec xz {} \;

The first command could be even shorter: gunzip -r *

For a single file, on-the-fly conversion is simple (although this doesn't replace the .gz file):

gzip -cd basedir/a/file.dat.gz | xz > basedir/a/file.dat.xz

Since gzip and xz handle the extensions themselves, I'd like to be able to say something like:

gunzip -rc * > xz

I looked at find | xargs basename -s .gz {} for a bit but didn't get a working solution.
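Something along these lines might work (a rough, untested sketch assuming a basename that supports -s, as in GNU coreutils, and filenames without newlines), but it already feels like the shell script I was hoping to avoid:

find basedir/ -type f -name '*.dat.gz' | while IFS= read -r f; do
  stem=$(dirname -- "$f")/$(basename -s .gz -- "$f")   # directory + name without the trailing .gz
  gzip -cd "$f" | xz > "$stem.xz"                      # no intermediate .dat file on disk
done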

I could write a shell script, but I feel there should be a simple solution.


Edit

Thanks to all who have answered already. I know we all love 'commands that will never fail™'. So, to keep this simple:

  • All subdirectory names contain only digits, letters (including äöü), underscores and minus signs.
  • All files are named file.dat[.n].gz, with n being a positive integer.
  • No directory or file will have a '.gz' anywhere (other than as the final file suffix).
  • This is the only content these directories contain.
  • I control the naming and can restrict it if needed.

Using a simple find -exec ... or ls | xargs, is there a command to replace '.gz' in the found filename with '.xz' on the fly? Then I could write something like this (pseudocode):

find basedir/ -type f -name '*.gz' -exec [ gzip -cd {} | xz > {replace .gz by .xz} \; ]
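I imagine the pseudo-command above could be spelled out with a small sh -c wrapper, where the suffix substitution ${f%.gz} does the '.gz' → '.xz' replacement (a sketch assuming POSIX sh, not verified):

find basedir/ -type f -name '*.gz' -exec sh -c '
  for f do
    gzip -cd "$f" | xz > "${f%.gz}.xz"   # write next to the original, with .xz suffix
  done' sh {} +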

Best Answer

find . -name '*.gz' -type f -exec bash -o pipefail -Cc '
  for file do
    gunzip < "$file" | xz > "${file%.gz}.xz" && rm -f "$file"
  done' bash {} +

The -C (noclobber) prevents overwriting an existing file, except when the existing file is a non-regular file or a symlink to a non-regular file, so you would not lose data unless you have, for instance, a file.gz and a file.xz that is a symlink to /dev/null. To guard against that, you could use zsh instead, and also use the -execdir feature of some find implementations for good measure to avoid some race conditions:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    gunzip < "$file" | (
      sysopen -u 1 -w -o excl -- "${file%.gz}.xz" && xz) &&
      rm -f -- "$file"
  done' zsh {} +

Or, to clean up xz files after failed recompressions:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    sysopen -u 1 -w -o excl -- "${file%.gz}.xz" &&
      if gunzip < "$file" | xz; then
        rm -f -- "$file"
      else
        rm -f -- "${file%.gz}.xz"
      fi
  done' zsh {} +
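As a rough illustration of the noclobber loophole mentioned above (a toy bash session with made-up file names, not part of the answer):

touch real.xz                 # an ordinary, existing regular file
ln -s /dev/null trap.xz       # a symlink to a non-regular file
set -C                        # noclobber, same effect as invoking bash -C
echo data > real.xz           # fails: cannot overwrite existing regular file
echo data > trap.xz           # "succeeds": the data silently ends up in /dev/null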

If you'd rather keep it short, and are willing to ignore some of those potential issues, in zsh you could do:

for f (./**/*.gz(D.)) {gunzip < $f | xz > $f:r.xz && rm -f $f}
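For comparison, a roughly equivalent bash version (a sketch assuming bash 4+ with globstar; it skips the noclobber and race-condition safeguards discussed above) might be:

shopt -s globstar nullglob dotglob     # ** recursion, skip the loop on no match, include dotfiles
for f in ./**/*.gz; do
  [ -f "$f" ] || continue              # regular files only, like zsh's (.) qualifier
  gunzip < "$f" | xz > "${f%.gz}.xz" && rm -f -- "$f"
done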