find . -name '*.gz' -type f -exec bash -o pipefail -Cc '
  for file do
    gunzip < "$file" | xz > "${file%.gz}.xz" && rm -f "$file"
  done' bash {} +
The -C prevents overwriting an existing file and won't write through a symlink, except if the existing file is a non-regular file or a symlink to a non-regular file, so you would not lose data unless you have, for instance, a file.gz and a file.xz that is a symlink to /dev/null.
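You can check that noclobber behaviour interactively; a quick sketch (bash, in an empty directory):

set -C                     # same effect as invoking bash with -C
echo test > file           # creates the file
echo test > file           # error: cannot overwrite existing file
ln -s /dev/null null-link
echo test > null-link      # succeeds: the target is a non-regular file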
To guard against that, you could use zsh instead, and also use the -execdir feature of some find implementations for good measure, to avoid some race conditions:
find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    gunzip < "$file" | (
      sysopen -u 1 -w -o excl -- "${file%.gz}.xz" && xz) &&
      rm -f -- "$file"
  done' zsh {} +
Or, to clean up the xz files upon failed recompressions:
find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    sysopen -u 1 -w -o excl -- "${file%.gz}.xz" &&
      if gunzip < "$file" | xz; then
        rm -f -- "$file"
      else
        rm -f -- "${file%.gz}.xz"
      fi
  done' zsh {} +
If you'd rather keep it short, and are ready to ignore some of those potential issues, in zsh you could do:
for f (./**/*.gz(D.)) {gunzip < $f | xz > $f:r.xz && rm -f $f}
Looking at the source code, the implementation of pipe_read in source/fs/pipe.c has changed quite a bit in the Linux kernel, but from a quick reading of the code in 2.0.40, 2.4.37, 2.6.32, 3.11 and 4.9, it seems to me that whenever there has been (or is, while read is blocking) a write of size w and a read of size r with r > w, then read will return at least w bytes. So if you have fixed-size chunks (of a size smaller than PIPE_BUF) and always make reads of that same size, then you are in practice guaranteed to always read a whole chunk.
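A quick sketch of that fixed-size case: two writers share one pipe, each record is exactly 16 bytes (well under PIPE_BUF), and the reader reads 16 bytes at a time, so every read returns one whole record. This assumes each printf invocation results in a single write, which holds in practice for short output:

mkfifo p
for i in 1 2 3; do printf '%-15s\n' "w1-$i"; done > p &   # three 16-byte writes
for i in 1 2 3; do printf '%-15s\n' "w2-$i"; done > p &   # three 16-byte writes
dd bs=16 < p 2> /dev/null                                 # 16-byte reads
wait; rm p

The records from the two writers may come out in either order, but none of them is ever split or interleaved.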
On the other hand, if you have variable-sized chunks, then you have no such guarantee. There is a guarantee of atomicity only on the write side: a write of less than PIPE_BUF bytes will not be interleaved with data from another writer. But on the reader side, if there has been e.g. a write of 10 bytes followed by a write of 20 bytes, and you later try to read 15 bytes, then you'll get the complete first write plus the first 5 bytes of the second write. The read call doesn't stop reading data until it would have to block or its output buffer is full.
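You can see that straddling behaviour with a sketch like this (the sleep just gives both writes time to land in the pipe buffer before the single 15-byte read):

{ printf %s 0123456789; printf %s abcdefghijklmnopqrst; } |
  { sleep 1; dd bs=15 count=1 2> /dev/null; echo; }

This prints 0123456789abcde: the whole 10-byte write plus the first 5 bytes of the 20-byte one.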
If you want to transmit data in chunks, use a datagram socket instead of a pipe.
Best Answer
If your ./run will produce its output on stdout when not given a file argument (which is customary on Unix/Linux), then you can simply pipe it straight into the compressor.
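For instance, assuming you're compressing with xz into output.txt.xz (adjust both to taste):

./run | xz > output.txt.xz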
If it needs a filename argument, but is fine writing to a pipe, then you can use a special device such as /dev/stdout or /dev/fd/1 (both should be equivalent) as that argument.
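Something like this (the compressor and target name are again assumptions):

./run /dev/stdout | xz > output.txt.xz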
Or you can use process substitution, which is typically available in most modern shells such as bash, zsh, or ksh, and which will end up using a device from /dev/fd behind the scenes to accomplish the same.
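A sketch with the >( ... ) form, under the same assumptions as before:

./run >(xz > output.txt.xz)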
This last one also needs ./run to be able to write to a pipe, but it should work better than the others if ./run writes both to output.txt and to stdout in its normal operation, in which case the outputs would get mixed up if you redirected both to stdout.

Programs are usually fine writing to a pipe, but some of them may want to seek and rewind to offsets within an output file, which is not possible in a pipe. If that's the case, then writing to a temporary file and compressing it afterwards is probably all you can do.