compression – Time-Adaptive Compression Tool Overview


I'm asking about a scenario of copying a big file to a remote server.

The simplest case is:

tar c myfile | ssh myserver tar x

If the network connection is fast, this is fine.

On a slower network I do

tar c myfile | bzip2 -1 | ssh myserver tar xj

which makes the transfer faster at the cost of CPU time.

Of course I can play with the compression level, typically trying to guess the one that keeps the network saturated without making my CPU too busy.

Is there a compression utility, or a flag for bzip2/xz/…, that would compress as much as possible for as long as the output buffer is still full?

Best Answer

zstd --adapt

The zstd compression utility has an option, --adapt, that turns on adaptive compression (the option was added in zstd v1.3.6). With it, zstd adjusts the compression level to "the current perceived I/O conditions".

See the zstd manual for more information.

A complete pipeline may look something like this:

tar -c -f - source_directory |
zstd --adapt |
ssh user@server 'cd /someplace && { zstd -d | tar -x -f -;}'

or

tar -c -f - source_directory |
zstd --adapt |
ssh user@server 'zstd -d | tar -x -C /someplace -f -'

If you add -v to the first zstd in the pipeline, you will get a progress indicator line saying something like

(L7) Buffered :  32 MB - Consumed : 192 MB - Compressed :  72 MB => 37.50%

where (L7) indicates the current compression level. For any moderately large amount of data, you would expect the level to fluctuate over time, showing that zstd is indeed adapting to the I/O conditions (and presumably also to the data itself).
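
To try the adaptive pipeline without a remote host, here is a minimal sketch that runs both ends on the same machine and also bounds the levels zstd may pick, using the min= and max= sub-options of --adapt described in the zstd manual. The function name adaptive_copy, its arguments, and the default bounds 1 and 19 are assumptions made for illustration; in a real transfer the last two commands would run on the far side of ssh, as in the pipelines above.

```shell
#!/bin/sh
# adaptive_copy SRC DEST [MIN] [MAX]
# Hypothetical helper (not part of zstd): archive SRC, compress it with
# zstd --adapt limited to levels MIN..MAX, then decompress and unpack into
# DEST on the same machine. The bounds default to 1 and 19 (arbitrary choice).
adaptive_copy() {
    src=$1
    dest=$2
    min=${3:-1}
    max=${4:-19}

    mkdir -p "$dest" || return 1

    # Archive relative to the parent directory so that only the last path
    # component of SRC ends up in the archive.
    tar -c -f - -C "$(dirname "$src")" "$(basename "$src")" |
    zstd --adapt=min="$min",max="$max" |
    zstd -d |
    tar -x -C "$dest" -f -
}
```

For example, adaptive_copy source_directory /tmp/restore 3 15 would unpack a copy under /tmp/restore/source_directory. On a local pipe the output side is rarely the bottleneck, so the chosen level may behave differently from what you would see over a slow network link.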