CPU-adaptive compression

compressionperformance

Let assume I need to send some data from one computer to another, over a pretty fast network… for example standard 100Mbit connection (~10MB/s). My disk drives are standard HDD, so their speed is somewhere between 30MB/s and 100MB/s. So I guess that compressing the data on the fly could help.

But… I don't want to be limited by CPU. If I choose an algorithm that is intensive on CPU, the transfer will actually go slower than without compression.

This is difficult with compressors like GZIP and BZIP2 because you usually set the compression strength once for the whole transfer, and my data streams are sometimes easy, sometimes hard to compress–this makes the process suboptimal because sometimes I do not use full CPU, and sometimes the bandwidth is underutilized.

Is there a compression program that would adapt to current CPU/bandwidth and hit the sweet spot so that the transfer will be optimal? Ideally for Linux, but I am still curious about all solutions. I'd love to see something compatible with GZIP/BZIP2 decompressors, but this is not necessary.

So I'd like to optimize total transfer time, not simply amount of bytes to send.

Also I don't need real time decompression… real time compression is enough. The destination host can process the data later in its spare time. I know this doesn't change much (compression is usually much more CPU-intensive than decompression), but if there's a solution that could use this fact, all the better.

Each time I am transferring different data, and I really want to make these one-time transfers as quick as possible. So I won't benefit from getting multiple transfers faster due to stronger compression.

Thanks,

Best Answer

This is a current subject of research - primarily in the area of sensor networks where the goal is to minimise power usage, rather than maximise throughput. The principle of adaptive compression is the same however.

Here is a recent paper from a professor at USC.

Maybe you could have a go at implementing his algorithm? I'm sure there would be many people interested in a good implementation.

Related Question