Why are there multiple rsync threads

rsync

I run a single rsync command to back up a file system.

With ps, I see four rsync-related entries (the first is the sudo wrapper). In the output below, one is in R state (running) and three are in S state (sleeping?):

$ ps aux | grep rsync
root     14144  0.0  0.0   6008  1868 pts/1    S+   03:16   0:00 sudo rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root     14145 47.2  0.5  62424 46108 pts/1    R+   03:16 226:44 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root     14146  0.6  0.2  80052 20584 pts/1    S+   03:16   2:59 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root     14147 11.4  0.2  49324 20264 pts/1    S+   03:16  55:02 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
ting     16986  0.0  0.0   4392   820 pts/4    S+   11:16   0:00 grep --color=auto rsync

With pstree, I see three rsync processes or threads:

$ pstree | grep rsync
     |                |-bash---sudo---rsync---rsync---rsync

Why do I have multiple rsync threads or processes while I am running only one program?

Judging from its stdout output, rsync does not seem to be transferring multiple files in parallel (which seems to require extra effort; see "Speed up rsync with Simultaneous/Concurrent File Transfers?").

But when I examine the destination, I find some directories (say dir1) in which only some of the files have been transferred, while rsync's output says it is currently transferring files in a different, separate directory (say dir2, which has the same parent directory as dir1). I assume it will later report that it is transferring the remaining files in the partially transferred directories such as dir1.

Best Answer

There are multiple things the rsync program needs to do, among them:

  • finding files that are not in sync with the remote server
  • deciding which parts need to be transmitted
  • transmitting the deltas so the "other side" can be updated

Often, but not always, the transmission is the limiting factor, because of the available bandwidth.

Rsync does not transfer patch data in parallel. While one delta is being transferred, however, it keeps generating and exchanging other data, building up knowledge about which further deltas will need to be transferred. It does this in separate processes rather than threads (rsync forks, which is why ps and pstree list several rsync entries), so that when the transfer of one delta completes, the next delta is (hopefully) ready for transfer.

A more naive approach would wait for a delta transmission to complete and only then start comparing the next files to see what needs to be transmitted. Since it can take a while to find the next differing file, the transmission bandwidth would sit unused during that time.
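
The overlap described above can be illustrated with an ordinary shell pipeline. This is only a rough sketch, not how rsync is implemented: `find_changed`, `transmit`, the file names, and the `DELAY` variable are all made up for the illustration, and the sleeps stand in for the real comparison and transfer work.

```shell
#!/bin/sh
# Hypothetical stand-ins for rsync's internal roles; this is not rsync code.
DELAY=${DELAY:-1}          # assumed cost, in seconds, of each step

# Stage 1: "find" the next out-of-sync file (the slow comparison work).
find_changed() {
    for f in file1 file2 file3; do
        sleep "$DELAY"     # pretend the comparison takes a while
        echo "$f"
    done
}

# Stage 2: "transmit" each file as soon as its name arrives on stdin.
transmit() {
    while read -r f; do
        sleep "$DELAY"     # pretend the transfer takes a while
        echo "sent $f"
    done
}

# The pipe runs both stages concurrently as separate processes,
# so comparing file2 overlaps with sending file1.
find_changed | transmit
```

With a one-second delay per step, the pipelined run should take roughly four seconds, whereas finishing each comparison and transfer strictly one after another would take about six. Only one file is ever "in transit" at a time, which matches what the question observes in rsync's output.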
