I use a single rsync program to back up a file system. Using ps, I find there are four rsync threads or processes, two in R state (running) and two in S state (suspended?):
$ ps aux | grep rsync
root 14144 0.0 0.0 6008 1868 pts/1 S+ 03:16 0:00 sudo rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root 14145 47.2 0.5 62424 46108 pts/1 R+ 03:16 226:44 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root 14146 0.6 0.2 80052 20584 pts/1 S+ 03:16 2:59 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
root 14147 11.4 0.2 49324 20264 pts/1 S+ 03:16 55:02 rsync -azvv /windows-d/ ./2015.03.07_03:16:05/
ting 16986 0.0 0.0 4392 820 pts/4 S+ 11:16 0:00 grep --color=auto rsync
Using pstree, I find there are three rsync processes or threads:
$ pstree | grep rsync
| |-bash---sudo---rsync---rsync---rsync
Why do I have multiple rsync threads or processes, while I am running only one program?
From the stdout output, it doesn't seem to be transferring multiple files in parallel (which seems to require extra effort; see "Speed up rsync with Simultaneous/Concurrent File Transfers").
But when I examine the destination, I find that some directories (say dir1) contain only some of their files, not all of them, while rsync's output to stdout says it is transferring files in a different, separate directory (say dir2, which has the same parent directory as dir1). It seems to me that later it will report on stdout that it is transferring the remaining files in the partially transferred directories (e.g. dir1).
Best Answer
There are multiple things the rsync program needs to do, among them:
Often, but not always, the transmission part is the limiting factor.
rsync doesn't transfer patch data in parallel. It does, however, generate and exchange other data while a transfer is in progress, and so builds up knowledge about which other deltas might need to be transferred. It does this in separate processes (hence the distinct PIDs in the ps output), so that when the transfer of a particular delta is completed, the next delta is (hopefully) ready for transfer.
A more naive approach would wait for a delta transmission to complete, and only then start comparing the next files to see whether they need to be transmitted. Since it can take a while to find the next differing file, the transmission bandwidth would sit unused during that time.
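To make the pipelining idea concrete, here is a minimal sketch in Python. It is not rsync's actual code: the names find_changed, transfer and pipelined_sync are hypothetical, file systems are modelled as plain dicts, and Python threads stand in for rsync's separate helper processes purely to keep the example short. One worker keeps comparing files and queues the ones that differ, while the other transfers them one at a time, so the transfers themselves are never parallel.

```python
import queue
import threading

def find_changed(source, dest_snapshot, out_q):
    """Comparison role: queue every file whose content differs at the destination."""
    for name, content in source.items():
        if dest_snapshot.get(name) != content:
            out_q.put((name, content))
    out_q.put(None)  # sentinel: nothing left to compare

def transfer(in_q, dest, log):
    """Transfer role: copy one file at a time, in the order it was queued."""
    while True:
        item = in_q.get()
        if item is None:
            break
        name, content = item
        dest[name] = content  # stands in for sending the delta
        log.append(name)

def pipelined_sync(source, dest):
    """Run comparison and transfer concurrently; return the transfer order."""
    work = queue.Queue(maxsize=4)  # small buffer of files that are ready to send
    log = []
    snapshot = dict(dest)  # compare against the state before any transfer
    producer = threading.Thread(target=find_changed, args=(source, snapshot, work))
    consumer = threading.Thread(target=transfer, args=(work, dest, log))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return log

if __name__ == "__main__":
    src = {"dir1/a": "1", "dir1/b": "2", "dir2/c": "3"}
    dst = {"dir1/a": "1", "dir2/c": "old"}
    print(pipelined_sync(src, dst))
```

In this sketch the comparison worker can run ahead of the transfer worker by up to the queue size, which is the point of the design: the next file to send is usually already known when the current one finishes.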