Get rsync to skip files with same size

rsync unix

I'm using the following command to copy a large number of big video files to external drives.

rsync -Ph --inplace /Volumes/Production/Prefix* Prefix

We had a power failure and the copy was interrupted. With rsync it's no problem to just restart it, but it takes quite a while to get back to where it was. It goes through every file and appears to read each one in full. During this phase the reported speed is around 3-5 times higher than usual, until it reaches the point where it starts copying again.

What exactly is it doing during this time? Is it reading through the whole file and comparing it with the source, or is it doing something fancier? Is there a way to get rsync to skip completed files faster? For example, can I tell it to only check files whose size differs?

Best Answer

rsync has an option, --size-only, that does what you want: it skips any file whose size is the same on both sides.
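As a rough sketch of how this fits the command from the question, the script below runs the same options plus --size-only against two scratch directories. The directories and file names are made up for the demonstration. The destination already holds a file of the same size but with different content, so rsync should leave it untouched.

```shell
#!/bin/sh
# Demo of --size-only in throwaway directories (all names are hypothetical).
src=$(mktemp -d)
dst=$(mktemp -d)

printf 'AAAA' > "$src/Prefix1.mov"   # "source" file, 4 bytes
printf 'BBBB' > "$dst/Prefix1.mov"   # stale copy: same size, different content

# Same options as in the question, plus --size-only:
# files whose sizes match on both sides are skipped.
rsync -Ph --inplace --size-only "$src"/Prefix* "$dst"/

# Show the destination file; it is unchanged because its size already matched.
cat "$dst/Prefix1.mov"; echo
```

For the real transfer only the flag changes: rsync -Ph --inplace --size-only /Volumes/Production/Prefix* Prefix. A file that was only partly copied has a different size, so it is still picked up. The trade-off is that a file whose content changed without changing its size would be skipped as well.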

Related Solutions

Rsync – How to Resume Transfer of a Single File

Looking at the man page for rsync:

--append
This causes rsync to update a file by appending data onto the end of the file, which presumes that the data that already exists on the receiving side is identical with the start of the file on the sending side.

--inplace
This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

--partial
By default, rsync will delete any partially transferred file if the transfer is interrupted. In some circumstances it is more desirable to keep partially transferred files.

It sounds like for a very big file you would want --partial --append (--append implies --inplace). If the big file may have changed, drop --append and rsync will also check the beginning of the file to make sure it matches the source. --inplace sounds dangerous to me, but when you are syncing a big file you don't want rsync to create a temporary copy of the part already transferred, continue the transfer, and then replace the old file with the new one. Reusing the same file makes the transfer faster and needs less disk space.

Also, for transferring a whole file, I've found a plain copy to be faster than rsync. When I only needed to update a file, however, rsync was faster than retransferring the whole file. rsync should also be able to resume a transfer that was started with cp.

I hope this helps.

Macos – How to tell rsync to skip files on a damaged hard drive block, instead of being stuck trying to read it

Short answer: rsync is not the right tool for this case, and using it can even be harmful.
Use ddrescue instead (it is better than dd_rescue). It can do what you are asking for.

If the disk is physically damaged, any attempt to repair it may brick it.

This is not only a matter of wasted time when rsync seems to hang forever on a damaged sector. The problem is that repeated read attempts can cause an irreparable failure, after which you will no longer be able to rescue your data without expensive parts replacement (if rescue is still possible at all and the HDD is not bricked).

In this case the safest procedure I have found is:

Create a raw image on another, undamaged disk.
Create a copy of that image.
Work on the copy to fix the filesystem and rescue the files.

Why the copy? Because if something goes wrong while fixing the filesystem, you can always start again without touching the original damaged HDD.
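The "work on a copy" step could look roughly like the sketch below. The helper name make_working_copy and the file names are hypothetical, and a small dummy file stands in for the real raw image. The helper duplicates the image, checks that the copy is identical, and then removes write permission from the original so it is not modified by accident.

```shell
#!/bin/sh
# Sketch: keep the rescued image pristine and do all repairs on a copy.
# Function and file names are hypothetical; "imagefile" stands for the raw image.
make_working_copy() {
    image=$1
    work=$2
    cp "$image" "$work" || return 1       # duplicate the raw image
    cmp -s "$image" "$work" || return 1   # make sure the copy is identical
    chmod a-w "$image"                    # guard the original against accidental writes
}

# Example with a small dummy file standing in for the real image.
dir=$(mktemp -d)
head -c 4096 /dev/urandom > "$dir/imagefile"
make_working_copy "$dir/imagefile" "$dir/imagefile.work" && echo "working copy ready"
```

All filesystem repair and file recovery then happens on imagefile.work. If a repair attempt makes things worse, delete the working copy and run the helper again.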

I suggest using ddrescue to create the raw disk image, defects included, because it works well even when there are read errors.

How to do it with ddrescue

You can use ddrescue much as you wanted to use rsync: it skips the damaged sectors without retrying or splitting them and copies as much data as possible.
The command is shown below (replace /dev/hda1 with your device):

ddrescue --no-split /dev/hda1 imagefile logfile

After this first (faster) pass, you can refine the image by retrying each failed read up to 3 times:

ddrescue --direct --max-retries=3 /dev/hda1 imagefile logfile
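One caveat, as far as I know: later GNU ddrescue releases renamed these long options (--no-split became --no-scrape, --max-retries became --retry-passes, and --direct became --idirect), while the short forms -n, -r and -d stayed the same. The sketch below only prints the two passes in short-option form so you can review them before running anything. The function name rescue_cmds is made up for this example.

```shell
#!/bin/sh
# Sketch: print the two ddrescue passes using short options.
# rescue_cmds is a hypothetical helper; it prints commands and runs nothing.
rescue_cmds() {
    device=$1
    image=$2
    logfile=$3
    # Pass 1: fast copy that skips the slow work on bad areas (-n).
    echo "ddrescue -n $device $image $logfile"
    # Pass 2: direct disc access (-d), up to 3 retry passes over bad sectors (-r3).
    echo "ddrescue -d -r3 $device $image $logfile"
}

# Same device and file names as in the commands above.
rescue_cmds /dev/hda1 imagefile logfile
```

Check ddrescue --help on your system to see which long names your version accepts.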

You can keep refining the image by running ddrescue again with other options, trying each time to extract more data (see the ddrescue manual). When you are finished, create the copy (if you have enough space) and then fix the filesystem.

Note that the raw image will be as big as the original HDD.
On this and other StackExchange sites you can find many questions and answers about rescuing data with ddrescue or other tools.
