Ubuntu – Reasons for rsync NOT transferring all files

Does anyone know common reasons for such a large difference in the number of files transferred when backing up my LARGE home directory with rsync on an Ubuntu 10.04 LTS system? The machine is stable and all volumes are clean ext4, with no errors from fsck.ext4.

Number of files: 4857743
Number of files transferred: 4203266

That's a difference of 654,477 files!!!

I want to back up my FULL home folder to an external disk so I can fully WIPE and reformat my system and then restore my home from this rsync'd backup, but I am concerned that I am missing significant data files.

I was logged in as root and used rsync to back up my /home/hholtmann directory to a spare backup drive at /mnt/wd750/c51/home/.

Here is the command line I used as root:

root@c-00000051:~# pwd
/root
root@c-00000051:~# rsync -ah --progress --stats /home/hholtmann /mnt/wd750/c51/home/ -v

Captured summary output from rsync:

Number of files: 4857743
Number of files transferred: 4203266
Total file size: 487.41G bytes
Total transferred file size: 487.41G bytes
Literal data: 487.41G bytes
Matched data: 0 bytes
File list size: 102.48M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 487.75G
Total bytes received: 82.42M

Just to compare, here is an important project sub-dir in my home after the rsync.

Byte difference between the source and destination sub-dir, measured with du:

root@c-00000051:~# du -cs /home/hholtmann/proj/
18992676    /home/hholtmann/proj/
18992676    total
root@c-00000051:~# du -cs /mnt/wd750/c51/home/hholtmann/proj/
19006768    /mnt/wd750/c51/home/hholtmann/proj/
19006768    total

HOWEVER: there is NO FILE COUNT difference between the same source and destination sub-dirs:

root@c-00000051:~# find /home/hholtmann/proj/ -type f -follow | wc -l
945937
root@c-00000051:~# find /mnt/wd750/c51/home/hholtmann/proj/ -type f -follow | wc -l
945937
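Matching counts are reassuring, but when counts disagree they don't tell you which files are missing. A minimal sketch, assuming a POSIX shell with mktemp, find, sort and comm, and file names without embedded newlines; list_missing is a hypothetical helper name, not a standard command:

```shell
#!/bin/sh
# list_missing SRC DST   (hypothetical helper)
# Prints regular files that exist under SRC but not under DST,
# as paths relative to SRC (e.g. ./proj/main.c).
# Assumes no file name contains a newline.
list_missing() {
    src_list=$(mktemp) || return 1
    dst_list=$(mktemp) || return 1
    # Relative paths make the two trees comparable line by line.
    (cd "$1" && find . -type f) | LC_ALL=C sort > "$src_list"
    (cd "$2" && find . -type f) | LC_ALL=C sort > "$dst_list"
    # -23: hide lines only in DST and lines common to both.
    LC_ALL=C comm -23 "$src_list" "$dst_list"
    rm -f "$src_list" "$dst_list"
}
```

For example, list_missing /home/hholtmann /mnt/wd750/c51/home/hholtmann would print each regular file found in the home directory but not in the backup; no output means nothing was missing by name.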

Why such unexpected results? A file is a file… especially in a user's home dir!

What am I missing? Or is this a sign I'm ready for management!?!

SOLUTION and ANSWERED:

The selected answer below explains the byte-count difference and my incorrect expectation of the rsync summary data. I was surprised by the byte difference because both volumes are ext4 with default block sizes; I had assumed every file would take up the same space in terms of du numbers.

I DID find some files that were NOT rsync'd by running rsync again with -vv for more verbose output.

What I saw were errors from rsync stating that it could NOT write any of my DROPBOX dir files to the destination because of the "extended attributes" on the files. rsync was skipping every file under my Dropbox path.

It turns out my /home volume was mounted with the user_xattr ext4 mount option in /etc/fstab, but my backup volume was not:

/dev/mapper/vg1-lv_home /home   ext4 nobarrier,noatime,user_xattr 0 2
# I HAD to add the ,user_xattr option to match my home volume
/dev/sda1           /mnt/wd750  ext4 nobarrier,noatime,user_xattr 0 2
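To check what a filesystem is actually mounted with (rather than what /etc/fstab says), one option is to read /proc/mounts, where the fourth field is the option list. A minimal sketch, assuming Linux with /proc mounted and awk available; mount_opts and compare_mount_opts are hypothetical helper names. The kernel may report options in its own normalized form and include defaults, so a plain string comparison like this can also flag harmless differences:

```shell
#!/bin/sh
# mount_opts MOUNTPOINT [MOUNTS_FILE]   (hypothetical helper)
# Prints the option field (4th column) for MOUNTPOINT as recorded in
# MOUNTS_FILE, which defaults to /proc/mounts.
# Mount points containing spaces appear escaped (\040) there and won't match.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}

# compare_mount_opts MP1 MP2 [MOUNTS_FILE]   (hypothetical helper)
# Prints the options and returns 0 if both mount points report the same
# option string; otherwise prints both and returns 1.
compare_mount_opts() {
    a=$(mount_opts "$1" "$3")
    b=$(mount_opts "$2" "$3")
    if [ "$a" = "$b" ]; then
        echo "same options: $a"
    else
        echo "DIFFERENT: $1 -> $a; $2 -> $b"
        return 1
    fi
}
```

For example, compare_mount_opts /home /mnt/wd750 would print both option lists and return non-zero if they differ.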

After performing a third full rsync, I let a file count run all night on both my full home folder and the rsync'd backup:

root@c-00000051:~# find /home/hholtmann/ -type f | wc -l
4203266
root@c-00000051:~# find /mnt/wd750/c51/home/hholtmann/ -type f | wc -l
4203266

** A PERFECT MATCH OF FILES **

CONCLUSION:

** Always ensure your backup volumes are mounted with the exact same file system mount options as the source AND turn on full logging with rsync for later grep analysis to search for any errors in long file listings! **
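One way to follow the logging advice is to save everything rsync prints, including error messages on stderr, to a file and then search it. A minimal sketch, assuming rsync is installed, you supply the source, destination and log paths, and problem lines start with "rsync: " or "rsync error: " as most rsync error messages do; backup_with_log and scan_rsync_log are hypothetical helper names:

```shell
#!/bin/sh
# backup_with_log SRC DST LOGFILE   (hypothetical helper)
# Runs rsync with -vv, sending both stdout and stderr to LOGFILE so error
# messages are kept alongside the file listing. rsync's exit status is
# returned unchanged (non-zero means something went wrong).
backup_with_log() {
    rsync -ah -vv --stats "$1" "$2" > "$3" 2>&1
}

# scan_rsync_log LOGFILE   (hypothetical helper)
# Prints, with line numbers, lines that look like rsync error reports:
# "rsync: ..." messages and the final "rsync error: ..." summary.
scan_rsync_log() {
    grep -nE 'rsync( error)?: ' "$1"
}
```

For example, backup_with_log /home/hholtmann /mnt/wd750/c51/home/ /root/rsync-home.log followed by scan_rsync_log /root/rsync-home.log (the log path is only a placeholder).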

Best Answer

There are two parts to this question. First, why is there a difference between "Number of files" and "Number of files transferred"? This is explained in the rsync manpage:

Number of files: is the count of all "files" (in the generic sense), which includes directories, symlinks, etc.

Number of files transferred: is the count of normal files that were updated via rsync’s delta-transfer algorithm, which does not include created dirs, symlinks, etc.

The difference here should equal the total number of directories, symlinks, and other special files. Those were not "transferred" but simply re-created.
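One way to check this explanation against the question's numbers is to count both kinds of entries in the source. A minimal sketch, assuming find and wc and file names without embedded newlines; count_regular and count_nonregular are hypothetical helper names. Symlinks are not followed (no -L/-follow), which matches rsync -a copying symlinks as symlinks:

```shell
#!/bin/sh
# count_regular DIR      (hypothetical helper): regular files only.
# count_nonregular DIR   (hypothetical helper): everything else, i.e.
#   directories (including DIR itself), symlinks, FIFOs, sockets, devices.
# Symlinks are not followed, so they are counted as links.
count_regular() {
    find "$1" -type f | wc -l
}

count_nonregular() {
    find "$1" ! -type f | wc -l
}
```

On a first run into an empty destination, count_regular /home/hholtmann should roughly match "Number of files transferred" and count_nonregular /home/hholtmann should roughly match the 654,477 gap. The question's own overnight count of regular files came to 4,203,266, exactly the transferred figure.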

Now for the second part: why is there a size difference with du? du shows the amount of disk space used by a file, not the size of the file. The same file can take up a different amount of disk space, for example if the filesystems' block sizes differ.
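Because du reports allocated space, a more direct comparison between source and copy is the sum of the regular files' byte sizes, which should match for a complete copy regardless of how blocks were allocated. A minimal sketch, assuming GNU find (for -printf) and awk; total_file_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# total_file_bytes DIR   (hypothetical helper)
# Sums the apparent sizes (st_size, in bytes) of all regular files under DIR.
# Unlike du, this ignores how many disk blocks each file occupies and
# ignores the directories themselves.
total_file_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

For example, total_file_bytes /home/hholtmann/proj/ and total_file_bytes /mnt/wd750/c51/home/hholtmann/proj/ should print the same number if every file was copied in full, even though du -cs reported 18992676 and 19006768.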

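If you are still worried about data integrity, an easy way to be sure is to create hashes for all your files and compare them:

# record an MD5 for every regular file, with paths relative to the home dir
( cd /home/hholtmann && find . -type f -exec md5sum {} + ) > /tmp/hholtmann.md5sum
# verify those relative paths inside the backup copy of the same directory
( cd /mnt/wd750/c51/home/hholtmann && md5sum -c /tmp/hholtmann.md5sum )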