Starting from my comment, I reached a solution:
The whole command line is like this: $ find . -type f -printf "%T@ %p %s\n" | sort -n -r | awk '{ i+=$3; if (i<=200000) {print $2}}' | tar -cvf toto.tar -T - && ssh -n prd "rm -rf dir/*" && scp toto.tar prd:tmp/ && ssh -n prd "tar xvf tmp/toto.tar"
The command starts with a find that looks for all files in the current directory (adapt this to the directory path on server A) and prints three fields:
- %T@ prints the Unix timestamp
- %p prints the file path relative to where find is launched
- %s prints the size of the file in bytes
- \n adds a newline, of course.
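As a small self-contained sketch (using a hypothetical scratch directory, and assuming GNU find, which provides -printf), here is what one such line looks like:

```shell
# Create a scratch file and show the three fields that GNU find's -printf
# emits for it: fractional epoch mtime (%T@), path (%p), size in bytes (%s).
tmp=$(mktemp -d)
touch "$tmp/example.txt"
line=$(find "$tmp" -type f -printf "%T@ %p %s\n")
echo "$line"
rm -r "$tmp"
```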
The output of find is then sorted by sort -n -r, which reverse-sorts numerically on the first field, ordering the Unix timestamps from most recent to oldest.
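A quick sketch with made-up timestamps shows the ordering:

```shell
# Three "timestamp path size" lines in scrambled order; sort -n -r sorts
# numerically on the leading field, descending, so the newest file comes first.
sorted=$(printf '%s\n' \
  '1700000100 ./old 5' \
  '1700000300 ./new 5' \
  '1700000200 ./mid 5' | sort -n -r)
echo "$sorted"
```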
To deal with the size constraint, awk helps a little by printing the second field of the sort output until the sum of the sizes exceeds the limit. For each line it processes, it adds the value of the third field (the size) to the local variable i, then prints the second field of the sort output if i is still under the limit.
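The awk filter on its own behaves like this (made-up timestamps and sizes, same 200000-byte budget as in the run below):

```shell
# Newest-first "timestamp path size" lines; awk adds field 3 (size) to a
# running total i and prints field 2 (path) while i stays within the
# 200000-byte budget. The third line would push i to 250000, so it is dropped.
kept=$(printf '%s\n' \
  '1700000300 ./new.log 150000' \
  '1700000200 ./mid.log 40000' \
  '1700000100 ./old.log 60000' \
  | awk '{ i+=$3; if (i<=200000) {print $2} }')
echo "$kept"
```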
tar -cvf toto.tar -T - builds an archive called toto.tar from the file list provided by the awk output.
On success, the backup on server B is deleted first; then scp toto.tar prd:tmp/ transfers the archive to the remote server (server B), and ssh -n prd "tar xvf tmp/toto.tar" unpacks the transferred archive in the remote folder, preserving the directory structure.
My previous solution based on scp did not preserve the directory structure, which is why I edited this answer.
Here is the result of a run in my home directory with a maximum size of 200 KB:
$ rm toto.tar; find . -type f -printf "%T@ %p %s\n" | sort -n -r | awk '{ i+=$3; if (i<=200000) {print $2}}' | tar -cvf toto.tar -T - && scp toto.tar prd:tmp/ && ssh -n prd "tar xvf tmp/toto.tar"
./.lesshst
./.viminfo
./scpplus
./.config/xfce4/desktop/icons.screen0-1350x650.rc
./.xsession-errors
./.config/xfce4/xfconf/xfce-perchannel-xml/xfce4-panel.xml
./.config/pulse/7f14833c645d4a6abb0beba68b79e0c0-default-source
./.config/pulse/7f14833c645d4a6abb0beba68b79e0c0-default-sink
./.cache/imsettings/log
./.cache/gpg-agent-info
./.ICEauthority
./.vboxclient-draganddrop.pid
./.vboxclient-seamless.pid
./.vboxclient-display.pid
./.vboxclient-clipboard.pid
./.dbus/session-bus/7f14833c645d4a6abb0beba68b79e0c0-0
./.cache/xscreensaver/xscreensaver-getimage.cache
./.config/xfce4/desktop/icons.screen0-1264x950.rc
./work/fpart-0.9.2/src/fpart
toto.tar 100% 170KB 170.0KB/s 00:00
./.lesshst
./.viminfo
./scpplus
./.config/xfce4/desktop/icons.screen0-1350x650.rc
./.xsession-errors
./.config/xfce4/xfconf/xfce-perchannel-xml/xfce4-panel.xml
./.config/pulse/7f14833c645d4a6abb0beba68b79e0c0-default-source
./.config/pulse/7f14833c645d4a6abb0beba68b79e0c0-default-sink
./.cache/imsettings/log
./.cache/gpg-agent-info
./.ICEauthority
./.vboxclient-draganddrop.pid
./.vboxclient-seamless.pid
./.vboxclient-display.pid
./.vboxclient-clipboard.pid
./.dbus/session-bus/7f14833c645d4a6abb0beba68b79e0c0-0
./.cache/xscreensaver/xscreensaver-getimage.cache
./.config/xfce4/desktop/icons.screen0-1264x950.rc
./work/fpart-0.9.2/src/fpart
The main concern is that this solution removes the backup folder before transferring the latest 10 GB of data from the primary server. It is not very efficient when the set of newest data and the set of already backed-up data share many files/directories. But it is a very easy way to really track down the latest 10 GB (or whatever limit) of newest data, whatever the data is (quick and dirty).
Update 2: I finally reached a second solution, which I will explain now.
It is not efficiently coded; it is a big one-liner that could be turned into a shell script with basic checks for failures or strangely formatted file names.
The biggest issue with the first solution is that it always tries to back up the last 10 GB of newest files, regardless of what is already backed up. This means that if there are only 100 MB of new files at the next launch, it will erase the whole backup and transfer 10 GB of data again (the 100 MB of newest files, plus 9.9 GB of older ones).
Here is the one-liner:
ssh -n prd 'cd /var/tmp/test/ && find . -type f -printf "%T@ %p %s\n" ' |awk '{ print int($1)" "$2" "$3 }'|sort -n -r >/tmp/remote ; find . -type f -printf "%T@ %p %s\n" |awk '{ print int($1)" "$2" "$3 }'|sort -n -r | awk '{ i+=$3; if (i<=200000) {print $1" "$2" "$3}}'>/tmp/locale; grep -F -x -v -f /tmp/remote /tmp/locale |cut -d" " -f2 >/tmp/newfile;grep -F -x -v -f /tmp/locale /tmp/remote |cut -d" " -f2 >/tmp/toremove; cat /tmp/toremove |while read i; do echo "removing $i on remote server"; ssh -n prd "rm /var/tmp/test/$i"; done ; cat /tmp/newfile | tar -cvf toto.tar -T -&& scp toto.tar prd:/var/tmp/test/ && ssh -n prd "cd /var/tmp/test; tar xvf /var/tmp/test/toto.tar; rm /var/tmp/test/toto.tar"; rm /tmp/remote /tmp/locale /tmp/toremove /tmp/newfile toto.tar
Of course, replace prd with your server B, as well as all the directory paths on the local/remote server (except for the temporary files being created). Beware: this does not deal with broken file names containing spaces or special characters.
Explanation:
The main idea is to determine which of the newest files are not yet backed up on the backup server, erase the files that are too old on the backup server, and transfer only the newest files not already present, all while keeping the size limit in mind.
- First, connect to the backup server and grab the list of backed-up files:
ssh -n prd 'cd /var/tmp/test/ && find . -type f -printf "%T@ %p %s\n" ' | awk '{ print int($1)" "$2" "$3 }' | sort -n -r >/tmp/remote ;
I have to remove the fractional part of the timestamp because of an issue with tar, which always sets the fractional part to 0. This means the dates on the backup server and the origin server would differ in their fractional parts. The sort orders from the biggest value of the first field to the lowest, which means from the newest file to the oldest. I save the result into the /tmp/remote file. There is no need to check the total size, as I always transferred less than 10 GB in the previous backup.
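The int($1) normalization can be seen in isolation (made-up timestamp and path):

```shell
# awk's int() truncates the fractional part of the %T@ timestamp, so the
# local and remote listings compare equal even though tar zeroes the
# sub-second mtime on the backup side.
normalized=$(echo '1700000000.1234567890 ./somefile 42' \
  | awk '{ print int($1)" "$2" "$3 }')
echo "$normalized"
```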
- Second, I do the same locally to get the list of the newest files whose summed size is under the limit:
find . -type f -printf "%T@ %p %s\n" | awk '{ print int($1)" "$2" "$3 }' | sort -n -r | awk '{ i+=$3; if (i<=200000) {print $1" "$2" "$3}}' >/tmp/locale;
I save the result into /tmp/locale.
So in fact, all files that are in /tmp/locale and not in /tmp/remote are the newest files to be synced to the backup server. All files that are in /tmp/remote and not in /tmp/locale are the files to be removed from the backup server (too old).
To distinguish those subsets, I use grep:
grep -F -x -v -f /tmp/remote /tmp/locale |cut -d" " -f2 >/tmp/newfile;
This prints all lines contained in /tmp/locale and not in /tmp/remote, which I save into /tmp/newfile.
grep -F -x -v -f /tmp/locale /tmp/remote |cut -d" " -f2 >/tmp/toremove;
This prints all lines contained in /tmp/remote and not in /tmp/locale, which I save into /tmp/toremove.
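The grep set-difference trick can be demonstrated with two tiny stand-in lists (the /tmp/demo_* names are just for this example):

```shell
# Stand-in lists: ./b appears in both (already backed up), ./a only
# locally (new file to send), ./c only remotely (stale file to remove).
printf '%s\n' '1700000300 ./a 10' '1700000200 ./b 20' > /tmp/demo_locale
printf '%s\n' '1700000200 ./b 20' '1700000100 ./c 30' > /tmp/demo_remote
new=$(grep -F -x -v -f /tmp/demo_remote /tmp/demo_locale | cut -d" " -f2)
stale=$(grep -F -x -v -f /tmp/demo_locale /tmp/demo_remote | cut -d" " -f2)
echo "new: $new"
echo "stale: $stale"
rm /tmp/demo_locale /tmp/demo_remote
```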
So now I have the list of files to delete remotely and the list of files to transfer to the backup server, keeping the directory structure. I will use tar to build the local archive to send to the backup server, delete the old files remotely, then transfer the archive and unpack it. And then we are almost done: I remove the temporary files in /tmp for cleanup.
In detail, this gives:
cat /tmp/toremove |while read i; do echo "removing $i on remote server"; ssh -n prd "rm /var/tmp/test/$i"; done ;
This loop reads the file list it receives from cat, displays a little message saying which file it deletes, and launches the remote rm via ssh.
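The loop shape can be tried locally by swapping the remote rm for a plain echo (the /tmp/demo_toremove name is just for this example):

```shell
# Same loop shape with the remote rm replaced by a local echo, so it can
# be run anywhere; each line of the list triggers one "removal" message.
printf '%s\n' './old1' './old2' > /tmp/demo_toremove
msgs=$(cat /tmp/demo_toremove | while read i; do
  echo "removing $i on remote server"
done)
echo "$msgs"
rm /tmp/demo_toremove
```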
cat /tmp/newfile | tar -cvf toto.tar -T -&& scp toto.tar prd:/var/tmp/test/ && ssh -n prd "cd /var/tmp/test; tar xvf /var/tmp/test/toto.tar; rm /var/tmp/test/toto.tar";
This builds the local toto.tar archive containing all the files listed in /tmp/newfile. On success, I transfer it to the remote server and unpack it remotely via ssh; I also remove the archive on the backup server so that it will not interfere with the next run.
rm /tmp/remote /tmp/locale /tmp/toremove /tmp/newfile toto.tar
is the local cleanup of the files used during this run.
This one-liner can be shortened by removing the temporary files and piping the output of grep directly into the while loop and the tar command.
It can also be improved to handle the return status of every command (not enough space to build the archive, scp or ssh errors, ...) and strange file names (with spaces or special characters, to avoid messing up parameter expansion).
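On the strange-file-names point, one possible direction (a minimal sketch, assuming GNU find and GNU tar, and omitting the sort/awk selection step) is to pass names NUL-separated:

```shell
# find -print0 and tar --null pass file names NUL-separated, so names
# with spaces survive; the sort/awk selection logic is omitted here.
tmp=$(mktemp -d)
arc=$(mktemp)
touch "$tmp/file with space.txt"
(cd "$tmp" && find . -type f -print0 | tar --null -cf "$arc" -T -)
listing=$(tar -tf "$arc")
echo "$listing"
rm -r "$tmp" "$arc"
```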
Best Answer
I am not sure whether you can do this with existing Linux commands such as rsync or diff. In my case I had to write my own script in Python, as Python has the "filecmp" module for file comparison. I have posted the whole script and usage on my personal site - http://linuxfreelancer.com/
Its usage is simple - give it the absolute paths of the new directory, the old directory, and the difference directory, in that order.