Linux – reserve disk space before rsync copying files

disk-usage file-copy files linux rsync

tl;dr: I would like to reserve (or "claim") some amount of disk space before an rsync run starts, so that other rsync instances only start if the disk space they need is certain to be available.

background

A job (a shell script that runs rsync) will:

  1. Use rsync to copy a large amount of data from a source disk to a different destination disk
  2. Do some work using the copied data
  3. Remove the copied data

Multiple instances of the job script may run simultaneously.

In my case, once in a while, many job scripts run rsync at the same time and together use all available disk space. All of the rsync instances then fail (and so the jobs fail).

pseudo-code

Here is the algorithm I'm imagining:

$job = get_next_incoming_job()
$disk_dst = $job.disk_dst()  # destination disk for rsync
$space_need = $job.calculate_space_needed()

_check_space:  # jump label

if $space_need > space_available($disk_dst) then
    sleep $RANDOM
    goto _check_space

$handle = reserve_space($disk_dst, $space_need)  # How??

# rsync will "fill-in" the reserved space - How??
rsync $job.source_data_path() $disk_dst/$job.ID/

do work using $disk_dst/$job.ID/

remove $disk_dst/$job.ID/
release_reserved_space($handle)  # How??

The magic function reserve_space would instantly change the $disk_dst reported free space (value returned by space_available). Other rsync job instances would see space_available() return less space right away (and thus, delay their work until later).

Currently, space_available() (implemented with df) returns a steadily declining number while rsync instances run. The problem is that several rsync instances can each start with enough free space and then run out partway through. I'd like an rsync instance to start only when it is certain it can complete (i.e. it will not run out of disk space while running).

Best Answer

If you stick to filesystem-independent tools, I can't think of a way to do this other than actually allocating the disk space: reserve_space would need to create a (non-sparse!) file of the requested size, and you'd need to delete this file before starting rsync.
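A minimal sketch of that placeholder-file idea, assuming a Linux system with flock and fallocate from util-linux. The function names mirror the question's pseudo-code; the lock file name, the placeholder naming, and the fallback to writing zeros are assumptions made for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are illustrative assumptions.

# space_available DIR: bytes free for unprivileged users on DIR's filesystem.
space_available() {
    kib=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    echo $((kib * 1024))
}

# reserve_space DIR BYTES: create a fully allocated placeholder file in DIR.
# Prints the placeholder path on success, returns non-zero otherwise.
reserve_space() {
    dir=$1
    bytes=$2
    placeholder=$(mktemp "$dir/.reserve.XXXXXX") || return 1
    (
        # Serialize the check and the allocation across concurrent jobs.
        flock 9 || exit 1
        [ "$(space_available "$dir")" -ge "$bytes" ] || exit 1
        # fallocate allocates real blocks; fall back to writing zeros
        # on filesystems that do not support it.
        fallocate -l "$bytes" "$placeholder" 2>/dev/null ||
            head -c "$bytes" /dev/zero >"$placeholder"
    ) 9>"$dir/.reserve.lock"
    status=$?
    if [ "$status" -ne 0 ]; then
        rm -f -- "$placeholder"
        return 1
    fi
    printf '%s\n' "$placeholder"
}

# release_space PLACEHOLDER: give the reserved blocks back.
release_space() {
    rm -f -- "$1"
}
```

A job would call `handle=$(reserve_space "$disk_dst" "$space_need")`, sleep and retry while that fails, and call `release_space "$handle"` right before rsync starts. Between the release and rsync filling the space, df reports the space as free again, so this narrows the race rather than eliminating it.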

If the files are on an ext2/ext3/ext4 volume and using root access for some operations is acceptable, you can use the filesystem's reserved-blocks feature. The reserved space is normally for root, but you can make it available to a different user or group instead (tune2fs -u or tune2fs -g). Run the rsync process as that user or group, and adjust the amount of reserved space with tune2fs -m before running rsync.
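As a rough illustration of that approach, the sketch below only prints the tune2fs commands instead of running them, since they need root and a real device. It uses tune2fs -r, which takes a reserved block count, rather than the percentage that -m takes, because a count maps more directly onto a byte figure. The helper names, the group name, and the device path are hypothetical; the block size should be read from `tune2fs -l` for the actual filesystem.

```shell
#!/bin/sh
# Hypothetical dry-run helpers; nothing here modifies a filesystem.

# reserved_blocks BYTES BLOCK_SIZE: blocks needed to cover BYTES, rounded up.
reserved_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# print_reserve_commands DEVICE GROUP BYTES BLOCK_SIZE
# Prints the commands root would run to let GROUP use the reserved blocks
# and to size the reservation to BYTES.
print_reserve_commands() {
    printf 'tune2fs -g %s %s\n' "$2" "$1"
    printf 'tune2fs -r %s %s\n' "$(reserved_blocks "$3" "$4")" "$1"
}
```

For example, `print_reserve_commands /dev/sdX1 rsyncjobs 10737418240 4096` prints the two commands for a 10 GiB reservation on a filesystem with 4096-byte blocks.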

There's probably a more flexible solution with ZFS or Btrfs pools but I don't know how to do it.
