Linux – Find oldest files/directories in file system up to 50 TB


I need to find the oldest files, with their associated directories, in a 90 TB file system, up to 50 TB of them, and then move them to another file system. They have to retain their directory structure, as that is what identifies what the files are. So –

first level/second level/third level/(file)

is the structure. I need to move that entire structure. There isn't anything in the top-level directories themselves, but without them I cannot identify what a file belongs to, as all of the files I am looking for have the same name. When the process is complete, I should have roughly 40 TB left in the original file system, with the oldest files now sitting in the new file system.

Thanks!

Best Answer

With GNU tools and rsync, you could do:

export LC_ALL=C # force tools to regard those file paths as arrays
                # of bytes (as they are in effect) and not do fancy
                # sorting (and use English for error/warning messages 
                # as an undesired side effect).
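
As a side note on why that matters here: in a UTF-8 locale, sort applies locale collation rules and, for sort -n, the locale's numeric conventions, which could misparse the fractional %T@ timestamps in locales that use a comma as the decimal separator. A quick illustration of the collation difference (assuming a glibc system with en_US.UTF-8 installed):

printf 'B\na\nA\nb\n' | LC_ALL=en_US.UTF-8 sort # locale collation, typically: a A b B
printf 'B\na\nA\nb\n' | LC_ALL=C sort           # plain byte order: A B a b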

find . -type f -printf '%T@/%s/%p\0' | # print mtime/size/path
  sort -zn | # numerical sort, oldest first
  awk -v RS='\0' -v ORS='\0' -F / -v max=50e12 '
    {total_size += $2}
    total_size > max {exit}
    {
      sub("^[^/]*/[^/]*/", "") # remove mtime/size/
      print # path
    }' |
  rsync -nv -aHAX0 --files-from=- --remove-source-files . /dest/dir/ # -0: NUL-separated input; -H/-A/-X: hard links, ACLs, xattrs

(Untested. The -n is for a dry run; remove it once you're happy with the output.)
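
If you want to sanity-check how much would be selected before involving rsync at all, here's a minimal sketch (same selection logic and assumptions as above, apparent sizes only) that prints the total instead of the paths:

find . -type f -printf '%T@/%s/%p\0' |
  sort -zn |
  awk -v RS='\0' -F / -v max=50e12 '
    {total_size += $2}
    total_size > max {total_size -= $2; exit} # the crossing file is excluded
    {n++}
    END {printf "%d files, %.2f TB selected\n", n, total_size / 1e12}'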

Note that we're computing the cumulative size from the files' apparent sizes (%s; replace it with %b for the disk usage in 512-byte blocks and change the accumulation to total_size += $2 * 512) and that hard links are ignored. Those files, when copied to the target file system along with the directories that contain them, will likely end up using more than 50 TB there (unless file system compression or deduplication is in play).
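
For reference, here's the same pipeline with that substitution applied, equally untested, if you'd rather budget by on-disk usage than by apparent size:

find . -type f -printf '%T@/%b/%p\0' | # %b: disk usage in 512-byte blocks
  sort -zn |
  awk -v RS='\0' -v ORS='\0' -F / -v max=50e12 '
    {total_size += $2 * 512} # blocks -> bytes
    total_size > max {exit}
    {
      sub("^[^/]*/[^/]*/", "") # remove mtime/blocks/
      print # path
    }' |
  rsync -nv -aHAX0 --files-from=- --remove-source-files . /dest/dir/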
