The size of the directory (as seen with ls -ld /var/lib/php/sessions) can give an indication. If it's small, there aren't many files. If it's large, there may be many entries in there, or there may have been many in the past.
Listing the contents, as long as you don't stat individual files, shouldn't take much longer than reading a file of the same size.
What might happen is that you have an alias for ls that does ls -F or ls --color. Those options cause an lstat system call to be performed on every file, to determine for instance whether each entry is a file or a directory.
You'll also want to make sure that you list dot files and that you leave the file list unsorted. For that, run:
command ls -f /var/lib/php/sessions | wc -l
Provided not too many filenames have newline characters, that should give you a good estimate.
$ ls -lhd 1
drwxr-xr-x 2 chazelas chazelas 69M Aug 15 20:02 1/
$ time ls -f 1 | wc -l
3218992
ls -f 1 0.68s user 1.20s system 99% cpu 1.881 total
wc -l 0.00s user 0.18s system 9% cpu 1.880 total
$ time ls -F 1 | wc -l
<still running...>
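If you're worried about file names with newline characters skewing the wc -l count, a newline-proof variant (my own sketch, relying on GNU find's -printf) counts one byte per entry instead of one line per name. The temporary directory here is just test scaffolding:

```shell
# Count directory entries even when names contain newlines, by emitting
# exactly one byte per entry and counting bytes instead of lines.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
find "$dir" -mindepth 1 -maxdepth 1 -printf x | wc -c   # prints 3
rm -r "$dir"
```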
You can also deduce the number of files there by subtracting the number of unique files elsewhere in the file system from the number of used inodes in the output of df -i.
For instance, if the file system is mounted on /var, with GNU find:
find /var -xdev -path /var/lib/php/sessions -prune -o \
  -printf '%i\n' | sort -u | wc -l
That finds the number of files not in /var/lib/php/sessions. If you subtract it from the IUsed field in the output of df -i /var, you'll get an approximation of the number of files linked in /var/lib/php/sessions that are not otherwise linked anywhere else. It's only an approximation because some special inodes are not linked to any directory on a typical ext file system, and because nothing stops /var/lib/php/sessions from containing one billion entries for the same file (though in practice the maximum number of links on a file is much lower than that on most file systems), so the method is not fool-proof.
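Put together, the subtraction might look like this (my own sketch, assuming GNU df with --output support; the numbers are system-dependent):

```shell
# Inodes in use on /var's file system outside /var/lib/php/sessions.
outside=$(find /var -xdev -path /var/lib/php/sessions -prune -o \
            -printf '%i\n' | sort -u | wc -l)
# Total inodes in use on that file system (GNU df's --output=iused field).
used=$(df --output=iused /var | tail -n 1 | tr -d ' ')
# Approximate number of files linked only under /var/lib/php/sessions.
echo "$((used - outside))"
```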
Note that while reading the directory content should be relatively fast, removing the files can be painfully slow. rm -r, when removing files, first lists the directory content and then calls unlink() for every file. And for every file, the system has to look up the file name in that huge directory, which can be very expensive if the directory is not hashed.
You can install the Perl script rename. Then try doing this:
$ rename -n 's/[A-Z]/lc($&)/ge; s/\s/_/g' files*
(remove the -n
switch when your tests are OK)
There are two utilities called rename. The one in Fedora can't do this; some other distributions come with the Perl one by default. If you run the following command (assuming GNU readlink for the -f option):
$ file "$(readlink -f "$(type -p rename)")"
and you have a result like
.../rename: Perl script, ASCII text executable
and not one containing ELF,
then this seems to be the right tool =)
If not, such as on Fedora, install it manually.
Last but not least, this tool was originally written by Larry Wall, Perl's dad.
Best Answer
Using rsync is surprisingly fast and simple.
@sarath's answer mentioned another fast choice: Perl! Its benchmarks are faster than rsync -a --delete.
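A minimal sketch of the rsync approach, assuming rsync is installed (the directories here are throwaway scaffolding standing in for the real session directory):

```shell
# Delete a directory's contents by syncing an empty directory over it.
empty=$(mktemp -d)
big=$(mktemp -d)
touch "$big/sess_1" "$big/sess_2" "$big/sess_3"
rsync -a --delete "$empty"/ "$big"/
ls -A "$big" | wc -l   # prints 0
rmdir "$empty" "$big"
```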