Linux – How to determine how many files are within a directory without counting

Tags: directory, ext4, files, linux

I've been having a fairly serious issue on a high-traffic web server. PHP pages are slowing down considerably, and it only seems to be an issue on pages where sessions are accessed or a certain table within a database is referenced. In the '/var/log/messages' log file, I see hundreds of thousands of instances of the following error:
'kernel: EXT4-fs warning (device dm-0): ext4_dx_add_entry: Directory index full!'

I suspect there is a bottleneck in '/var/lib/php/sessions', because I cannot open the folder in FileZilla and cannot count the number of files/sub-directories with grep. While it is quite possibly a case of hard drive corruption, I'd like to verify a hunch of mine first by checking the number of files inside this directory.

How would you go about finding the number of files within a folder without actually counting the files in said folder?

Best Answer

The size of the directory (as seen with ls -ld /var/lib/php/sessions) can give an indication. If it's small, there aren't many files. If it's large, there may be many entries in there, or there may have been many in the past (a directory generally doesn't shrink when files are removed).
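For instance, a quick check of the on-disk size (the second form assumes GNU stat):

ls -ld /var/lib/php/sessions
stat -c %s /var/lib/php/sessions   # directory size in bytes; a few KiB suggests few entries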

Listing the content, as long as you don't stat individual files, shouldn't take much longer than reading a file of the same size.

What might be happening is that you have an alias for ls that runs ls -F or ls --color. Those options cause an lstat() system call to be performed on every file, for instance to tell whether each entry is a file or a directory.
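You can check whether ls is aliased in your current shell:

type ls   # reports whether ls is an alias, a function, or the real binary

The command prefix used below bypasses any such alias.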

You'll also want to make sure that you list dot files and that you leave the file list unsorted. For that, run:

command ls -f /var/lib/php/sessions | wc -l

Provided not too many filenames have newline characters, that should give you a good estimate.
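If you do suspect newlines in filenames, here is a sketch that is immune to them, assuming GNU find (it prints one byte per entry instead of one line per name):

find /var/lib/php/sessions -mindepth 1 -maxdepth 1 -printf x | wc -c

Unlike ls -f, this does not count the . and .. entries.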

$ ls -lhd 1
drwxr-xr-x 2 chazelas chazelas 69M Aug 15 20:02 1/
$ time ls -f 1 | wc -l
3218992
ls -f 1  0.68s user 1.20s system 99% cpu 1.881 total
wc -l  0.00s user 0.18s system 9% cpu 1.880 total
$ time ls -F 1 | wc -l
<still running...>

You can also deduce the number of files there by subtracting the number of unique inodes in use elsewhere on the file system from the number of used inodes in the output of df -i.

For instance, if the file system is mounted on /var, with GNU find:

find /var -xdev -path /var/lib/php/sessions -prune -o \
  -printf '%i\n' | sort -u | wc -l

That gives the number of files not in /var/lib/php/sessions. If you subtract that from the IUsed field in the output of df -i /var, you'll get an approximation of the number of files linked under /var/lib/php/sessions and nowhere else. It's only an approximation, because some special inodes are not linked to any directory in a typical ext file system, and because /var/lib/php/sessions could contain many entries that are hard links to the same file (though the maximum number of links to a file is far lower than, say, one billion on most file systems), so the method is not fool-proof.
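Putting the two numbers together could look like this (a sketch assuming GNU find and awk, and that df -i prints its data row on a single second line with IUsed in the third column):

outside=$(find /var -xdev -path /var/lib/php/sessions -prune -o -printf '%i\n' |
  sort -u | wc -l)
used=$(df -i /var | awk 'NR==2 {print $3}')
echo "approx. $((used - outside)) inodes linked only under /var/lib/php/sessions"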

Note that while reading the directory content should be relatively fast, removing the files can be painfully slow.

rm -r, when removing files, first lists the directory content, and then calls unlink() for every file. And for every file, the system has to look up the name in that huge directory, which can be very expensive if it's not hashed.
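A common workaround in that situation, sketched here with assumed permissions and ownership (check what your PHP setup actually expects before recreating the directory), is to move the bloated directory aside, recreate it fresh so PHP immediately gets a small, fast directory, and delete the old one at leisure:

mv /var/lib/php/sessions /var/lib/php/sessions.old
mkdir -m 1733 /var/lib/php/sessions       # typical mode for a PHP session dir; verify yours
chown root:apache /var/lib/php/sessions   # hypothetical owner; match the original
rm -rf /var/lib/php/sessions.old &        # the slow deletion runs in the background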
