Linux – How to determine how many files are within a directory without counting

Tags: directory, ext4, files, linux

I've been having a fairly serious issue on a high-traffic web server. PHP pages are slowing down considerably, and it only seems to be an issue on pages where sessions are accessed or a certain table within a database is referenced. In the '/var/log/messages' log file, I see hundreds of thousands of instances of the following error:
'kernel: EXT4-fs warning (device dm-0): ext4_dx_add_entry: Directory index full!'

I suspect there is a bottleneck in '/var/lib/php/sessions', because I cannot open the folder in FileZilla and cannot count the number of files/sub-directories with grep. While it is quite possibly a case of hard drive corruption, I'd like to verify a hunch of mine first by checking the number of files inside this directory.

How would you go about finding the number of files within a folder without actually counting the files in said folder?

Best Answer

The size of the directory (as seen with ls -ld /var/lib/php/sessions) can give an indication. If it's small, there aren't many files. If it's large, there may be many entries in there, or there may have been many in the past (a directory generally doesn't shrink when files are removed).
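For instance, a quick check of the on-disk size (the second form assumes GNU stat):

ls -ld /var/lib/php/sessions
stat -c %s /var/lib/php/sessions   # directory size in bytes; a few KiB suggests few entries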

Listing the content, as long as you don't stat individual files, shouldn't take much longer than reading a file of the same size.

What might be happening is that you have an alias for ls that runs ls -F or ls --color. Those options cause an lstat() system call to be performed on every file, for instance to tell whether each entry is a file or a directory.
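You can check whether ls is aliased in your current shell:

type ls   # reports whether ls is an alias, a function, or the real binary

The command prefix used below bypasses any such alias.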

You'll also want to make sure that you list dot files and that you leave the file list unsorted. For that, run:

command ls -f /var/lib/php/sessions | wc -l

Provided not too many filenames have newline characters, that should give you a good estimate.
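If you do suspect newlines in filenames, here is a sketch that is immune to them, assuming GNU find (it prints one byte per entry instead of one line per name):

find /var/lib/php/sessions -mindepth 1 -maxdepth 1 -printf x | wc -c

Unlike ls -f, this does not count the . and .. entries.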

$ ls -lhd 1
drwxr-xr-x 2 chazelas chazelas 69M Aug 15 20:02 1/
$ time ls -f 1 | wc -l
3218992
ls -f 1  0.68s user 1.20s system 99% cpu 1.881 total
wc -l  0.00s user 0.18s system 9% cpu 1.880 total
$ time ls -F 1 | wc -l
<still running...>

You can also deduce the number of files there by subtracting the number of unique inodes in use elsewhere on the file system from the number of used inodes in the output of df -i.

For instance, if the file system is mounted on /var, with GNU find:

find /var -xdev -path /var/lib/php/sessions -prune -o \
  -printf '%i\n' | sort -u | wc -l

That gives the number of files not in /var/lib/php/sessions. If you subtract that from the IUsed field in the output of df -i /var, you'll get an approximation of the number of files linked under /var/lib/php/sessions and nowhere else. It's only an approximation, because some special inodes are not linked to any directory in a typical ext file system, and because /var/lib/php/sessions could contain many entries that are hard links to the same file (though the maximum number of links to a file is far lower than, say, one billion on most file systems), so the method is not fool-proof.
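Putting the two numbers together could look like this (a sketch assuming GNU find and awk, and that df -i prints its data row on a single second line with IUsed in the third column):

outside=$(find /var -xdev -path /var/lib/php/sessions -prune -o -printf '%i\n' |
  sort -u | wc -l)
used=$(df -i /var | awk 'NR==2 {print $3}')
echo "approx. $((used - outside)) inodes linked only under /var/lib/php/sessions"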

Note that while reading the directory content should be relatively fast, removing the files can be painfully slow.

rm -r, when removing files, first lists the directory content, and then calls unlink() for every file. And for every file, the system has to look up the name in that huge directory, which can be very expensive if it's not hashed.
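A common workaround in that situation, sketched here with assumed permissions and ownership (check what your PHP setup actually expects before recreating the directory), is to move the bloated directory aside, recreate it fresh so PHP immediately gets a small, fast directory, and delete the old one at leisure:

mv /var/lib/php/sessions /var/lib/php/sessions.old
mkdir -m 1733 /var/lib/php/sessions       # typical mode for a PHP session dir; verify yours
chown root:apache /var/lib/php/sessions   # hypothetical owner; match the original
rm -rf /var/lib/php/sessions.old &        # the slow deletion runs in the background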
