CentOS 5.9
I came across an issue the other day where a directory had a lot of files. To count them, I ran `ls -l /foo/foo2/ | wc -l`.
Turns out that there were over 1 million files in a single directory (long story — the root cause is getting fixed).
My question is: is there a faster way to do the count? What would be the most efficient way to get the count?
Best Answer
Short answer:
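A minimal sketch of such a one-liner, assuming an `ls` that supports the `-f` and `-q` options discussed further down (GNU coreutils does); the leading backslash is only there to bypass a possible alias:

```shell
# -a: include dot files; -f: do not sort; -q: print ? for unprintable characters.
\ls -afq | wc -l
```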
(This includes `.` and `..`, so subtract 2.)
When you list the files in a directory, three common things might happen:
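1. Enumerating the file names in the directory. This cannot be avoided: the entries have to be read in order to be counted.
2. Sorting the file names. Shell wildcards and the `ls` command do that.
3. Calling `stat` to retrieve metadata about each directory entry, such as whether it is a directory.
#3 is the most expensive by far, because it requires loading an inode for each file. In comparison, all the file names needed for #1 are compactly stored in a few blocks. #2 wastes some CPU time, but it is often not a deal breaker.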
If there are no newlines in file names, a simple `ls -A | wc -l` tells you how many files there are in the directory. Beware that if you have an alias for `ls`, this may trigger a call to `stat` (e.g. `ls --color` or `ls -F` need to know the file type, which requires a call to `stat`), so from the command line, call `command ls -A | wc -l` or `\ls -A | wc -l` to avoid an alias.
If there are newlines in the file name, whether newlines are listed or not depends on the Unix variant. GNU coreutils and BusyBox display `?` for a newline by default, but GNU `ls` does so only when writing to a terminal; in a pipeline such as the one above, add `-q` to get the same behavior.
Call `ls -f` to list the entries without sorting them (#2). This automatically turns on `-a` (at least on modern systems). The `-f` option is in POSIX but with optional status; most implementations support it, but not BusyBox. The option `-q` replaces non-printable characters, including newlines, with `?`; it's POSIX but isn't supported by BusyBox, so omit it if you need BusyBox support, at the expense of overcounting files whose name contains a newline character.
If the directory has no subdirectories, then most versions of `find` will not call `stat` on its entries (leaf directory optimization: a directory that has a link count of 2 cannot have subdirectories, so `find` doesn't need to look up the metadata of the entries unless a condition such as `-type` requires it). So `find . | wc -l` is a portable, fast way to count files in a directory (the result includes `.` itself), provided that the directory has no subdirectories and that no file name contains a newline.
If the directory has no subdirectories but file names may contain newlines, try one of these (the second one should be faster if it's supported, but may not be noticeably so).
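Two possible forms, both assuming GNU find (`-print0` is widely supported elsewhere, `-printf` much less so). Each one counts one item per entry rather than one line per entry, so newlines in names do not matter. Note that `.` itself is included in the result.

```shell
# Variant 1: emit NUL-terminated names, keep only the NUL bytes, count them.
find . -print0 | tr -dc '\0' | wc -c

# Variant 2: print one fixed character per entry and count the characters.
find . -printf a | wc -c
```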
On the other hand, don't use `find` if the directory has subdirectories: even `find . -maxdepth 1` calls `stat` on every entry (at least with GNU find and BusyBox find). You avoid sorting (#2) but you pay the price of an inode lookup (#3), which kills performance.
In the shell without external tools, you can count the files in the current directory with `set -- *; echo $#`. This misses dot files (files whose name begins with `.`) and reports 1 instead of 0 in an empty directory. This is the fastest way to count files in small directories because it doesn't require starting an external program, but (except in zsh) wastes time for larger directories due to the sorting step (#2).
In bash, this is a reliable way to count the files in the current directory:
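A sketch of one way to do this with bash's `dotglob` and `nullglob` options; the array name `a` is arbitrary:

```shell
shopt -s dotglob nullglob   # match dot files; expand to nothing in an empty directory
a=(*)                       # one array element per directory entry (. and .. are never matched)
echo "${#a[@]}"             # number of elements
```

Both options stay set for the rest of the shell session, so run this in a subshell if later commands depend on the default globbing behavior.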
In ksh93, this is a reliable way to count the files in the current directory:
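A sketch assuming ksh93's `FIGNORE` variable and its `~(N)` pattern prefix. It is wrapped in `ksh93 -c` here only so that it can be started from any shell; inside ksh93 itself, the quoted part is what you would type. The variable name `ksh_count` is a placeholder.

```shell
# FIGNORE: ignore only . and .., so dot files are matched by *.
# ~(N): expand to nothing when the pattern matches no file.
# Skipped when no ksh93 binary is installed.
if command -v ksh93 >/dev/null 2>&1; then
  ksh_count=$(ksh93 -c 'FIGNORE="@(.|..)"; a=(~(N)*); echo ${#a[@]}')
  echo "$ksh_count"
fi
```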
In zsh, this is a reliable way to count the files in the current directory:
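A sketch using zsh glob qualifiers. It is wrapped in `zsh -fc` here only so that it can be started from any shell; inside zsh itself, the quoted part is what you would type. The variable name `zsh_count` is a placeholder.

```shell
# D: include dot files; N: expand to nothing when nothing matches; oN: do not sort.
# -f skips the zsh startup files, so user options are not loaded.
# Skipped when zsh is not installed.
if command -v zsh >/dev/null 2>&1; then
  zsh_count=$(zsh -fc 'a=(*(DNoN)); echo $#a')
  echo "$zsh_count"
fi
```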
If you have the `mark_dirs` option set, make sure to turn it off: `a=(*(DNoN^M))`.
In any POSIX shell, this is a reliable way to count the files in the current directory:
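A sketch using only the positional parameters and three glob patterns. The `[ -L ... ]` test is an added precaution, so that a dangling symbolic link in first position is not mistaken for an unmatched pattern; the variable name `count` is arbitrary.

```shell
set -- *                               # names not starting with a dot
[ -e "$1" ] || [ -L "$1" ] || shift    # drop the literal pattern if nothing matched
set -- .[!.]* "$@"                     # add names starting with a single dot
[ -e "$1" ] || [ -L "$1" ] || shift
set -- ..?* "$@"                       # add names starting with two dots, other than ..
[ -e "$1" ] || [ -L "$1" ] || shift
count=$#
echo "$count"
```

This overwrites the positional parameters of the current shell, so save them first or run it in a subshell if they are still needed.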
All of these methods sort the file names, except for the zsh one.