I want to know how many regular files have the extension .c in a large complex directory structure, and also how many directories these files are spread across. The output I want is just those two numbers.
I've seen this question about how to get the number of files, but I need to know the number of directories the files are in too.
- My filenames (including directories) might have any characters; they may start with . or - and have spaces or newlines.
- I might have some symlinks whose names end with .c, and symlinks to directories. I don't want symlinks to be followed or counted, or I at least want to know if and when they are being counted.
- The directory structure has many levels and the top level directory (the working directory) has at least one .c file in it.
I hastily wrote some commands in the (Bash) shell to count them myself, but I don't think the result is accurate…
shopt -s dotglob
shopt -s globstar
mkdir out
for d in **/; do
    find "$d" -maxdepth 1 -type f -name "*.c" >> out/$(basename "$d")
done
ls -1Aq out | wc -l
cat out/* | wc -l
This outputs complaints about ambiguous redirects, misses files in the current directory, trips up on special characters (for example, redirected find output prints newlines in filenames), and writes a whole bunch of empty files (oops).
How can I reliably enumerate my .c files and their containing directories?
In case it helps, here are some commands to create a test structure with bad names and symlinks:
mkdir -p cfiles/{1..3}/{a..b} && cd cfiles
mkdir space\ d
touch -- i.c -.c bad\ .c 'terrible
.c' not-c .hidden.c
for d in space\ d 1 2 2/{a..b} 3/b; do cp -t "$d" -- *.c; done
ln -s 2 dirlink
ln -s 3/b/i.c filelink.c
In the resulting structure, 7 directories contain .c files, and 29 regular files end with .c (if dotglob is off when the commands are run) (if I've miscounted, please let me know). These are the numbers I want.
Please feel free not to use this particular test.
N.B.: Answers in any shell or other language will be tested & appreciated by me. If I have to install new packages, no problem. If you know a GUI solution, I encourage you to share (but I might not go so far as to install a whole DE to test it) 🙂 I use Ubuntu MATE 17.10.
Best Answer
I haven't examined the output with symlinks but:
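A minimal sketch of a pipeline that follows the steps described here, assuming GNU find, sort, uniq and sed (for -printf and the -z options). The function name count_c_files and the LC_ALL=C setting are illustrative additions, not something the steps themselves require:

```shell
# Sketch only: assumes GNU find/sort/uniq/sed; count_c_files is an illustrative name.
count_c_files() {
    find "${1:-.}" -type f -name '*.c' -printf '%h\0' |  # directory of each regular .c file, NUL-terminated
        sort -z |                                        # bring identical directories together
        uniq -zc |                                       # "<count> <directory>" for each directory
        LC_ALL=C sed -zE 's/([0-9]) .*/\1 1/' |          # swap the directory name for a literal 1 (bytewise match)
        tr '\0' '\n' |                                   # only counts remain, so newlines are safe now
        awk '{f += $1; d += $2} END {print f+0, d+0}'    # f = total files, d = directories holding them
}
```

Called with no argument it searches the current directory and prints the file count followed by the directory count, which should be 29 7 for the question's test tree if it was built as shown.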
- The find command prints the directory name of each .c file it finds.
- sort | uniq -c gives us how many files are in each directory (the sort might be unnecessary here, not sure).
- With sed, I replace the directory name with 1, thus eliminating all possible weird characters, with just the count and 1 remaining.
- tr then converts the NUL delimiters to newlines.
- In the final awk step, d is essentially the same as NR. I could have omitted inserting 1 in the sed command and just printed NR here, but I think this is slightly clearer.

Up until the tr, the data is NUL-delimited, safe against all valid filenames.

With zsh and bash, you can use printf %q to get a quoted string, which would not have newlines in it. So you might be able to run the matches of a recursive ** glob through printf %q and count the results.

However, even though ** is not supposed to expand for symlinks to directories, I could not get the desired output on bash 4.4.18(1) (Ubuntu 16.04).

But zsh worked fine, and the command can be simplified:
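A sketch of what the simplified zsh form might look like, assuming zsh is installed. It is wrapped in zsh -c here only so that it can be called from another shell, and the function name count_c_files_zsh is illustrative:

```shell
# Sketch only: needs zsh on PATH; count_c_files_zsh is an illustrative name.
count_c_files_zsh() {
    # **/*.c(D.:h) is a recursive zsh glob: D includes dot files, . keeps regular
    # files only, and :h reduces each match to its directory path.
    # printf %q quotes each result so that it stays on a single line.
    zsh -c 'printf "%q\n" **/*.c(D.:h)' |
        awk '!seen[$0]++ {d++} END {print NR, d+0}'   # NR = files, d = distinct directories
}
```

Run from the top of the tree, this prints the file count followed by the directory count. If nothing matches, zsh reports an error and the counts come out as 0 0.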
D enables this glob to select dot files, . selects regular files (so, not symlinks), and :h prints only the directory path and not the filename (like find's %h) (see the sections on Filename Generation and Modifiers). So with the awk command we just need to count the number of unique directories appearing, and the number of lines is the file count.