I haven't examined the output with symlinks but:
find . -type f -iname '*.c' -printf '%h\0' |
sort -z |
uniq -zc |
sed -zr 's/([0-9]) .*/\1 1/' |
tr '\0' '\n' |
awk '{f += $1; d += $2} END {print f, d}'
- The find command prints the directory name of each .c file it finds. sort | uniq -c then gives us how many files are in each directory (the sort is needed: uniq only collapses adjacent duplicates, and find can descend into a subdirectory between two files of the same directory)
- with sed, I replace the directory name with 1, thus eliminating all possible weird characters, with just the count and 1 remaining
- enabling me to convert to newline-separated output with tr
- which I then sum up with awk, to get the total number of files and the number of directories that contained those files. Note that d here is essentially the same as NR. I could have omitted inserting 1 in the sed command, and just printed NR here, but I think this is slightly clearer.
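For comparison, here is a sketch of the NR variant mentioned in the last bullet: sed keeps only the per-directory count, and awk's NR supplies the directory count. The function name count_c_files_nr is made up for illustration, and GNU find, sort, uniq and sed are assumed (for -printf and the -z options).

```shell
# Variant of the pipeline that drops the inserted "1" and relies on NR instead.
# Assumes GNU find/sort/uniq/sed; count_c_files_nr is a hypothetical name.
count_c_files_nr() {
    find "$1" -type f -iname '*.c' -printf '%h\0' |
        sort -z |
        uniq -zc |
        sed -zr 's/([0-9]) .*/\1/' |
        tr '\0' '\n' |
        awk '{f += $1} END {print f + 0, NR}'   # f + 0 prints 0 when nothing matched
}

# Usage: total .c files and number of directories holding them, under the current directory
count_c_files_nr .
```

Each line reaching awk is one directory, so NR and the summed 1s of the original version should always agree.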
Up until the tr, the data is NUL-delimited, safe against all valid filenames.
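To see the NUL-delimited handling in action, here is a small sketch that runs the same pipeline on a throwaway tree in which one directory name contains a newline. The wrapper name count_c_files and the tree layout are invented for the demonstration; GNU tools are assumed.

```shell
# The pipeline from above, wrapped in a function (hypothetical name).
count_c_files() {
    find "$1" -type f -iname '*.c' -printf '%h\0' |
        sort -z |
        uniq -zc |
        sed -zr 's/([0-9]) .*/\1 1/' |
        tr '\0' '\n' |
        awk '{f += $1; d += $2} END {print f, d}'
}

# Hypothetical test tree: one ordinary directory, one with a newline in its name.
demo=$(mktemp -d)
nl=$(printf '\nx'); nl=${nl%x}          # a literal newline, built portably
mkdir -p "$demo/plain" "$demo/odd${nl}name"
touch "$demo/plain/a.c" "$demo/plain/B.C" "$demo/odd${nl}name/c.c"

count_c_files "$demo"                   # three files in two directories
rm -rf "$demo"
```

Because sed has already replaced the directory name by the time tr runs, the newline in the name never reaches awk.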
With zsh and bash, you can use printf %q to get a quoted string, which would not have newlines in it. So, you might be able to do something like:
shopt -s globstar dotglob nocaseglob
printf "%q\n" **/*.c | awk -F/ '{NF--; f++} !c[$0]++{d++} END {print f, d}'
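What %q buys here can be checked in isolation. The sketch below quotes two made-up names, one of which contains a newline, and prints one line per name. bash is invoked explicitly because printf in plain sh or dash does not support %q; the function name quoted_lines is hypothetical.

```shell
# Print each argument on its own line, shell-quoted by bash's printf %q.
# quoted_lines is a hypothetical helper; it requires bash to be installed.
quoted_lines() {
    bash -c 'printf "%q\n" "$@"' bash "$@"
}

nl=$(printf '\nx'); nl=${nl%x}     # a literal newline, built portably
quoted_lines "a${nl}b.c" "plain.c"
```

In bash the first name comes out as $'a\nb.c', so the embedded newline turns into a two-character escape and each file stays on a single line for awk.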
However, even though ** is not supposed to expand for symlinks to directories, I could not get the desired output on bash 4.4.18(1) (Ubuntu 16.04).
$ shopt -s globstar dotglob nocaseglob
$ printf "%q\n" ./**/*.c | awk -F/ '{NF--; f++} !c[$0]++{d++} END {print f, d}'
34 15
$ echo $BASH_VERSION
4.4.18(1)-release
But zsh worked fine, and the command can be simplified:
$ printf "%q\n" ./**/*.c(D.:h) | awk '!c[$0]++ {d++} END {print NR, d}'
29 7
The D qualifier enables this glob to select dot files, the . qualifier selects regular files (so, not symlinks), and the :h modifier prints only the directory path and not the filename (like find's %h); see the zsh manual sections on Filename Generation and Modifiers. So with the awk command we just need to count the number of unique directories appearing, and the number of lines is the file count.
Best Answer
I would use the find command:
- . specifies the path to search.
- -type d makes this only apply to directories.
- -name ... specifies the name of the directories this should apply to.
- -exec ... {} + is the command that will be run for each collection of matches.
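To illustrate the "collection of matches" behaviour of {} +, here is a sketch in which the executed command simply reports how many paths it received. The directory name cache, the tree layout and the function name report_batch are invented for the example and are not taken from the answer.

```shell
# Run one command per batch of matching directories and report the batch size.
# report_batch is a hypothetical helper: $1 is the search path, $2 the directory name.
report_batch() {
    find "$1" -type d -name "$2" -exec sh -c 'echo "$# matches in one invocation"' sh {} +
}

demo=$(mktemp -d)
mkdir -p "$demo/a/cache" "$demo/b/cache" "$demo/b/other"
report_batch "$demo" cache         # both cache directories arrive in a single call
rm -rf "$demo"
```

With \; in place of + the command would instead run once per directory, each time with a single path.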