Awk – Print Number of Consonant Occurrences for Each File

awk, text processing

I am trying to count the occurrences of consonants in multiple files,
but I want the number of occurrences to be separately calculated for each file. 
I use

awk -v FS="" '{for ( i=1;i<=NF;i++){if($i ~/[bcdfghjklmnpqrtsvwxyzBCDEFGHJKLMNPQRTSVWXYZ]/) count_c++}} END {print FILENAME,count_c}' file1 file2

file1 looks like this:

bac Dfeg           
k87 eH

tRe        
rt up

file2 looks like this:

hi
rt2w
Prt

but it prints a single combined count for both files (output: file2 19). How can I change this so that the output looks like this:

file1 12
file2 7

Best Answer

With GNU awk for ENDFILE and IGNORECASE:

$ awk -v IGNORECASE=1 '
    { cnt += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") )}
    ENDFILE { print FILENAME, cnt+0; cnt=0 }
' file1 file2
file1 12
file2 7

or with any POSIX awk:

$ awk '
    { lc=tolower($0); cnt[FILENAME] += (gsub(/[[:alpha:]]/,"&",lc) - gsub(/[aeiou]/,"&",lc)) }
    END { for (i=1; i<ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
' file1 file2
file1 12
file2 7

If you only want to count the specific characters b, c, d, etc. instead of all alphabetic characters that aren't aeiou, then just change ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) above to gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&") (keeping lc as the third argument in the POSIX version).
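For reference, here is a minimal sketch of the POSIX script with the explicit consonant list substituted in. The wrapper function name count_consonants and the temporary directory are illustrative assumptions, not part of the original answer; the sample files reproduce the question's data without the trailing spaces.

```shell
# Work in a scratch directory so the sample files don't clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample inputs from the question (trailing spaces omitted).
printf 'bac Dfeg\nk87 eH\n\ntRe\nrt up\n' > file1
printf 'hi\nrt2w\nPrt\n' > file2

# count_consonants is a hypothetical wrapper name used only for this sketch.
count_consonants() {
    awk '
        # Lowercase a copy of the line so one character class covers both cases;
        # gsub() replaces each match with itself and returns the match count.
        { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        # Walk the arguments in order so every file is reported.
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

count_consonants file1 file2
# file1 12
# file2 7
```

Because no [[:alpha:]] class is involved, this form should also work on awk implementations with limited POSIX character class support.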

Note that, unlike any approach that prints results in an FNR==1 clause, both of the above scripts will handle empty files correctly by printing the file name and 0 as the count.
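To make the empty-file point concrete, the sketch below compares a typical FNR==1 style script with the END-loop script. The FNR==1 script is my own illustration of that kind of approach, not code from the question or answer, and the function names fnr_variant and end_loop are hypothetical.

```shell
cd "$(mktemp -d)" || exit 1
printf 'hi\n'  > file1
: > empty                      # zero-length file
printf 'Prt\n' > file2

# Illustrative FNR==1 approach: flush the previous file's count when a new file starts.
fnr_variant() {
    awk '
        FNR == 1 && NR > 1 { print prev, cnt + 0; cnt = 0 }
        { prev = FILENAME; lc = tolower($0); cnt += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        END { if (NR) print prev, cnt + 0 }
    ' "$@"
}

# END-loop approach: report every argument, whether or not it produced records.
end_loop() {
    awk '
        { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

fnr_variant file1 empty file2
# file1 1
# file2 3

end_loop file1 empty file2
# file1 1
# empty 0
# file2 3
```

An empty file yields no records, so no pattern-action rule ever runs for it, which is why the FNR==1 variant silently skips it.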

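Also note the +0 in the print statements - it ensures that the value printed will be a numeric 0 rather than a null string when a file is empty. In the first script this only matters if the first file is empty, since cnt is explicitly reset to 0 after every file; in the second script it matters for any empty file, since cnt[ARGV[i]] is never assigned for it.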
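The effect of +0 on a variable that was never assigned can be seen in isolation with a one-liner like the following. The variable name unset_var is arbitrary, and the brackets are there only to make the empty string visible.

```shell
# An unassigned awk variable prints as an empty string; adding 0 forces a numeric 0.
out=$(awk 'BEGIN { print "[" unset_var "]"; print "[" (unset_var + 0) "]" }')
printf '%s\n' "$out"
# []
# [0]
```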

If the same file name can appear multiple times in the input then add FNR==1{cnt[FILENAME]=0} to the start of the script if you want it output multiple times or add if (!seen[ARGV[i]]++) { ... } around the print in the END section if you only want it output once.
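A rough sketch of both options applied to the POSIX script follows. The function names per_occurrence and once_only are hypothetical, and the explicit consonant class is used in place of [[:alpha:]] only to keep the example short.

```shell
cd "$(mktemp -d)" || exit 1
printf 'bac Dfeg\nk87 eH\n\ntRe\nrt up\n' > file1
printf 'hi\nrt2w\nPrt\n' > file2

# Option 1: reset the count at the first record of each read, print once per argument.
per_occurrence() {
    awk '
        FNR == 1 { cnt[FILENAME] = 0 }
        { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

# Option 2: print each distinct file name only once.
once_only() {
    awk '
        { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
        END { for (i = 1; i < ARGC; i++) if (!seen[ARGV[i]]++) print ARGV[i], cnt[ARGV[i]] + 0 }
    ' "$@"
}

per_occurrence file1 file1 file2
# file1 12
# file1 12
# file2 7

once_only file1 file1 file2
# file1 24
# file2 7
```

Note that in this sketch once_only on its own reports the total accumulated over both reads of file1 (24); combining it with the FNR==1 reset from option 1 would report the count of a single read instead.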

See https://unix.stackexchange.com/a/642372/133219 for an answer to the followup question of also counting vowels.
