I am trying to count the occurrences of consonants in multiple files,
but I want the number of occurrences to be separately calculated for each file.
I use
awk -v FS="" '{for ( i=1;i<=NF;i++){if($i ~/[bcdfghjklmnpqrtsvwxyzBCDEFGHJKLMNPQRTSVWXYZ]/) count_c++}} END {print FILENAME,count_c}' file1 file2
file1 looks like this:
bac Dfeg
k87 eH
tRe
rt up
file2 looks like this:
hi
rt2w
Prt
but it prints a single combined count for both files (output: file2 19). How could I change this so the output would be:
file1 12
file2 7
Best Answer
With GNU awk for ENDFILE and IGNORECASE:
or with any POSIX awk:
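Here is a minimal sketch of a portable version. It assumes a POSIX-conforming awk and that every operand is a plain file name (no `var=value` assignments and no `-` for stdin). Counts are kept per `FILENAME` in an array and printed in END by walking ARGV, so files that contain no lines are still reported. The helper name `count_consonants_posix` is hypothetical.

```shell
#!/bin/sh
# Sketch: per-file consonant counts with a POSIX awk (no gawk extensions).
# count_consonants_posix is a hypothetical helper name; it assumes every
# operand is a file name (no var=value assignments, no "-" for stdin).
count_consonants_posix() {
    awk '
        # lower-case the line so one vowel list covers both cases, then
        # accumulate letters minus vowels per file name
        { $0 = tolower($0); cnt[FILENAME] += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) }
        # walk the command-line operands rather than the cnt array, so empty
        # files (which never reach the rule above) are still printed, in
        # command-line order; +0 turns a missing entry into 0
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
    ' "$@"
}

# Recreate the question's sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf 'bac Dfeg\nk87 eH\ntRe\nrt up\n' > file1
printf 'hi\nrt2w\nPrt\n' > file2

count_consonants_posix file1 file2
# file1 12
# file2 7
```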
If you only want to count the specific characters b, c, d, etc. instead of all alphabetic characters that aren't aeiou, then just change `( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") )` above to `gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&")`.
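Applied to a POSIX awk per-file counter, that substitution could look like the following minimal sketch. The helper name `count_listed_consonants` is hypothetical. For the question's ASCII sample files it gives the same totals, 12 and 7. The two forms can differ only for letters outside a-z, such as accented letters, and whether they do depends on the locale and the awk implementation.

```shell
#!/bin/sh
# Sketch: per-file counts of an explicit consonant list with a POSIX awk.
# count_listed_consonants is a hypothetical helper name.
count_listed_consonants() {
    awk '
        # only the listed ASCII consonants are counted (y included, as in the question)
        { $0 = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&") }
        # print every operand, even files that had no lines
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
    ' "$@"
}

# Recreate the question's sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf 'bac Dfeg\nk87 eH\ntRe\nrt up\n' > file1
printf 'hi\nrt2w\nPrt\n' > file2

count_listed_consonants file1 file2
# file1 12
# file2 7
```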
Note that, unlike any approach that prints results in an `FNR==1` clause (which never fires for an empty file, since an empty file has no first record), both of the above scripts handle empty files correctly by printing the file name and 0 as the count. Also note the `cnt+0` in the first script: the `+0` ensures that the value printed is a numeric `0` rather than a null string if the first file is empty.

If the same file name can appear multiple times in the input, then add `FNR==1{cnt[FILENAME]=0}` to the start of the script if you want it output multiple times, or add `if (!seen[ARGV[i]]++) { ... }` around the print in the END section if you only want it output once.

See https://unix.stackexchange.com/a/642372/133219 for an answer to the follow-up question of also counting vowels.
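Here is a minimal sketch of the first option, using a POSIX awk per-file counter. The helper name `count_consonants_repeat` is hypothetical. The `FNR==1` reset runs before the counting rule, so when file1 is named twice, each listing reports that file's own count. Without the reset, the array would accumulate across both readings and report 24 for file1 on both lines.

```shell
#!/bin/sh
# Sketch: per-file consonant counts when a file name may repeat on the command line.
# count_consonants_repeat is a hypothetical helper name.
count_consonants_repeat() {
    awk '
        # reset at the first line of each file, so a file named twice is
        # counted afresh each time instead of accumulating
        FNR == 1 { cnt[FILENAME] = 0 }
        { $0 = tolower($0); cnt[FILENAME] += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) }
        # one output line per operand, so a repeated name is printed repeatedly
        END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
    ' "$@"
}

# Recreate the question's sample files in a scratch directory
cd "$(mktemp -d)" || exit 1
printf 'bac Dfeg\nk87 eH\ntRe\nrt up\n' > file1
printf 'hi\nrt2w\nPrt\n' > file2

count_consonants_repeat file1 file2 file1
# file1 12
# file2 7
# file1 12
```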