Awk – Print Number of Consonant Occurrences for Each File

awk, text processing

I am trying to count the occurrences of consonants in multiple files,
but I want a separate count for each file.
I use:

awk -v FS="" '{for ( i=1;i<=NF;i++){if($i ~/[bcdfghjklmnpqrtsvwxyzBCDFGHJKLMNPQRTSVWXYZ]/) count_c++}} END {print FILENAME,count_c}' file1 file2

file1 looks like this:

bac Dfeg           
k87 eH

tRe        
rt up

file2 looks like this:

hi
rt2w
Prt

but it prints a single combined count for both files (output: file2 19). How can I change this so that the output looks like:

file1 12
file2 7

Best Answer

With GNU awk for ENDFILE and IGNORECASE:

$ awk -v IGNORECASE=1 '
    { cnt += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") )}
    ENDFILE { print FILENAME, cnt+0; cnt=0 }
' file1 file2
file1 12
file2 7

or with any POSIX awk:

$ awk '
    { lc=tolower($0); cnt[FILENAME] += (gsub(/[[:alpha:]]/,"&",lc) - gsub(/[aeiou]/,"&",lc)) }
    END { for (i=1; i<ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
' file1 file2
file1 12
file2 7

If you only want to count the specific characters b, c, d, etc. instead of all alphabetic characters that aren't aeiou, then just change ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) above to gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&"), keeping lc as the third argument in the POSIX version.
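As a minimal sketch of that variation, the POSIX script can be run against copies of the question's sample files. The scratch directory and the variable name explicit_counts are assumptions made for this illustration only.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'bac Dfeg\nk87 eH\n\ntRe\nrt up\n' > "$tmp/file1"
printf 'hi\nrt2w\nPrt\n' > "$tmp/file2"

# gsub() returns the number of substitutions it made. Replacing each match
# with "&" (the matched text itself) leaves the string unchanged, so the
# return value is simply a count of the matching characters.
explicit_counts=$(cd "$tmp" && awk '
    { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
    END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
' file1 file2)

printf '%s\n' "$explicit_counts"
# prints:
# file1 12
# file2 7

rm -rf "$tmp"
```

Listing the consonants explicitly also avoids relying on [[:alpha:]], which some older awk builds do not support.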

Note that, unlike any approach that prints results in an FNR==1 clause, both of the above scripts will handle empty files correctly by printing the file name and 0 as the count.

Also note the cnt+0 in the first script - the +0 ensures that the value printed will be a numeric 0 rather than a null string if the first file is empty.
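The empty-file point can be illustrated with a small comparison. This is only a sketch: the file name empty, the variables fnr_out and argv_out, and the particular FNR==1 script are hypothetical stand-ins for "an approach that prints in an FNR==1 clause".

```shell
#!/bin/sh
tmp=$(mktemp -d)
: > "$tmp/empty"                        # zero-byte file: awk reads no records from it
printf 'hi\nrt2w\nPrt\n' > "$tmp/file2"

# Approach A (hypothetical): flush the previous file's total whenever a new
# file starts. The empty file never produces a record, so no FNR==1 event
# ever fires for it and it is silently skipped.
fnr_out=$(cd "$tmp" && awk '
    FNR == 1 && NR > 1 { print prev, cnt + 0; cnt = 0 }
    { prev = FILENAME; lc = tolower($0); cnt += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
    END { if (NR) print prev, cnt + 0 }
' empty file2)

# Approach B: walk ARGV in END, as in the POSIX script above. Every file
# named on the command line is reported, and +0 turns an unset count into 0.
argv_out=$(cd "$tmp" && awk '
    { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
    END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
' empty file2)

printf 'FNR==1 approach:\n%s\n' "$fnr_out"     # only "file2 7"
printf 'ARGV approach:\n%s\n' "$argv_out"      # "empty 0" then "file2 7"

rm -rf "$tmp"
```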

If the same file name can appear multiple times in the argument list, there are two ways to adapt the second script. If you want the file reported each time it appears, add FNR==1{cnt[FILENAME]=0} to the start of the script. If you only want it reported once, add if (!seen[ARGV[i]]++) { ... } around the print in the END section.
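Both adjustments for repeated file names can be sketched as follows, passing file2 twice. The variable names each_time and only_once are made up for this example. Note that without the FNR==1 reset the counts from both reads accumulate under the same array key.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'hi\nrt2w\nPrt\n' > "$tmp/file2"

# Variant 1: reset the tally each time a file is (re)opened, so every
# appearance on the command line is reported with the per-read count.
each_time=$(cd "$tmp" && awk '
    FNR == 1 { cnt[FILENAME] = 0 }
    { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
    END { for (i = 1; i < ARGC; i++) print ARGV[i], cnt[ARGV[i]] + 0 }
' file2 file2)

# Variant 2: report each distinct name once. There is no reset here, so the
# single line shows the total accumulated over both reads of the file.
only_once=$(cd "$tmp" && awk '
    { lc = tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrstvwxyz]/, "&", lc) }
    END {
        for (i = 1; i < ARGC; i++)
            if (!seen[ARGV[i]]++) print ARGV[i], cnt[ARGV[i]] + 0
    }
' file2 file2)

printf '%s\n' "$each_time"    # "file2 7" printed twice
printf '%s\n' "$only_once"    # "file2 14" printed once

rm -rf "$tmp"
```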

See https://unix.stackexchange.com/a/642372/133219 for an answer to the followup question of also counting vowels.

Related Solutions

Merging 2 files based on field match

$ awk 'FNR==NR{a[$1]=$2;next} ($1 in a) {print $1,a[$1],$2}' file2 file1
aa 45 32
bb 31 15
cc 50 78

Explanation:

awk implicitly loops through each file, one line at a time. Since we gave it file2 as the first argument, it is read first. file1 is read second.

FNR==NR{a[$1]=$2;next}

NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: file2. For every line in file2, we assign a[$1]=$2.

Here, a is an associative array and a[$1]=$2 means saving file2's second column, denoted $2, as a value in array a using file2's first column, $1, as the key.

next tells awk to skip the rest of the commands and start over with the next line.

($1 in a) {print $1,a[$1],$2}

If we get here, we are reading the second file, file1. If the first field of the current line was also seen in file2 (that is, it is a key in array a), then we print that key followed by the second field from file2, a[$1], and the second field from file1, $2.
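The original input files are not shown, so the following sketch uses hypothetical contents chosen to reproduce the output above. The extra lines dd and ee are also assumptions, added to show that unmatched keys are dropped.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical inputs: file2 supplies the looked-up values, file1 the rest.
printf 'aa 45\nbb 31\ncc 50\nee 10\n' > "$tmp/file2"
printf 'aa 32\nbb 15\ncc 78\ndd 99\n' > "$tmp/file1"

# While FNR==NR awk is still in the first file (file2): store $2 under key $1.
# For file1, print only the lines whose key was stored, joining both values.
merged=$(cd "$tmp" && awk '
    FNR == NR { a[$1] = $2; next }
    ($1 in a) { print $1, a[$1], $2 }
' file2 file1)

printf '%s\n' "$merged"
# prints:
# aa 45 32
# bb 31 15
# cc 50 78

rm -rf "$tmp"
```

The key dd exists only in file1 and ee only in file2, so neither appears in the result.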

Shell – Awk – output the second line of a number of .dat files to one file

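Remove the while loop and use shell brace expansion (available in bash, zsh and ksh93, not in plain POSIX sh) together with FNR, a built-in awk variable that holds the record number within the current file: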

awk 'FNR==2{print $0 > "output.dat"}' file{1..80}.dat
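A scaled-down sketch of the same idea, using three hypothetical .dat files listed explicitly so that it also runs in shells without brace expansion. The file contents and the variable name second_lines are assumptions for illustration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/file1.dat"
printf 'b1\nb2\n'     > "$tmp/file2.dat"
printf 'c1\n'         > "$tmp/file3.dat"   # has no second line, so it adds nothing

# FNR restarts at 1 for every input file, so FNR == 2 matches the second
# line of each file. awk opens output.dat once and keeps appending to it
# for the rest of the run.
(cd "$tmp" && awk 'FNR == 2 { print $0 > "output.dat" }' file1.dat file2.dat file3.dat)

second_lines=$(cat "$tmp/output.dat")
printf '%s\n' "$second_lines"
# prints:
# a2
# b2

rm -rf "$tmp"
```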
