$ awk 'FNR==NR{a[$1]=$2;next} ($1 in a) {print $1,a[$1],$2}' file2 file1
aa 45 32
bb 31 15
cc 50 78
Explanation:
awk implicitly loops through each file, one line at a time. Since we gave it file2 as the first argument, it is read first. file1 is read second.
FNR==NR{a[$1]=$2;next}
NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: file2. For every line in file2, we assign a[$1]=$2.
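To see how NR and FNR diverge once the second file starts, here is a small sketch. The file names first and second, their contents, and the variable nr_fnr_out are made up for this demonstration and are not part of the original question.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/first"
printf 'p\nq\nr\n' > "$tmp/second"

# For every input line print the file name, the global line count (NR),
# the per-file line count (FNR) and whether FNR==NR still holds.
nr_fnr_out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first-file" : "later-file") }' first second)
printf '%s\n' "$nr_fnr_out"
rm -rf "$tmp"
```

The two lines of first are printed with equal NR and FNR (1 1 and 2 2). From the first line of second onward NR keeps counting (3, 4, 5) while FNR restarts at 1, so the FNR==NR test is false for the rest of the input.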
Here, a is an associative array and a[$1]=$2 means saving file2's second column, denoted $2, as a value in array a using file2's first column, $1, as the key.
next tells awk to skip the rest of the commands and start over with the next line.
($1 in a) {print $1,a[$1],$2}
If we get here, that means that we are reading the second file: file1. If we saw the first field of the line in file2, as determined by the contents of array a, then we print the key followed by the values of field 2 from both files: file2's value first, then file1's.
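The whole command can be tried end to end with a sketch like the following. The input files are not shown in the answer, so the contents below are an assumption, chosen to reproduce the output above; the dd and ee lines are extra keys added to show that unmatched lines are dropped, and join_out is just a name for this demo.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs (assumed, not from the original post).
# dd exists only in file1 and ee only in file2.
printf 'aa 32\nbb 15\ncc 78\ndd 99\n' > "$tmp/file1"
printf 'aa 45\nbb 31\ncc 50\nee 10\n' > "$tmp/file2"

# First pass (file2): remember column 2 under the key in column 1.
# Second pass (file1): print only the lines whose key was seen in file2.
join_out=$(cd "$tmp" && awk 'FNR==NR{a[$1]=$2;next} ($1 in a) {print $1,a[$1],$2}' file2 file1)
printf '%s\n' "$join_out"
rm -rf "$tmp"
```

With these inputs the command prints the same three lines as above. dd is skipped because it is not a key in a, and ee is never printed because nothing in file1 asks for it.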
With GNU awk for ENDFILE and IGNORECASE:
$ awk -v IGNORECASE=1 '
{ cnt += ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") )}
ENDFILE { print FILENAME, cnt+0; cnt=0 }
' file1 file2
file1 12
file2 7
or with any POSIX awk:
$ awk '
{ lc=tolower($0); cnt[FILENAME] += (gsub(/[[:alpha:]]/,"&",lc) - gsub(/[aeiou]/,"&",lc)) }
END { for (i=1; i<ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
' file1 file2
file1 12
file2 7
If you only want to count the specific characters b, c, d, etc. instead of all alphabetic characters that aren't aeiou, then just change ( gsub(/[[:alpha:]]/,"&") - gsub(/[aeiou]/,"&") ) above to gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&") (keeping the third argument, lc, in the POSIX version).
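Both scripts rely on gsub() returning the number of substitutions it made, and on "&" in the replacement standing for the matched text, so the string is left unchanged and only the count is used. A minimal sketch on a made-up input line (the text Hello World and the variable count_out are assumptions for illustration):

```shell
# Count vowels and consonants in one line without modifying it.
count_out=$(printf 'Hello World\n' | awk '{
    lc = tolower($0)                                        # case-fold once
    vowels     = gsub(/[aeiou]/, "&", lc)                   # matches replaced by themselves
    consonants = gsub(/[bcdfghjklmnpqrtsvwxyz]/, "&", lc)
    print vowels, consonants, lc
}')
printf '%s\n' "$count_out"
```

This prints 3 7 hello world: three vowels, seven consonants, and the lowercased string exactly as it was before the two gsub() calls.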
Note that, unlike any approach that prints results in an FNR==1 clause, both of the above scripts will handle empty files correctly by printing the file name and 0 as the count.
Also note the cnt+0 in the first script - the +0 ensures that the value printed will be a numeric 0 rather than a null string if the first file is empty.
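The empty-file behaviour can be checked with a sketch along these lines. It uses the POSIX variant with the explicit consonant class so that it does not depend on GNU awk; the file names empty and words, their contents, and the variable empty_out are hypothetical.

```shell
tmp=$(mktemp -d)
: > "$tmp/empty"                        # hypothetical zero-length file
printf 'Hello World\n' > "$tmp/words"   # hypothetical non-empty file

# No line is ever read from the empty file, so cnt["empty"] is never set;
# the +0 in END turns that unset value into a numeric 0.
empty_out=$(cd "$tmp" && awk '
{ lc=tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&",lc) }
END { for (i=1; i<ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
' empty words)
printf '%s\n' "$empty_out"
rm -rf "$tmp"
```

The output is empty 0 followed by words 7. A script that only printed from an FNR==1 rule would never mention the empty file, because that rule cannot fire for a file with no lines.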
If the same file name can appear multiple times in the input then add FNR==1{cnt[FILENAME]=0} to the start of the script if you want it output multiple times or add if (!seen[ARGV[i]]++) { ... } around the print in the END section if you only want it output once.
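The two options can be compared with a sketch like this, applied to the POSIX script with the explicit consonant class. The file words, its contents, and the variables dup_each and dup_once are made up for the demonstration.

```shell
tmp=$(mktemp -d)
printf 'Hello World\n' > "$tmp/words"   # hypothetical sample file

# Option 1: reset the counter at the start of each file, so every
# appearance on the command line is reported with a single-pass count.
dup_each=$(cd "$tmp" && awk '
FNR==1 { cnt[FILENAME]=0 }
{ lc=tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&",lc) }
END { for (i=1; i<ARGC; i++) print ARGV[i], cnt[ARGV[i]]+0 }
' words words)

# Option 2: report each name only once. Without the reset, the count
# accumulates over both passes through the file.
dup_once=$(cd "$tmp" && awk '
{ lc=tolower($0); cnt[FILENAME] += gsub(/[bcdfghjklmnpqrtsvwxyz]/,"&",lc) }
END { for (i=1; i<ARGC; i++) if (!seen[ARGV[i]]++) print ARGV[i], cnt[ARGV[i]]+0 }
' words words)

printf '%s\n---\n%s\n' "$dup_each" "$dup_once"
rm -rf "$tmp"
```

Option 1 prints words 7 twice. Option 2 prints words 14 once, because the file is still read twice and the counts add up; combining both changes would give a single line with the single-pass count.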
See https://unix.stackexchange.com/a/642372/133219 for an answer to the followup question of also counting vowels.
Best Answer
Remove the while loop and make use of shell brace expansion and also FNR, a built-in awk variable: