I have a TSV (tab-separated) file with 3 columns:
ID\tTEXT\tTYPE
To print the TYPE column I do:
cat /dataset.csv | awk -F $'\t' '{print $3}'
The values in that column form an enumeration like {CLASS_A, CLASS_B, CLASS_C}, etc.
I need an inline way with AWK to count the number of occurrences (NF?) of each value of the enumeration in the TYPE column, to obtain:
CLASS_A 1300
CLASS_B 450
CLASS_C 988
[UPDATE]
Following the solutions below, I'm putting here my latest version of the script:
#!/bin/bash
COL=$1
FILE=$2
awk -v col="$COL" -F $'\t' '
    { c[$col]++ }
    END {
        for (i in c) printf("%s\t%s\n", i, c[i])
    }
' $FILE
and the usage to count the occurrences of the values in column 3 is:
$ ./count_cols.sh 3 /myfile.csv
Best Answer
There is no need to use cat to read the file; AWK is perfectly capable of reading it. A core c[$3]++ statement gets the count of lines of each type. Then, at the end, just print all the counts (as tab-separated values):
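Assembled, that gives a one-liner along these lines (file.tsv stands in for the actual input file):

```shell
# Tally each distinct value of the TYPE column (field 3) and print
# the counts as tab-separated "value<TAB>count" pairs.
awk -F '\t' '
    { c[$3]++ }
    END { for (i in c) printf "%s\t%d\n", i, c[i] }
' file.tsv
```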
Appended
Given the comment from the OP that:
I got to review the answer. I created this file:
That file, when used with the script, gives the correct result.
Of course, the file must contain exactly two tabs per line for 3 fields. To test that a file complies with that requirement, you may use a script which checks that there are exactly two tabs per line, and that the number of fields (as seen by awk) is actually three.
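A minimal sketch of such a validator (the filename is assumed to be passed as the first argument; the message format is my own choice):

```shell
#!/bin/bash
# Report any line whose tab count is not 2 or whose field count
# (as seen by awk with -F '\t') is not 3.
awk -F '\t' '
    {
        # gsub returns the number of substitutions made, so replacing
        # each tab with itself counts the tabs on the line.
        tabs = gsub(/\t/, "\t")
        if (tabs != 2 || NF != 3)
            printf "line %d: %d tabs, %d fields\n", NR, tabs, NF
    }
' "$1"
```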
Adding a couple of test lines and running the script above detects the line with ID 1, which has four tabs (two added), and it doesn't get fooled by the line with ID 2 that contains a literal \t. As for the quoting and the use of variables, that is something you should improve all by yourself.