I'm trying to build an awk statement to read this file:
A 1,2,3 *
A 4,5,6 **
B 1
B 4,5 *
and build a file like this:
A 1,2,3 * 3 1 0.333
A 4,5,6 ** 3 2 0.666
B 1 1 0 0
B 4,5 * 2 1 0.5
In this new file, the first three columns are the same as in the original file. The fourth column must contain the number of comma-separated elements in column 2. The fifth column must contain the number of characters in column 3. The last column contains the ratio of column 5 to column 4 (i.e., column 5 divided by column 4).
I'm trying the following code:
awk '{print $1"\t"$2"\t"$3"\t"(NF","$2 -1)"\t"length($3)"\t"(length($3)/(NF","$2-1))}' file1 > file2
But I got the following output:
A 1,2,3 * 3,0 1 0.333333
A 4,5,6 ** 3,3 2 0.666667
B 1 2,0 0 0
B 4,5 * 3,3 1 0.333333
I can't figure out what I'm doing wrong for column 4.
Best Answer
You seem to be hoping that
(NF","$2 -1)
will be treated as a function that returns the number of comma-delimited elements in field $2. It won't: NF is always the number of fields in the record, so the expression simply concatenates NF, a literal comma, and the numeric value of $2 - 1 (for "1,2,3" that is 1 - 1 = 0, which is where "3,0" comes from).
Instead, you can use awk's split function: split($2,a,",") splits field $2 into an array a and returns the number of elements. You can also tidy up the code by setting the output field separator (OFS) to a tab instead of using explicit "\t" in your print statement.
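Putting those two suggestions together, a minimal sketch might look like the following. The file names file1 and file2 come from the question. The variable names n and len and the NF >= 2 guard are my own additions, the guard being there to skip blank lines that would otherwise cause a division by zero.

```shell
# Recreate the sample input from the question.
printf '%s\n' 'A 1,2,3 *' 'A 4,5,6 **' 'B 1' 'B 4,5 *' > file1

awk 'BEGIN { OFS = "\t" }          # tab-separate everything print emits
NF >= 2 {                          # assumption: skip lines without a second field
    n   = split($2, a, ",")        # number of comma-separated elements in $2
    len = length($3)               # characters in $3 (0 when the field is missing)
    print $1, $2, $3, n, len, len / n
}' file1 > file2

cat file2
```

The ratio is printed with awk's default number format, so you should see values such as 0.333333 rather than 0.333. If you want a fixed number of decimals, printf with a format like "%.3f" is one option, though note that it rounds (0.667) rather than truncates (0.666).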