Number of comma-separated fields in a text file

awk, text processing

I'm trying to build an awk statement to read this file:

A   1,2,3   *
A   4,5,6   **
B   1
B   4,5     *

and build a file like this:

A   1,2,3   *    3   1   0.333
A   4,5,6   **   3   2   0.666
B   1            1   0   0
B   4,5     *    2   1   0.5

In this new file, the first three columns are the same as in the original file. The fourth column must contain the number of comma-separated elements in column 2. The fifth column must contain the number of characters in column 3. The last column contains the ratio of column 5 to column 4 (i.e., column 5 divided by column 4).

I'm trying the following code:

awk '{print $1"\t"$2"\t"$3"\t"(NF","$2 -1)"\t"length($3)"\t"(length($3)/(NF","$2-1))}' file1 > file2

But I got the following output:

A   1,2,3   *    3,0   1   0.333333
A   4,5,6   **   3,3   2   0.666667
B   1            2,0   0   0
B   4,5     *    3,3   1   0.333333

I can't figure out what I'm doing wrong for column 4.

Best Answer

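You seem to be hoping that (NF","$2 -1) will be treated as a function that returns the number of comma-delimited elements in field $2, but it won't. NF is always the number of fields in the record, and NF","$2 -1 simply concatenates NF, a literal comma, and the numeric value of $2 minus 1, which is why column 4 contains values such as 3,0 and 3,3.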
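To see what the original expression evaluates to, a small sketch may help. It feeds one sample line from the question to awk and prints each piece separately; the labels in the output are only for illustration. In awk, subtraction binds more tightly than concatenation, and a string such as "4,5,6" is converted to a number from its leading digits.

```shell
# One sample line from the question, fields separated by tabs.
out=$(printf 'A\t4,5,6\t**\n' | awk '{
    print "NF:", NF                         # fields in the record
    print "$2 - 1:", $2 - 1                 # "4,5,6" converts to 4
    print "whole expression:", (NF","$2 -1) # NF, a comma, then $2 - 1
}')
printf '%s\n' "$out"
```

This should print NF: 3, then $2 - 1: 3, then whole expression: 3,3, which matches the 3,3 seen in column 4 of the unexpected output.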

Instead, you can use awk's split function: split($2,a,",") splits field $2 into an array a and returns the number of elements. You can also tidy up the code by setting the output field separator (OFS) to a tab instead of writing explicit "\t" in your print statement:

awk '{l2=split($2,a,","); OFS="\t"; print $1, $2, $3, l2, length($3), length($3)/l2}' file1
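As a quick check, the sketch below runs a variant of this command against the sample input from the question. Two details are assumptions made for convenience and are not part of the answer above: OFS is set once with -v rather than on every record, and the input is supplied through a here-document instead of file1.

```shell
# Run the split-based command on the sample data from the question.
out=$(awk -v OFS='\t' '{
    n = split($2, a, ",")    # n: number of comma-separated elements in column 2
    print $1, $2, $3, n, length($3), length($3) / n
}' <<'EOF'
A   1,2,3   *
A   4,5,6   **
B   1
B   4,5     *
EOF
)
printf '%s\n' "$out"
```

Column 4 now comes out as 3, 3, 1 and 2. The ratios are printed with awk's default number format, so they appear as 0.333333 and 0.666667 rather than the 0.333 and 0.666 shown in the desired output; printf with an explicit format could be used if a fixed number of decimals is needed.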