How to count the occurrences of a pattern in a line

text processing

I have a file which has three columns. Column 3 contains names of genes and it looks like this:

Rv0729,Rv0993,Rv1408  
Rv0162c,Rv0761c,Rv1862,Rv3086  
Rv2790c

How can I print the number of genes in each row?

Best Answer

You want to prepend a column holding the number of comma-separated fields on each line. This can be done with awk:

$ awk -F ',' '{ printf("%d,%s\n", NF, $0) }' data.in
3,Rv0729,Rv0993,Rv1408
4,Rv0162c,Rv0761c,Rv1862,Rv3086
1,Rv2790c

NF is an awk variable containing the number of fields (columns) in the current record (row). For each row, we print this number followed by a comma and the original row ($0).
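If only the count is wanted, without the row itself, printing NF alone should be enough. The following is a minimal sketch: it recreates the sample rows from the question in data.in so that it can be run as-is, and the variable name counts is only there for illustration.

```shell
# Recreate the sample rows from the question (data.in as in the example above).
printf '%s\n' 'Rv0729,Rv0993,Rv1408' 'Rv0162c,Rv0761c,Rv1862,Rv3086' 'Rv2790c' > data.in

# Print only the number of comma-separated fields on each line.
counts=$(awk -F ',' '{ print NF }' data.in)
printf '%s\n' "$counts"
```

Note that an empty line has no fields, so awk reports 0 for it.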

An alternative (same result, but may look a bit cleaner):

$ awk -F ',' 'BEGIN { OFS=FS } { print NF, $0 }' data.in

FS is the field separator which awk uses to split each record into fields, and we set that to a comma with -F ',' on the command line (as in the first solution). OFS is the output field separator, which print inserts between its comma-separated arguments. We set it to the same value as FS in the BEGIN block, that is, before reading the first line of input.
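The question mentions a three-column file with the gene list in column 3, while the commands above treat the whole line as the gene list. If the other columns are present, one possible approach is to split only the third field. This sketch assumes the columns are tab-separated and that the gene list contains no tabs; the file name data3.in and the values in the first two columns are made up for illustration.

```shell
# Hypothetical three-column, tab-separated input; the gene list is column 3.
printf 'id1\t10\tRv0729,Rv0993,Rv1408\nid2\t20\tRv2790c\n' > data3.in

# split() returns the number of pieces it created, which is the gene count.
# The count is appended as a fourth tab-separated column.
result=$(awk -F '\t' 'BEGIN { OFS = FS } { print $0, split($3, genes, ",") }' data3.in)
printf '%s\n' "$result"
```

Here split() is used instead of NF because NF would count the tab-separated columns of the whole line, not the genes inside column 3.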
