Text Processing – Removing Rows Containing NA in Every Column

Tags: awk, bioinformatics, perl, text processing

I have a tab-delimited file that looks like this:

gene    v1  v2  v3  v4
g1  NA  NA  NA  NA
g2  NA  NA  2   3
g3  NA  NA  NA  NA
g4  1   2   3   2

Every line has the same, fixed number of fields.
I want to remove the rows in which every field from column 2 through the last column is NA. The output should then look like:

gene    v1  v2  v3  v4
g2  NA  NA  2   3
g4  1   2   3   2 

Best Answer

With awk:

awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file

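The loop walks the fields from the second one to the last ($NF). As soon as it finds a field that is not exactly NA, it prints the whole line and breaks out of the loop, so each line is printed at most once. A line is dropped only when every field from column 2 onward is NA. The header line is kept because its fields (v1 to v4) are not NA.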
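A minimal sketch to reproduce the example, assuming the input is strictly tab-delimited. It writes the sample data to a hypothetical file `genes.tsv`, runs the answer's loop with `-F'\t'` so awk splits only on tabs, and, for comparison, runs an alternative formulation (not from the answer) that counts the NA fields and keeps a line unless all NF-1 data fields are NA.

```shell
#!/bin/sh
# Reproduce the question's example. genes.tsv, kept_loop.tsv and
# kept_count.tsv are hypothetical file names used only for illustration.

# Write the sample input with real tab characters.
printf 'gene\tv1\tv2\tv3\tv4\n' >  genes.tsv
printf 'g1\tNA\tNA\tNA\tNA\n'    >> genes.tsv
printf 'g2\tNA\tNA\t2\t3\n'      >> genes.tsv
printf 'g3\tNA\tNA\tNA\tNA\n'    >> genes.tsv
printf 'g4\t1\t2\t3\t2\n'        >> genes.tsv

# The answer's loop: print the line as soon as a field from $2 on is not "NA".
awk -F'\t' '{ for (i = 2; i <= NF; i++) if ($i != "NA") { print; break } }' \
    genes.tsv > kept_loop.tsv

# Alternative formulation: count the NA fields among $2..$NF and print the
# line unless all NF-1 of them are NA. Like the loop, it also skips a line
# that has no data fields at all (NF == 1).
awk -F'\t' '{ na = 0; for (i = 2; i <= NF; i++) if ($i == "NA") na++; if (na < NF - 1) print }' \
    genes.tsv > kept_count.tsv

cat kept_loop.tsv
```

Both commands keep the header and the g2 and g4 rows. For this data the `-F'\t'` is optional, since awk's default splitting on runs of blanks and tabs yields the same fields; it matters when a value contains a space or a field is empty (two consecutive tabs), because the default splitting would merge or shift those fields.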
