Text Processing – Removing Rows Containing NA in Every Column

awk, bioinformatics, perl, text processing

I have a tab-delimited file that looks like this:

gene    v1  v2  v3  v4
g1  NA  NA  NA  NA
g2  NA  NA  2   3
g3  NA  NA  NA  NA
g4  1   2   3   2

Every line has the same, fixed number of fields.
I want to remove the rows in which every field from column 2 through the last column is NA. The output should then look like this:

gene    v1  v2  v3  v4
g2  NA  NA  2   3
g4  1   2   3   2

Best Answer

With awk:

awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file

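This loops over the fields starting at the second one. As soon as it finds a field that is not NA, it prints the line and breaks out of the loop. A row in which every field from column 2 onward is NA is therefore never printed. The header row is kept because its column names are not NA.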
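
As a minimal sketch of the same idea, the variant below sets the field separator to a tab explicitly and passes the header line through unconditionally, rather than relying on its column names not being NA. The file name sample.tsv and the variable name filtered are made up for the illustration; the rows are the ones from the question.

```shell
# Recreate the sample table from the question as a tab-separated file.
printf 'gene\tv1\tv2\tv3\tv4\n'  > sample.tsv
printf 'g1\tNA\tNA\tNA\tNA\n'   >> sample.tsv
printf 'g2\tNA\tNA\t2\t3\n'     >> sample.tsv
printf 'g3\tNA\tNA\tNA\tNA\n'   >> sample.tsv
printf 'g4\t1\t2\t3\t2\n'       >> sample.tsv

filtered=$(awk -F'\t' '
    NR == 1 { print; next }                   # always keep the header line
    {
        for (i = 2; i <= NF; i++)             # scan columns 2..NF
            if ($i != "NA") { print; break }  # first non-NA value: keep the row
    }
' sample.tsv)

printf '%s\n' "$filtered"
rm -f sample.tsv
```

On the sample data this should print the header followed by the g2 and g4 rows, which matches the desired output in the question.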

Related Solutions

Subtracting numbers from adjacent columns and successive rows using awk

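You can do it as shown below. No subtraction is done on the first line: it is printed unchanged and its fourth-column value is saved for use on the next line. On every later line, the fifth column is set to the third column minus the saved value, and the saved value is then updated from the current line.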

awk -F'\t' 'BEGIN { OFS = FS } NR == 1 { last = $4; print;  next }{ $5 = $3 - last; last = $4  }1' file
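
To see what the one-liner does, here is a small sketch run on made-up data. The three input rows, the file name diff_sample.tsv and the variable name diffed are hypothetical (the original input for this answer is not shown here); the awk program is the one above, only reindented and commented.

```shell
# Hypothetical 4-column, tab-separated input (not from the original question).
printf 'a\tx\t10\t20\n'  > diff_sample.tsv
printf 'b\ty\t25\t40\n' >> diff_sample.tsv
printf 'c\tz\t50\t60\n' >> diff_sample.tsv

diffed=$(awk -F'\t' '
    BEGIN   { OFS = FS }                   # write tabs on output as well
    NR == 1 { last = $4; print; next }     # first line: print as is, remember column 4
            { $5 = $3 - last; last = $4 }  # column 5 = column 3 minus previous column 4
    1                                      # print the (modified) line
' diff_sample.tsv)

printf '%s\n' "$diffed"
rm -f diff_sample.tsv
```

With this input the first row is printed unchanged, the second row gains a fifth column of 5 (25 - 20), and the third row gains a fifth column of 10 (50 - 40).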
