Subtracting numbers from adjacent columns and successive rows using awk

I have a tab-separated file that looks like this:

NZ_CP023599.1   WP_003911075.1  302845  305406
NZ_CP023599.1   WP_003898428.1  471171  472583
NZ_CP023599.1   WP_003402248.1  534387  535157
NZ_CP023599.1   WP_003402301.1  552556  553950
NZ_CP023599.1   WP_003402318.1  558837  559697

For every row after the first, I need to subtract the number in the 4th column of the previous row from the number in the 3rd column of the current row, and then print the difference on the current row as a 5th column.

The output would look like this:

NZ_CP023599.1   WP_003911075.1  302845  305406  
NZ_CP023599.1   WP_003898428.1  471171  472583  165765
NZ_CP023599.1   WP_003402248.1  534387  535157  61804
NZ_CP023599.1   WP_003402301.1  552556  553950  17399
NZ_CP023599.1   WP_003402318.1  558837  559697  4887

How do I go about this using awk?

Best Answer

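You can do it as shown below. The first line has no predecessor, so it is printed unchanged and only its 4th-column value is saved. Every later line subtracts the saved value from its own 3rd column, stores the result in a new 5th column, and then saves its own 4th column for the line that follows.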

awk -F'\t' 'BEGIN { OFS = FS } NR == 1 { last = $4; print;  next }{ $5 = $3 - last; last = $4  }1' file
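
As a quick check, the same approach can be run against the first three sample rows. This is only a minimal sketch: sample.tsv and gaps.tsv are hypothetical file names, and prev_end is an illustrative variable name rather than anything from the answer above.

```shell
# Recreate three rows of the sample as a real tab-separated file.
# sample.tsv is a hypothetical name; printf reuses the format for each group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
  NZ_CP023599.1 WP_003911075.1 302845 305406 \
  NZ_CP023599.1 WP_003898428.1 471171 472583 \
  NZ_CP023599.1 WP_003402248.1 534387 535157 \
  > sample.tsv

# Same logic as the one-liner, spelled out over several lines.
awk -F'\t' '
    BEGIN { OFS = FS }               # keep tabs in the output
    NR > 1 { $5 = $3 - prev_end }    # gap between this start and the previous end
    { prev_end = $4; print }         # remember this end for the next row
' sample.tsv > gaps.tsv

cat gaps.tsv
```

The first row keeps its four fields, while rows 2 and 3 gain 165765 and 61804, which matches the expected output in the question.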

Related Solutions

Shell – Subtracting same column between two rows in awk

A solution with awk

awk '
    NR==1 {split($0,a)}
    NR==2 {split($0,b)}
    END {for(i=1;i<=NF;i++) printf "%d ", b[i]-a[i]}
' input.txt

gives a result of

0 0 8 6 4 2

Because awk treats a string that does not start with a valid number as 0 in arithmetic, non-numeric fields still produce a result. If you want to drop the results whose source fields contain non-numeric values, add a condition:

awk '
    NR==1 {split($0,a)}
    NR==2 {split($0,b)}
    END {
        for(i=1;i<=NF;i++)
            if(a[i] ~ /^[0-9]+$/ && b[i] ~ /^[0-9]+$/)
                printf "%d ", b[i]-a[i]
    }
' input.txt

gives a result of

8 6 4 2
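
To see why the extra condition is needed, here is a small sketch of how awk coerces fields in arithmetic. The input line is made up for illustration and is not the input.txt used above.

```shell
# Hypothetical one-line input: a label, a number, a number with trailing junk, a number.
out=$(printf 'id 10 3x 7\n' | awk '{
    for (i = 1; i <= NF; i++)
        # $i + 0 forces numeric conversion; the regex is the same test used in the answer.
        printf "%s -> %d (%s)\n", $i, $i + 0, ($i ~ /^[0-9]+$/ ? "numeric" : "skipped")
}')
printf '%s\n' "$out"
```

The label converts to 0 and "3x" converts to 3 (awk uses the leading numeric prefix), so both would silently take part in the subtraction without the regex test. Note that /^[0-9]+$/ accepts only unsigned integers, so negative or decimal values would be skipped as well.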

text-processing – Extract Rows Using Different Info in Different Columns

Using awk, processing the input file only once:

awk 'min[$3, $5]!=""{ if(min[$3, $5]>$6){ line[$3, $5]=$0; min[$3, $5]=$6}; next }
                    { min[$3, $5]=$6; line[$3, $5]=$0 }
END{ for(x in line) print line[x] }' infile
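
A compact sketch of the same idea on made-up data may help: rows are grouped by columns 3 and 5, and the row with the smallest 6th column is kept for each group. The five input rows and the variable name key are assumptions for illustration only.

```shell
# Hypothetical input: groups (g1, x) and (g2, y); column 6 holds the value to minimise.
out=$(printf '%s\n' \
  'r1 a g1 b x 5' \
  'r2 a g1 b x 2' \
  'r3 a g2 b y 9' \
  'r4 a g1 b x 7' \
  'r5 a g2 b y 4' |
awk '
    # First row seen for a group, or a strictly smaller 6th column: remember this row.
    !(($3, $5) in min) || $6 < min[$3, $5] { min[$3, $5] = $6; line[$3, $5] = $0 }
    END { for (key in line) print line[key] }
' | sort)
printf '%s\n' "$out"
```

The order in which for (key in line) visits the groups is unspecified in awk, so the output is piped through sort here to make it stable. With this input, the rows r2 and r5 are kept.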

To keep all lines that tie for the minimum value in the 6th column (the else matters: without it, a line that has just set a new minimum would also pass the equality test and be stored twice):

awk 'min[$3, $5]!=""{ if(min[$3, $5] >$6){ line[$3, $5]=$0; min[$3, $5]=$6 }
                      else if(min[$3, $5]==$6){ line[$3, $5]=line[$3, $5] ORS $0 }
                      next
                    }
                    { min[$3, $5]=$6; line[$3, $5]=$0 }
END{ for(x in line) print line[x] }' infile
