Linux – Split a File into Rows Based on Column Values

awk, grep, linux, sed, text processing

The input file looks something like this:

chr1    1    G    300
chr1    2    A    500
chr1    3    C    200
chr4    1    T    35
chr4    2    G    400
chr4    3    C    435
chr4    4    A    223
chr4    5    T    400
chr4    6    G    300
chr4    7    G    340
chr4    8    C    400

The actual file is too big to process as a whole, so I want to write a smaller file that keeps only the rows for a given chromosome (column 1) whose position (column 2) falls within a specific range.

For example, I'm looking for a Linux command (sed, awk, grep, etc.) that will filter by chr4 from positions 3 to 7. The desired final output is:

chr4    3    C    435
chr4    4    A    223
chr4    5    T    400
chr4    6    G    300
chr4    7    G    340

I don't want to modify the original file.

Best Answer

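A solution for a potentially unsorted input file, if the output should come out ordered by chromosome and position (the awk filter itself does not depend on the sort):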

sort -k1,1 -k2,2n file | awk '$1=="chr4" && $2>2 && $2<8'

The output:

chr4    3    C    435
chr4    4    A    223
chr4    5    T    400
chr4    6    G    300
chr4    7    G    340

If the input file is already sorted, or the output order does not matter, it's enough to use:

awk '$1=="chr4" && $2>2 && $2<8' file
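
If you need to run this for different chromosomes or ranges, here is a minimal sketch that passes the values in as awk variables and redirects the result to a new file, so the original is never modified. The file names (input.txt, filtered.txt) and the variable names (chr, lo, hi) are placeholders I made up for illustration, and the sample data is just the example from the question:

```shell
#!/bin/sh
# Build a small sample input (a stand-in for the real, much larger file).
cat > input.txt <<'EOF'
chr1    1    G    300
chr1    2    A    500
chr1    3    C    200
chr4    1    T    35
chr4    2    G    400
chr4    3    C    435
chr4    4    A    223
chr4    5    T    400
chr4    6    G    300
chr4    7    G    340
chr4    8    C    400
EOF

# chr, lo and hi are hypothetical variable names set with -v.
# The bounds are inclusive, equivalent to $2>2 && $2<8 for integer positions.
# Output goes to a separate file; input.txt is only read, never changed.
awk -v chr="chr4" -v lo=3 -v hi=7 \
    '$1 == chr && $2 >= lo && $2 <= hi' input.txt > filtered.txt

# Show the filtered rows.
cat filtered.txt
```

awk reads the input line by line, so this should work on a large file without loading it into memory.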