Delete everything after second underscore

bioinformatics, command line, text processing

I want to delete all the text after the second underscore (including the underscore itself), but not on every line. Every target line begins with the pattern >gi_.

EXAMPLE.

Input

>gi_12_pork_cat
ACGT
>gi_34_pink_blue
CGTA

Output

>gi_12
ACGT
>gi_34
CGTA

Best Answer

Split each line on underscores. For lines starting with >gi, print only the first two fields, rejoined with an underscore; print every other line unchanged.

$ awk -F_ 'BEGIN {OFS="_"} /^>gi/ {print $1,$2} ! /^>gi/ {print}' input
>gi_12
ACGT
>gi_34
CGTA
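
As an alternative to the awk command above, the same edit can be sketched with sed. This is only one possible approach, not part of the original answer: the function name trim_headers is hypothetical, and the pattern assumes that target lines start with the literal text >gi_. Lines that do not match, or that contain fewer than two underscores, pass through unchanged.

```shell
# Hypothetical helper (name chosen for illustration): reads text on stdin
# and writes the trimmed result to stdout.
trim_headers() {
    # Only on lines starting with ">gi_": capture everything up to (but not
    # including) the second underscore, then drop the rest of the line.
    sed '/^>gi_/ s/^\([^_]*_[^_]*\)_.*/\1/'
}

# Sample data taken from the question.
printf '>gi_12_pork_cat\nACGT\n>gi_34_pink_blue\nCGTA\n' | trim_headers
```

To process a file instead of the sample data, redirect it into the function, for example trim_headers < input > output. Unlike the awk version, this does not split sequence lines into fields, so it only ever touches the header lines.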