Delete everything after second underscore

bioinformatics, command line, text processing

I want to delete all the text after the second underscore (including the underscore itself), but not on every line. Each target line begins with the pattern >gi_.

EXAMPLE.

Input

>gi_12_pork_cat

ACGT

>gi_34_pink_blue

CGTA

Output

>gi_12

ACGT

>gi_34

CGTA

Best Answer

$ awk -F_ 'BEGIN {OFS="_"} /^>gi/ {print $1,$2} ! /^>gi/ {print}' input
>gi_12
ACGT
>gi_34
CGTA
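
For readers who would rather script this, the same idea can be sketched in Python. This is only an illustration of the awk logic above, not part of the original answer; the function names trim_header and trim_headers are hypothetical, and lines are assumed to be passed in without their trailing newline.

```python
def trim_header(line, prefix=">gi_", keep_fields=2, sep="_"):
    """Cut a header line before its second separator; leave other lines alone.

    The prefix, keep_fields and sep parameters are illustrative assumptions.
    """
    if not line.startswith(prefix):
        return line  # sequence lines pass through unchanged
    # Keep only the first two underscore-separated fields, e.g. ">gi" and "12".
    return sep.join(line.split(sep)[:keep_fields])


def trim_headers(lines):
    """Apply trim_header to every line of an iterable."""
    return [trim_header(line) for line in lines]


if __name__ == "__main__":
    sample = [">gi_12_pork_cat", "ACGT", ">gi_34_pink_blue", "CGTA"]
    for out in trim_headers(sample):
        print(out)
```

Run on the example input from the question, this prints >gi_12, ACGT, >gi_34 and CGTA, one per line.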

Related Solutions

Text Processing – How to Insert a New Line After Every N Lines

With awk:

awk ' {print;} NR % 2 == 0 { print ""; }' inputfile

With sed (GNU extension):

sed '0~2 a\\' inputfile

With bash:

#!/bin/bash
lines=0
while IFS= read -r line
do
    printf '%s\n' "${line}"
    ((lines++ % 2)) && echo
done < "$1"
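
The snippets above hard-code N = 2. As a rough sketch of how the same counting logic generalizes to any N, here is a Python version; the name insert_blank_every is hypothetical, and lines are assumed to be given without trailing newlines.

```python
def insert_blank_every(lines, n=2):
    """Yield each line, followed by an empty string after every n-th line."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Count from 1 so that the n-th, 2n-th, ... lines trigger the blank line,
    # mirroring the NR % 2 == 0 test in the awk version.
    for count, line in enumerate(lines, start=1):
        yield line
        if count % n == 0:
            yield ""


if __name__ == "__main__":
    for out in insert_blank_every(["a", "b", "c", "d", "e"], n=2):
        print(out)
```

As in the awk version, a trailing group shorter than N lines is not followed by a blank line.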

Delete everything before “/” on every line

Using cut:

$ cut -sd'/' -f2 file.txt   ##This will print only the lines containing /
7fad416d-f2b3-4259-b98d-2449957a3123
8a8589bf-49e3-4cd7-af15-6753067355c6

The following suggestions assume that / appears only once in a line:

Using grep:

$ grep -o '[^/]*$' file.txt  ##This will print the lines not having / too
7fad416d-f2b3-4259-b98d-2449957a3123
8a8589bf-49e3-4cd7-af15-6753067355c6

If you have / in all of the lines, you can use these too:

Using bash parameter expansion:

$ while IFS= read -r line; do echo "${line#*/}"; done <file.txt
7fad416d-f2b3-4259-b98d-2449957a3123
8a8589bf-49e3-4cd7-af15-6753067355c6

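Or Python:

#!/usr/bin/env python3
# Print the second "/"-separated field of each line.
# Assumes every line contains at least one "/".
with open('file.txt') as f:
    for line in f:
        print(line.split('/')[1].rstrip())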

Note that, as far as your example is concerned, all of the above suggestions are valid.
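
If some lines might not contain a / at all, indexing the result of split can raise an IndexError. A minimal sketch of a more forgiving variant uses str.partition, which mirrors the behaviour of the bash expansion ${line#*/}: it removes everything up to the first / and leaves lines without one untouched. The function name strip_before_slash and the sample prefix are hypothetical.

```python
def strip_before_slash(line):
    """Return the text after the first '/', or the whole line if there is none."""
    head, sep, tail = line.partition("/")
    # sep is empty when no "/" was found; in that case head is the full line.
    return tail if sep else head


if __name__ == "__main__":
    # "prefix" is a made-up placeholder; the original input file is not shown.
    print(strip_before_slash("prefix/7fad416d-f2b3-4259-b98d-2449957a3123"))
    print(strip_before_slash("no-slash-here"))
```

Unlike the grep variant, this keeps everything after the first / rather than the last one, so the two differ on lines with several slashes.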
