Remove fields containing specific string

perl, sed, text processing

I have a file, file1, containing multiple tab-separated fields. I would like to remove only the fields that contain a specific string, in my case the underscore character _, without removing the whole row:

cat file1
357M        2054_
357_        154=        1900_
511_        419X        1481_        34=

I would like to obtain the following:

cat file2
357M
154=
419X        34=

I managed to remove the fields as follows:

cat file1 | perl -pe 's/\w+_\s*//g'
357M    154=        419X        34=

But the format is not right: all the rows end up joined on a single line, and I would like to keep each row on its own line.

I also tried:

cat file1 | sed 's/[0-9]*_//g'
357M
          154=
          419X         34=

But I would like to get rid of those empty columns.

A brute-force approach that does work:

cat file1 | sed 's/[0-9]*_//g' | tr -s '\t' '\t' | sed 's/^[ \t]*//g'
357M
154=
419X         34=

This last command (1) removes all fields containing an underscore, (2) squeezes runs of consecutive tabs into a single tab, and (3) removes leading tabs. It is not very elegant, though.

Any suggestions?

Best Answer

You could use this simple sed command (\w and \s are GNU sed extensions):

sed 's/\w*_\s*//g;/^$/d' infile.txt

/^$/d deletes lines that are left empty by the substitution, which happens when every field on the line contained an underscore, for example a line holding only foo_ or a lone _.

Giving result:

357M
154=
419X    34=
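
If you prefer to work field by field rather than with a regular expression over the whole line, something along these lines with awk might also do it. This is only a sketch: the function name drop_underscore_fields is made up for illustration, and it assumes the fields really are separated by single tabs. It rebuilds each row from the fields that do not contain an underscore and skips rows where nothing is left.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of the answer above):
# keep only the tab-separated fields that do not contain an underscore.
drop_underscore_fields() {
    awk -F'\t' '
    {
        out = ""; n = 0                     # rebuilt row and count of kept fields
        for (i = 1; i <= NF; i++) {
            if ($i ~ /_/) continue          # skip any field containing "_"
            out = (n++ ? out "\t" : "") $i  # join kept fields with a single tab
        }
        if (n) print out                    # omit rows with no fields left
    }' "$@"
}

# Sample rows from the question, written with explicit tabs
printf '357M\t2054_\n357_\t154=\t1900_\n511_\t419X\t1481_\t34=\n' |
    drop_underscore_fields
```

Because the row is rebuilt from the kept fields, no tab is left at the end of a line when the last field is dropped, whereas the sed substitution leaves the tab that preceded the removed field. The function also accepts file names, e.g. drop_underscore_fields infile.txt.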