Ubuntu – How to remove lines with a number less than 60 in column 3

command-line, text-processing

I have a large file. I need to remove all lines in a file which have a number less than 60 in column 3.

Example file:

35110   Bacteria(100)   Proteobacteria(59)  Alphaproteobacteria(59)
12713   Bacteria(100)   Bacteroidetes(100)  Bacteroidia(100)

Desired output:

12713   Bacteria(100)   Bacteroidetes(100)  Bacteroidia(100)

Best Answer

No need for Gawk extensions:

awk -F '[()]' '$4 >= 60'

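Here the field separator given with -F is a regular expression, the bracket expression [()], so awk splits each line at every opening or closing parenthesis. As a result, the number in your third column becomes awk's fourth field ($4), and the pattern $4 >= 60 prints only the lines where that number is at least 60. Pass the file name as the last argument, or pipe the data in on standard input.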
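To see where the split lands, here is a minimal sketch that prints every field awk produces for the first sample line from the question. The function name show_fields is hypothetical, made up for this illustration only.

```shell
# Hypothetical helper: list the fields awk sees when splitting on ( or )
show_fields() {
    awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

# First sample line from the question
printf '%s\n' '35110   Bacteria(100)   Proteobacteria(59)  Alphaproteobacteria(59)' | show_fields
```

The fourth line of the output should read $4=[59], which is why this line fails the $4 >= 60 test and is dropped.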

Related Solutions

Ubuntu – How to remove all lines in a file that are less than 6 characters

There are many ways to do this.

Using grep:

grep -E '^.{6,}$' file.txt >out.txt

Now out.txt will contain lines having six or more characters.

The inverse approach, dropping lines of five or fewer characters:

grep -vE '^.{0,5}$' file.txt >out.txt

Using sed, removing lines of length 5 or less:

sed -r '/^.{0,5}$/d' file.txt

Reverse way, printing lines of length six or more:

sed -nr '/^.{6,}$/p' file.txt

As with grep, you can redirect the output to a different file using the > operator, or edit the file in place using sed's -i option:

sed -ri.bak '/^.{0,5}$/d' file.txt

The original file will be backed up as file.txt.bak and the modified file will be file.txt.

If you do not want to keep a backup:

sed -ri '/^.{0,5}$/d' file.txt
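Before running an in-place edit on real data, it can help to try it on a scratch file. The sketch below assumes GNU sed (for -r and -i); the function name demo_inplace and the sample lines are made up for illustration.

```shell
# Hypothetical demo: in-place edit with a .bak backup, run in a scratch directory
demo_inplace() {
    dir=$(mktemp -d) || return 1
    # Four made-up lines of length 5, 6, 4 and 11
    printf '%s\n' 'short' 'sixsix' 'tiny' 'long enough' > "$dir/file.txt"
    # Delete lines of five or fewer characters; the original is kept as file.txt.bak
    sed -ri.bak '/^.{0,5}$/d' "$dir/file.txt"
    echo "edited file:"
    cat "$dir/file.txt"
    echo "lines in backup: $(grep -c '' "$dir/file.txt.bak")"
    rm -rf "$dir"
}

demo_inplace
```

The edited file should keep only the two lines of six or more characters, while the backup still holds all four.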

Using a plain shell loop. This is slower and shown only to illustrate another method; prefer grep or sed:

while IFS= read -r line; do [ "${#line}" -ge 6 ] && printf '%s\n' "$line"; done <file.txt

Using Python, which is typically slower than grep or sed:

#!/usr/bin/env python3
with open('file.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # drop the newline before measuring the length
        if len(line) >= 6:
            print(line)

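A more compact version uses a list comprehension (note that it holds all matching lines in memory before printing):

#!/usr/bin/env python3
with open('file.txt') as f:
    # Each line keeps its trailing newline, so join on the empty string
    print(''.join([line for line in f if len(line.rstrip('\n')) >= 6]), end='')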

Ubuntu – Finding the lines with the lowest value in their third column given grep results

With GNU sort:

grep -E '(^1848|^[0-9]{4},1848)' file | sort -t, -k3n | head -n 5

(If the first column may have fewer or more than exactly 4 digits, replace {4} with +.)

Output:

1848,2606,9.783802450936204
1848,2609,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
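The pipeline can be tried on a small sample. The sketch below feeds the five rows from the output above in scrambled order, plus one made-up row (1900,2000,1.5) that the grep pattern should reject. The wrapper name lowest_for_1848 is hypothetical, and LC_ALL=C is added as an assumption so that sort treats the dot as the decimal separator regardless of locale.

```shell
# Hypothetical wrapper around the pipeline; reads CSV rows on stdin.
# The optional first argument is the number of rows to keep (default 5).
lowest_for_1848() {
    grep -E '(^1848|^[0-9]{4},1848)' | LC_ALL=C sort -t, -k3n | head -n "${1:-5}"
}

# Rows from the output above, scrambled, plus one made-up non-matching row
printf '%s\n' \
    '1848,2612,10.741178028606866' \
    '1900,2000,1.5' \
    '1848,2600,10.635270124233982' \
    '1848,2606,9.783802450936204' \
    '1848,2604,10.636275056472996' \
    '1848,2609,10.30355814063016' |
    lowest_for_1848 2
```

This should print the rows ending in 9.783802450936204 and 10.30355814063016, in that order; the made-up row never reaches sort because grep filters it out first.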
