Ubuntu – How to remove all lines in a file that are less than 6 characters

command-line, text-processing

I have a file containing approximately 10 million lines.

I want to remove every line in the file that is shorter than six characters.

How do I do this?

Best Answer

There are many ways to do this.

Using grep:

grep -E '^.{6,}$' file.txt >out.txt

Now out.txt will contain lines having six or more characters.

The inverse approach, discarding lines of five or fewer characters:

grep -vE '^.{,5}$' file.txt >out.txt
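
To see that the two patterns select complementary sets of lines, here is a small Python 3 sketch. The names KEEP, DROP and keep_line are illustrative only, and `{,5}` is written as `{0,5}`, which means the same thing.

```python
import re

# The same patterns as the two grep commands, with explicit bounds.
KEEP = re.compile(r'^.{6,}$')    # six or more characters
DROP = re.compile(r'^.{0,5}$')   # five or fewer characters


def keep_line(line):
    """Return True if the line (without its newline) has at least 6 characters."""
    return KEEP.match(line) is not None


samples = ['', 'abc', 'abcde', 'abcdef', 'abcdefgh']
for s in samples:
    # Exactly one of the two patterns matches any given line.
    assert keep_line(s) != (DROP.match(s) is not None)
    # Matching KEEP is the same as a plain length check.
    assert keep_line(s) == (len(s) >= 6)

print([s for s in samples if keep_line(s)])  # prints ['abcdef', 'abcdefgh']
```

Since every line matches exactly one pattern, `grep -E` with the first and `grep -vE` with the second produce the same output.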

Using sed, removing lines of length 5 or less:

sed -r '/^.{,5}$/d' file.txt

Reverse way, printing lines of length six or more:

sed -nr '/^.{6,}$/p' file.txt 

You can redirect the output to a different file with the > operator, as in the grep examples, or edit the file in place with sed's -i option:

sed -ri.bak '/^.{,5}$/d' file.txt

The original file will be backed up as file.txt.bak and the modified file will be file.txt.

If you do not want to keep a backup:

sed -ri '/^.{,5}$/d' file.txt
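
For comparison, an in-place edit with a backup copy can also be sketched in Python 3 with the standard fileinput module. This is only a minimal sketch: the function name drop_short_lines and its parameters are hypothetical, chosen for illustration.

```python
import fileinput


def drop_short_lines(path, min_len=6, backup='.bak'):
    """Rewrite `path` in place, keeping only lines of at least `min_len` characters."""
    # With inplace=True, fileinput moves the original file to path + backup
    # and redirects print() into a new file created at `path`.
    with fileinput.input(files=[path], inplace=True, backup=backup) as f:
        for line in f:
            # Measure the line without its trailing newline.
            if len(line.rstrip('\n')) >= min_len:
                print(line, end='')
```

Calling drop_short_lines('file.txt') should leave the filtered content in file.txt and the original in file.txt.bak, much like the sed command with -i.bak.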

Using the shell (much slower; shown only to illustrate another method, not recommended for a file this size):

while IFS= read -r line; do [ "${#line}" -ge 6 ] && printf '%s\n' "$line"; done <file.txt

Using Python (even slower than grep or sed):

#!/usr/bin/env python3
with open('file.txt') as f:
    for line in f:
        # Strip the newline once so it is not counted in the length.
        line = line.rstrip('\n')
        if len(line) >= 6:
            print(line)

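A more compact version using a list comprehension (note that it holds every kept line in memory before printing):

#!/usr/bin/env python3
with open('file.txt') as f:
    # Strip newlines first so join() does not produce blank lines between entries.
    lines = [line.rstrip('\n') for line in f]
    print('\n'.join([line for line in lines if len(line) >= 6]))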
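
With roughly 10 million lines, a streaming variant that never holds more than one line in memory may be preferable. The following is a minimal Python 3 sketch under that assumption; the function name filter_lines and its parameters are hypothetical.

```python
def filter_lines(src, dst, min_len=6):
    """Copy lines of at least `min_len` characters from `src` to `dst`.

    Returns the number of lines that were kept.
    """
    kept = 0
    with open(src) as fin, open(dst, 'w') as fout:
        for line in fin:
            # Measure the line without its trailing newline, but write it
            # back unchanged so the original line endings are preserved.
            if len(line.rstrip('\n')) >= min_len:
                fout.write(line)
                kept += 1
    return kept
```

For example, filter_lines('file.txt', 'out.txt') writes the kept lines to out.txt, as in the grep commands, and returns how many lines were kept.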