There are many ways to do this.
Using grep:
grep -E '^.{6,}$' file.txt >out.txt
out.txt will now contain only the lines that have six or more characters.
The inverse approach, dropping lines of five or fewer characters:
grep -vE '^.{,5}$' file.txt >out.txt
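To check that the two grep forms agree, here is a small sketch on made-up input. The function names keep_long, drop_short and sample, and the sample lines themselves, are assumptions for illustration. The interval is written as {0,5} because the {,5} shorthand is accepted by GNU grep but is not specified by POSIX.

```shell
#!/bin/sh
# Keep lines with six or more characters (reads stdin).
keep_long() {
    grep -E '^.{6,}$'
}

# Same result, phrased as "drop lines with five or fewer characters".
drop_short() {
    grep -vE '^.{0,5}$'
}

# Made-up sample input with line lengths 3, 6, 0, 11 and 5.
sample() {
    printf '%s\n' 'abc' 'abcdef' '' 'hello world' '12345'
}

sample | keep_long
sample | drop_short
```

Both pipelines should print the same two lines, abcdef and hello world; the empty line is dropped in both cases.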
Using sed, removing lines of length five or less:
sed -r '/^.{,5}$/d' file.txt
Reverse way, printing lines of length six or more:
sed -nr '/^.{6,}$/p' file.txt
You can save the output to a different file with the > operator, as with grep, or edit the file in place using sed's -i option:
sed -ri.bak '/^.{,5}$/d' file.txt
The original file will be backed up as file.txt.bak and the modified file will be file.txt.
If you do not want to keep a backup:
sed -ri '/^.{,5}$/d' file.txt
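For trying the in-place edit safely, here is a minimal sketch that works on a throwaway copy. It assumes GNU sed; the temporary directory, the sample lines and the variable names (tmpdir, demo, result, backup) are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU sed. Everything happens in a temporary directory.
tmpdir=$(mktemp -d)
demo="$tmpdir/file.txt"    # hypothetical file used only for this demo
printf '%s\n' 'abc' 'abcdef' 'hello world' '12345' > "$demo"

# Delete lines of five or fewer characters in place, keeping a .bak copy.
sed -ri.bak '/^.{,5}$/d' "$demo"

result=$(cat "$demo")        # filtered contents
backup=$(cat "$demo.bak")    # untouched original contents
rm -rf "$tmpdir"

printf 'filtered:\n%s\n' "$result"
printf 'backup:\n%s\n' "$backup"
```

The filtered part should list only abcdef and hello world, while the backup part still shows all four original lines.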
Using the shell. This is slower, so don't do it in practice; it is only here to show another method:
while IFS= read -r line; do [ "${#line}" -ge 6 ] && echo "$line"; done <file.txt
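If you do experiment with the loop, two details are worth knowing: read returns a non-zero status for a final line that has no trailing newline, and echo may treat arguments such as -n or backslashes specially in some shells. The following is a sketch of a more defensive variant; the function name long_lines and the sample input are assumptions, not part of the original answer.

```shell
#!/bin/sh
# Print lines of six or more characters from stdin.
long_lines() {
    # The "|| [ -n "$line" ]" part keeps a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # printf prints the line verbatim, unlike echo in some shells.
        [ "${#line}" -ge 6 ] && printf '%s\n' "$line"
    done
    # Do not leak the status of the last length test to the caller.
    return 0
}

# Made-up input; note the missing newline after "last line".
printf 'abc\nabcdef\n-n -e -n\nlast line' | long_lines
```

This is still far slower than grep or sed on large files, so it only makes sense when no external tools may be used.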
Using Python, which is even slower than grep and sed:
#!/usr/bin/env python2
with open('file.txt') as f:
    for line in f:
        # Strip the newline once, then test the real line length.
        line = line.rstrip('\n')
        if len(line) >= 6:
            print line
Better, use a list comprehension to be more Pythonic:
#!/usr/bin/env python2
with open('file.txt') as f:
    strip = str.rstrip
    # Lines keep their own '\n', so join with '' to avoid blank lines.
    print ''.join([line for line in f if len(strip(line, '\n')) >= 6]).rstrip('\n')
With GNU sort:
grep -E '(^1848|^[0-9]{4},1848)' file | sort -t, -k3n | head -n 5
(if the first column may have fewer or more than exactly 4 digits, replace {4} with +)
Output:
1848,2606,9.783802450936204
1848,2609,10.30355814063016
1848,2600,10.635270124233982
1848,2604,10.636275056472996
1848,2612,10.741178028606866
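To see how the pipeline arrives at those five lines, here is a sketch on made-up input. The five rows shown above are reused; the three extra rows (ending in 12.5, 1.0 and 11.9), the function names sample and top5, and the LC_ALL=C setting are assumptions added for illustration. LC_ALL=C is there so that the dot is treated as the decimal point regardless of the user's locale.

```shell
#!/bin/sh
# Made-up, unsorted input: the five rows from the output above plus
# three invented rows to show what gets filtered or cut off.
sample() {
    printf '%s\n' \
        '1848,2612,10.741178028606866' \
        '1846,1848,12.5' \
        '1848,2606,9.783802450936204' \
        '1850,2600,1.0' \
        '1848,2600,10.635270124233982' \
        '1848,2620,11.9' \
        '1848,2604,10.636275056472996' \
        '1848,2609,10.30355814063016'
}

# Keep rows with 1848 in column 1 or 2, sort numerically on column 3
# (comma-separated), and print the five smallest.
top5() {
    grep -E '(^1848|^[0-9]{4},1848)' | LC_ALL=C sort -t, -k3n | head -n 5
}

sample | top5
```

The row starting with 1850 is rejected by grep, and the rows ending in 11.9 and 12.5 survive the filter but sort after the fifth position.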
No need for Gawk extensions:
Here the awk field separator specified via -F is a regex set (a bracket expression): fields are separated by either an opening or a closing parenthesis, which is why the number in your 3rd column ends up as the 4th awk field.
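As an illustration of that field numbering, here is a minimal sketch. The input format (three whitespace-separated columns, the last two carrying a number in parentheses), the bracket expression [()] and the function names sample and fourth_field are assumptions, since the original data is not shown here.

```shell
#!/bin/sh
# Hypothetical input: the 2nd and 3rd columns hold a number in parentheses.
sample() {
    printf '%s\n' 'alpha x(3) y(17)' 'beta x(8) y(4)'
}

# Split each line on "(" or ")". For the first line the fields are
# "alpha x", "3", " y", "17" and an empty trailing field,
# so the number from the 3rd column is $4.
fourth_field() {
    awk -F'[()]' '{ print $4 }'
}

sample | fourth_field
```

A multi-character -F value is treated as an extended regular expression by POSIX awk, so this does not rely on gawk-specific features.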