How can I delete all lines in a text file that have fewer than 'x' letters, numbers, or symbols? I can't use `awk 'length($0)>'` because that length includes spaces.
How to delete all lines in a text file which have less than ‘x’ characters
awk, sed, text processing
Best Answer
Assuming you want to delete lines that contain fewer than `n` graphical symbols, this can be done with `awk`. The approach deletes all characters that do not match `[[:graph:]]`. If the length of the string that remains is greater than or equal to `n`, the (unmodified) line is printed.

The value of `n` is given on the command line.

`[[:graph:]]` is equivalent to `[[:alnum:][:punct:]]`, which in turn is the same as `[[:alpha:][:digit:][:punct:]]`. It is roughly the same as `[[:print:]]` but does not match spaces.

Instead of `[^[:graph:]]`, you could possibly use `[[:blank:]]` to delete all tabs or spaces.

With `sed`, you can follow the `awk` approach above almost literally or, simplified, count only non-blank characters.
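The two `sed` variations can be sketched roughly as follows. This is a reconstruction from the description in this answer, not necessarily the exact commands originally posted; the function names `keep_graph` and `keep_nonblank` are hypothetical, and the threshold of 5 is hard-coded.

```shell
# Hypothetical helpers: both read lines on stdin and keep those with
# at least 5 countable characters (the threshold is an assumption).

# Variation 1: count only graphical characters.
keep_graph() {
    sed -e 'h' \
        -e 's/[^[:graph:]]//g' \
        -e '/^.\{5\}/!d' \
        -e 'g'
}

# Variation 2 (simplified): remove blanks and count what remains.
# The number of dots is the minimum length.
keep_nonblank() {
    sed -e 'h' \
        -e 's/[[:blank:]]//g' \
        -e '/^...../!d' \
        -e 'g'
}
```

For example, `printf 'a b c\nhello world\n' | keep_graph` should print only `hello world`, since `a b c` has just three graphical characters.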
The `sed` command first saves the current line into the hold space with `h`. It then deletes all non-graph characters (or blank characters in the second variation) on the line with `s///g`. If the line then contains fewer than 5 characters (change this to whatever number you want, or change the number of dots in the second variation), the line is deleted. Otherwise, the stored line is fetched from the hold space with `g` and (implicitly) printed.
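For comparison, the `awk` approach described at the start of this answer might look like the sketch below. It is a reconstruction under stated assumptions: the wrapper name `filter_min_graph` is hypothetical, and it assumes an `awk` that supports POSIX character classes such as `[[:graph:]]` (some older `mawk` builds reportedly do not).

```shell
# Hypothetical helper: keep lines on stdin that contain at least
# n graphical characters, where n is the first argument.
filter_min_graph() {
    awk -v n="$1" '
        {
            line = $0                  # remember the unmodified line
            gsub(/[^[:graph:]]/, "")   # strip non-graphical characters from $0
        }
        length($0) >= n { print line } # print the saved original if long enough
    '
}
```

Usage would be something like `filter_min_graph 5 < file.txt`. Replacing `[^[:graph:]]` with `[[:blank:]]` in the `gsub` call gives the variation that only ignores tabs and spaces.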