Remove Lines with Field Value Less Than or Equal to 3 Using sed or awk

awk grep sed shell shell-script

I need to remove every line that has a value of 2 or less in the 8th field (column).

My data looks like this:

12-31   Airport 189 379 41  49.70946503 -124.91377258   2   2880    30.8
01-01   AlberniElementary   165 331 16  49.26100922 -124.80662537   4   5760    26.1
01-09   BamfieldMarine  161 323 23  48.83490372 -125.13572693   2   2875    27.4
01-10   BamfieldMarine  161 323 23  48.83490372 -125.13572693   3   3068    38.6

I understand that using awk I can strip off the values desired and print them to another file, and I understand that sed would edit the current file. In either case, I need to retain the original file.

Note:
Please provide thorough explanations with your solutions. It is not sufficient to just write the command; please annotate your suggestions.

Further note: The data has a header line, so the solution will most likely need

awk 'FNR >1'

I suppose?

Best Answer

You almost got it.

 awk '(NR>1) && ($8 > 2 ) ' foo > bar

where

NR is the number of the current record (that is, the line number)
$8 is the eighth field
&& is logical AND
foo is the original file, left unchanged
bar is the resulting file
the implicit default action is to print the current input line

Note that the header is stripped on the way from foo to bar. To keep it:

 awk '(NR==1) || ($8 > 2 ) ' foo > bar

where

|| is logical OR
the input line is printed if NR==1 or if $8 > 2
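
As a quick check, here is a minimal sketch that runs the header-preserving command on the sample rows from the question. The header names and the temporary directory are assumptions made for illustration; the question does not show the real header.

```shell
# Build a small test file; the header names are made up for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/foo" <<'EOF'
Date Station A B C Lat Lon Count Minutes Value
12-31 Airport 189 379 41 49.70946503 -124.91377258 2 2880 30.8
01-01 AlberniElementary 165 331 16 49.26100922 -124.80662537 4 5760 26.1
01-09 BamfieldMarine 161 323 23 48.83490372 -125.13572693 2 2875 27.4
01-10 BamfieldMarine 161 323 23 48.83490372 -125.13572693 3 3068 38.6
EOF

# Keep the header (NR == 1) and every row whose 8th field is greater than 2.
awk 'NR == 1 || $8 > 2' "$workdir/foo" > "$workdir/bar"

cat "$workdir/bar"        # header plus the 01-01 and 01-10 rows
wc -l < "$workdir/foo"    # foo still has all 5 lines
```

Because the output goes to a separate file, foo is only ever read, which satisfies the requirement to retain the original.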

Update #1

To specify a range

8th field from -4 to 4 (inclusive):

 ($8 >= -4) && ($8 <= 4)

Same, including the header:

 (NR == 1) || (($8 >= -4) && ($8 <= 4))
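
The range test can also take its bounds from the command line. The sketch below assumes made-up sample rows and uses hypothetical variable names lo and hi, passed with awk's standard -v option.

```shell
# Made-up sample: a header plus rows whose 8th field is -7, -4, 0, 4 and 9.
workdir=$(mktemp -d)
printf '%s\n' \
  'c1 c2 c3 c4 c5 c6 c7 c8' \
  'a a a a a a a -7' \
  'b b b b b b b -4' \
  'c c c c c c c 0' \
  'd d d d d d d 4' \
  'e e e e e e e 9' > "$workdir/foo"

# lo and hi are hypothetical names; -v assigns them before any input is read.
awk -v lo=-4 -v hi=4 'NR == 1 || ($8 >= lo && $8 <= hi)' "$workdir/foo" > "$workdir/bar"

cat "$workdir/bar"   # header plus the rows ending in -4, 0 and 4
```

Both bounds are inclusive, so the rows holding exactly -4 and 4 are kept.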

Related Solutions

Using sed/awk to remove anything after first space

Sed

sed 's/\s.*$//'

Grep

grep -o '^\S*'

Awk

awk '{print $1}'

As pointed out in the comments, grep's -o option isn't POSIX; however, both GNU and BSD grep have it, so it should work for most people.

Also, \s and \S may not be available on all systems. If yours doesn't recognize them, you can use a literal space or, if you want to match both space and tab, put those in a bracket expression ([...]) or use the [[:blank:]] character class. (Strictly speaking, \s is equivalent to [[:space:]] and also includes vertical spacing characters such as CR, LF or VT, which you probably don't care about.)

The awk one assumes the lines don't start with a blank character.
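
The three approaches can be compared side by side with the portable [[:blank:]] spelling. This is only a sketch: the sample strings and variable names are made up for illustration.

```shell
# Made-up sample line: first word, a space, a second word, a tab, a third word.
line=$(printf 'alpha beta\tgamma')

from_sed=$(printf '%s\n' "$line" | sed 's/[[:blank:]].*$//')
from_grep=$(printf '%s\n' "$line" | grep -o '^[^[:blank:]]*')
from_awk=$(printf '%s\n' "$line" | awk '{print $1}')
printf '%s\n' "$from_sed" "$from_grep" "$from_awk"   # alpha, three times

# With a leading blank the results differ: sed deletes from the first blank
# onward (leaving an empty line), while awk skips leading blanks.
lead_sed=$(printf ' alpha beta\n' | sed 's/[[:blank:]].*$//')
lead_awk=$(printf ' alpha beta\n' | awk '{print $1}')
printf '[%s] [%s]\n' "$lead_sed" "$lead_awk"         # [] [alpha]
```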

Shell – Get files with a name containing a date value less than or equal to a given input date

You can use awk and its string comparison operator.

ls | awk '$0 < "3_20150415"'

In a variable:

max=3_20150414; export max
ls | LC_ALL=C awk '$0 <= ENVIRON["max"] "z"'

Concatenating "z" here forces a string comparison and allows any time on that day, since in the C locale digits sort before z.
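
To see the effect of the "z" suffix, here is a small sketch. It assumes file names of the form id_YYYYMMDDHHMMSS, which is a guess at the naming scheme, and feeds them with printf in place of ls so it runs anywhere.

```shell
# Upper bound: any name dated 2015-04-14 or earlier should pass.
max=3_20150414; export max

# Hypothetical file names; printf stands in for ls here.
matches=$(printf '%s\n' \
  3_20150413235959 \
  3_20150414093000 \
  3_20150415000000 |
  LC_ALL=C awk '$0 <= ENVIRON["max"] "z"')

printf '%s\n' "$matches"   # the 20150413 and 20150414 names
```

The 20150414 name passes because its first character after the date is a digit, which sorts before "z" in the C locale, while the 20150415 name already differs at the last date digit.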
