Awk to remove line if argument is encountered in a specific column

awk, text-processing

I need to go through an entire file of unknown size and remove every line in which a given word (passed in as the first argument) appears in a specified column. I also need to keep track of how many lines are removed. I'm assuming this is a job for awk, but I'm having a lot of trouble with it. I've tried using awk's match function, but besides some other syntax issues, I can't get it to treat the argument as the word to look for.

Example (File.txt):

Katie 1234 4567 blue
Ben 3456 2345 purple
Alex 7896 6789 blue

$ script.sh blue 4

Edits file to:

Ben 3456 2345 purple

And outputs: 2 lines removed

I'm more interested in understanding what I'm doing than just getting the code.
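A likely cause of the trouble turning the argument into a word is that the shell does not expand variables such as $1 inside a single-quoted awk program, so awk never sees the argument's value. The minimal sketch below illustrates this with a made-up input line and a hypothetical shell variable named word. It contrasts writing the variable into the program text with passing it in through awk's -v option.

```shell
#!/bin/sh
# Hypothetical sample input: the 2nd field is "blue".
line='Katie blue'
word=blue

# Broken: the single quotes stop the shell from expanding $word, so awk
# compares field 2 with the literal string "$word" and never matches.
broken=$(printf '%s\n' "$line" | awk '$2 == "$word" {print "match"}')

# Working: -v copies the shell value into an awk variable named w.
fixed=$(printf '%s\n' "$line" | awk -v w="$word" '$2 == w {print "match"}')

printf 'broken: [%s]\n' "$broken"   # prints "broken: []"
printf 'fixed:  [%s]\n' "$fixed"    # prints "fixed:  [match]"
```

The -v approach also avoids splicing arbitrary argument text into the awk program itself, where characters such as quotes could break the syntax.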

Best Answer

#!/bin/sh
awk -v value="$1" -v column="$2" '
  $column == value {++removed; next}
  1 {print}
  END {print removed+0 " lines removed" >"/dev/stderr"}
' <File.txt >File.txt.tmp &&
mv File.txt.tmp File.txt
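To try this on the question's sample data without touching a real File.txt, here is a sketch that assumes mktemp -d is available and hard-codes blue and 4 in place of the script's two arguments. It uses removed+0 in the END block so that a run with no matches reports 0 rather than an empty count.

```shell
#!/bin/sh
# Work in a scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Recreate the sample file from the question.
printf '%s\n' 'Katie 1234 4567 blue' 'Ben 3456 2345 purple' 'Alex 7896 6789 blue' >File.txt

# Same awk program as the answer, with "blue" and 4 standing in for $1 and $2.
awk -v value=blue -v column=4 '
  $column == value {++removed; next}
  1 {print}
  END {print removed+0 " lines removed" >"/dev/stderr"}
' <File.txt >File.txt.tmp &&
mv File.txt.tmp File.txt
# stderr shows: 2 lines removed

cat File.txt   # prints "Ben 3456 2345 purple"
```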

Explanations:

  • -v value="$1" sets the awk variable value to the shell script's first argument; -v column="$2" likewise sets column to the second argument.
  • For each line, if the condition $column == value is true, the code in the braces is executed.
    • $column is the content of the field whose number is stored in column (fields are numbered from 1 and, by default, separated by whitespace).
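    • ++removed increments a counter of removed lines. The variable starts out uninitialized, which awk treats as 0 in arithmetic, but it prints as an empty string if it is never incremented.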
    • next skips to the next input line, so that the print instruction won't be executed when the condition is true.
  • 1 {print} prints every line that didn't cause the next directive to be executed. (1 is an always-true condition.)
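  • END {…} executes the code inside the braces once, after all input has been read. The count goes to /dev/stderr because standard output is redirected into the temporary file.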
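  • The awk output goes to a temporary file, which mv then moves over File.txt; the && ensures this only happens if awk succeeded. Redirecting straight back to File.txt would not work, because the shell truncates the file before awk gets to read it.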
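The uninitialized-counter behavior can be checked directly. This minimal sketch runs the counting logic on the question's sample lines with a hypothetical word, green, that never appears in column 4, so the counter is never incremented. It shows why printing the bare variable yields an empty count while adding 0 forces a numeric 0.

```shell
#!/bin/sh
# The question's sample lines; "green" (hypothetical) never occurs in column 4.
input='Katie 1234 4567 blue
Ben 3456 2345 purple
Alex 7896 6789 blue'

# Without +0: removed is never assigned, so it prints as the empty string.
plain=$(printf '%s\n' "$input" | awk '$4 == "green" {++removed} END {print removed " lines removed"}')

# With +0: the arithmetic forces a numeric value, so 0 is printed.
forced=$(printf '%s\n' "$input" | awk '$4 == "green" {++removed} END {print removed+0 " lines removed"}')

printf '[%s]\n' "$plain"    # prints "[ lines removed]"
printf '[%s]\n' "$forced"   # prints "[0 lines removed]"
```

An equivalent alternative is printf "%d lines removed\n", removed, since the %d conversion also treats an uninitialized variable as 0.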