Awk to remove line if argument is encountered in a specific column

awk, text processing

I need to cycle through an entire file of unknown size and remove any line in which a given word (passed in as argument 1) appears in a specified column. In addition, I need to keep track of how many lines are removed. I'm assuming this is a job for awk, but I'm having a lot of trouble with it. I've tried working with awk match, but in addition to some other syntactical issues, I'm having trouble getting it to translate the argument into a word.

Example (File.txt):

Katie 1234 4567 blue
Ben 3456 2345 purple
Alex 7896 6789 blue

$ script.sh blue 4

Edits file to:

Ben 3456 2345 purple

And outputs: 2 lines removed

I'm more interested in understanding what I'm doing than just getting the code.

Best Answer

#!/bin/sh
awk -v value="$1" -v column="$2" '
  $column == value {++removed; next}
  1 {print}
  END {printf "%d lines removed\n", removed >"/dev/stderr"}
' <File.txt >File.txt.tmp &&
mv File.txt.tmp File.txt

Explanations:

-v value="$1" sets the awk variable value to the shell script's first argument; -v column="$2" does the same for the column number.
For each line, if the condition $column == value is true, the code in the braces is executed.
- $column is the content of the field whose number is stored in column (fields are numbered from 1).
- ++removed increments a counter of removed lines. An unset awk variable counts as 0 in arithmetic, so the counter needs no initialization.
- next skips to the next input line, so that the print instruction won't be executed when the condition is true.
1 {print} prints every line that didn't cause the next directive to be executed. (1 is an always-true condition.)
END {…} executes the code inside the braces at the end of the input.
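The awk code writes to a temporary file which is then moved into place. Redirecting straight to File.txt would not work, because the shell truncates the output file before awk reads it. The && ensures the move only happens if awk succeeded.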
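To try the explanation above end to end, here is a minimal sketch that wraps the same awk program in a shell function and runs it on a scratch copy of the question's sample data. The function name remove_matching, the demo_dir variable, and the use of mktemp -d are illustrative assumptions, not part of the original answer. It reports the count with printf "%d" so that the message reads "0 lines removed" when nothing matches.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer):
#   remove_matching FILE VALUE COLUMN
# Deletes the lines of FILE whose COLUMN-th field equals VALUE and
# reports the number of deleted lines on standard error.
remove_matching() {
  file=$1
  awk -v value="$2" -v column="$3" '
    $column == value { ++removed; next }    # count the match and drop the line
    { print }                               # keep every other line
    END { printf "%d lines removed\n", removed > "/dev/stderr" }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a scratch copy of the sample data from the question.
demo_dir=$(mktemp -d)
printf '%s\n' 'Katie 1234 4567 blue' \
              'Ben 3456 2345 purple' \
              'Alex 7896 6789 blue' > "$demo_dir/File.txt"
remove_matching "$demo_dir/File.txt" blue 4   # stderr: 2 lines removed
cat "$demo_dir/File.txt"                      # Ben 3456 2345 purple
rm -r "$demo_dir"
```

Because the comparison is $column == value, only an exact match of the whole field removes a line; a field such as "bluegreen" would be kept.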

Related Solutions

Modify Specific Column with Sed or Awk – Text Processing Guide

Another solution with awk using sub:

awk -F, 'sub("^[0-9]+\\s","",$3)' OFS=, file

Output:

qw12er,foo,bn5mgh
rt8yp,foo,gh78jk
bn852mv,foo,78ghjkh
tgbr,foo,ujmyhn
wsx2d,foo,ui52ohn
tgbr,foo,ujmyhn
ikl896o,foo,wsxdc52

Explanation:

-F,: set the comma as input field separator
OFS=,: set the comma as output field separator (a space by default)
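sub("^[0-9]+\\s","",$3): erase a run of digits followed by a whitespace character at the beginning of $3. sub returns the number of substitutions made (0 or 1) and awk uses that value as the pattern, so a line is printed (print is the default action) only when a substitution actually happened. Note that \s is not part of POSIX awk; it works in GNU awk.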

This way you can edit the desired column and still print all the others, however many there are.
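If lines without a leading number in the third column should be kept as well, one option is to call sub inside an action and print unconditionally. The sketch below assumes that variant; the function name strip_leading_number and the two input lines are invented for illustration (the original input is not shown here), and the regex uses a literal space instead of \s so that it does not depend on GNU awk.

```shell
#!/bin/sh
# Hypothetical wrapper: reads comma-separated lines on standard input and
# strips "digits followed by one space" from the start of the third field.
strip_leading_number() {
  awk -F, -v OFS=, '
    {
      sub(/^[0-9]+ /, "", $3)   # edit field 3 in place (no-op if it does not match)
      print                     # print every line, changed or not
    }
  '
}

# Invented sample input for demonstration.
printf '%s\n' 'qw12er,foo,12 bn5mgh' 'tgbr,foo,ujmyhn' | strip_leading_number
# prints:
#   qw12er,foo,bn5mgh
#   tgbr,foo,ujmyhn
```

Setting OFS matters here because modifying $3 makes awk rebuild the line from its fields, joined with the output field separator.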

How to print a specific column condition using awk

Just replace print with printf to keep the values on the same line, and print a newline once the loop over the line's fields has finished:

awk '{ for (i=1; i<=NF; i++) if ($i == "name:") printf "%s %s ", $(i+1), $(i+2); print "" }' yourfile

The output:

1 AAA 
2 AAA 3 BBB 
1 AAA 2 BBB 5 CCC
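As a small runnable sketch of the printf approach, the function below prints the two fields that follow every "name:" token, all on one output line per input line. The function name print_after_name and the sample input are assumptions made for illustration, since the original input file is not shown. The data is passed to printf as arguments to an explicit "%s %s " format, so a stray % in the input cannot be misread as a format directive.

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print the two fields that
# follow each "name:" token, keeping all matches on a single output line.
print_after_name() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "name:")
        printf "%s %s ", $(i+1), $(i+2)   # no newline, stay on the current line
    print ""                              # end the output line once per input line
  }'
}

# Invented sample input for demonstration.
printf '%s\n' 'x name: 1 AAA' 'x name: 2 AAA y name: 3 BBB' | print_after_name
```

Each output line ends with a trailing space, because every match is followed by one.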
