I am looking for a way to sort a list and print all lines whose first column appears only once, i.e., match only on the first column.
For example, I have a file where the first column is a path and the second column contains a 'type':
/path/foo/1 footsy
/path/foo/1 barsy
/path/foo/X barsy
/path/bar/2 footsy
/path/bar/2 barsy
/path/foo/Y footsy
(The file is actually sorted with sort -k1,1.)
Now, I would like to extract only cases like:
/path/foo/X barsy
/path/foo/Y footsy
I am thinking about some way with awk, where I would have to store the previous line and compare the first field of the previous line to the corresponding field in the current line. But I do not yet have an idea how to get it done 🙁
I tried to adapt a solution found in another question, but it is not really working as hoped:
awk '{
    prev=$0; path=$1; type=$2
    getline
    if ($1 != $path) {
        print prev
    }
}'
Best Answer
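awk normally reads each line of the input and invokes the script on it. The cases where you would use getline are few and far between. When your script is run with six lines of input, this is an overview of what happens: awk reads line 1 and your script saves it in prev, then getline reads line 2 and the test runs. awk then reads line 3 and getline reads line 4; finally awk reads line 5 and getline reads line 6. The lines are consumed in disjoint pairs, so line 2 is never compared with line 3, nor line 4 with line 5.
Obviously this isn’t going to work.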
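The pairing is easy to see with a small experiment. The sketch below is only an illustration: it feeds six numbered lines (standing in for the six-line file) to a script that, like the one in the question, calls getline once per invocation, and prints the record number before and after the call.

```shell
# Six numbered lines stand in for the six-line input file.
pairs=$(printf '%s\n' 1 2 3 4 5 6 |
    awk '{ first = NR; getline; print first, NR }')
printf '%s\n' "$pairs"
# prints:
# 1 2
# 3 4
# 5 6
```

The main block runs only three times, once for each odd-numbered line; the even-numbered lines are swallowed by getline.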
Secondly, you made a common mistake in your awk code. In awk, fields from the input are referenced as $number and variables are referenced as variable_name. This is different from shell scripts, where command line arguments are referenced as $number and variables are referenced as $variable_name. Your test
if ($1 != $path)
should be
if ($1 != path)
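To make the difference concrete, here is a small sketch using one line of the sample data. In awk, $path means "the field whose number is the value of path"; a non-numeric string such as /path/foo/1 converts to 0, so $path ends up being $0, the whole line. The variable names same_as_variable and same_as_field are made up for this illustration.

```shell
result=$(printf '%s\n' '/path/foo/1 footsy' |
    awk '{
        path = $1
        same_as_variable = ($1 == path)    # field 1 against the variable
        same_as_field    = ($1 == $path)   # field 1 against $0, the whole line
        print same_as_variable, same_as_field
    }')
printf '%s\n' "$result"
# prints: 1 0
```

So with $path the comparison is really between the first field and the entire current line, which is not what the script intended.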
Your overall approach is flawed. You can’t identify strings that occur only once in the file by looking at two lines at a time. I believe that you can do it by looking at three lines at a time (i.e., by keeping the two previous lines in variables), but things like that get complicated and messy. It’s probably simpler to count occurrences. Here’s a minimal modification on your script to do that.
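A counting script in that spirit might look like the sketch below. It is an illustration of the idea, not necessarily the exact code: it counts how often each first field occurs, remembers the line that goes with it, and prints the lines whose count is 1 once all input has been read. The wrapper name unique_first_column is hypothetical, and the sample data is taken from the question.

```shell
unique_first_column() {
    awk '{
        path = $1
        count[path]++        # how many lines start with this path
        line[path] = $0      # remember the (last) full line for this path
    }
    END {
        for (path in count)
            if (count[path] == 1)
                print line[path]
    }' "$@"
}

printf '%s\n' \
    '/path/foo/1 footsy' '/path/foo/1 barsy' '/path/foo/X barsy' \
    '/path/bar/2 footsy' '/path/bar/2 barsy' '/path/foo/Y footsy' |
    unique_first_column | sort
```

The order in which for (path in count) visits the keys is unspecified in awk, which is why the output is piped through sort here. This version does not need the input to be sorted.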
I deleted type, since you never used it.
Disclosure: This is essentially the same as the last part of glenn’s answer.
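Since the question says the file is already sorted on the first column, the "keep track of previous lines" idea can also be sketched as a single pass that holds one line and a counter instead of an array. This is only a sketch under that assumption: it relies on equal first fields being adjacent, and the name unique_in_sorted is hypothetical.

```shell
unique_in_sorted() {
    awk '
    NR == 1 || $1 != key {       # first column changed: a new group starts
        if (n == 1) print held   # the previous group had exactly one line
        key = $1
        n = 0
    }
    { held = $0; n++ }           # remember the current line, count the group
    END { if (n == 1) print held }
    ' "$@"
}

printf '%s\n' \
    '/path/foo/1 footsy' '/path/foo/1 barsy' '/path/foo/X barsy' \
    '/path/bar/2 footsy' '/path/bar/2 barsy' '/path/foo/Y footsy' |
    unique_in_sorted
# prints:
# /path/foo/X barsy
# /path/foo/Y footsy
```

Unlike the counting version, this keeps the input order and uses constant memory, but it gives wrong results if lines with the same first field are not next to each other.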