I'd like to remove from a given column ($2 in the example) the duplicate fields (comma separated).
Input file:
A 1,2,3,4
B 4,5,6,3
C 2,15
Expected output:
A 1,2,3,4
B 5,6
C 15
awk
I'd like to remove from a given column ($2 in the example) the duplicate fields (comma separated).
Input file:
A 1,2,3,4
B 4,5,6,3
C 2,15
Expected output:
A 1,2,3,4
B 5,6
C 15
Best Answer
You can run the above like so:
How it works
First calling
perl
with-lpe
does the following 3 things.-l[octal]
enable line ending processing, specifies line terminator-p
assume loop like -n but print line also, like sed-e program
one line of program (several -e's allowed, omit programfile)This essentially take the file in, strips off the newlines, operates on a line, and then tacks a newline character back onto it when it's done. So it's just looping through the file and executing our Perl code against each in turn.
As for the actual Perl code:
\s
means a spacing character (the five characters[ \f\n\r\t]
and\v
in newer versions ofperl
, like[[:space:]]
).\K
Keep the stuff left of the \K, don't include it in $&\S+
one or more characters not in the set [ \f\n\r\t\v]The
join ",",
is going to take the results and rejoin each field so that it's separated by a comma.The
split ",", $&
will take the matches that were found by the\S+
and split them into just the fields, without the comma.The
grep {!$seen{$_}++}
will take each field's number, add it to the hash,$seen{}
where each field's number is$_
as we go through each of them. Each time a field number is "seen" it's counted via the++
operator,$seen{$_}++
.The
grep{!$seen{$_}++}
will return a field value if it's only been seen once.Modified to see what's happening
If you use this modified abomination you can see what's going on as this Perl one liner moves across the lines from the file.
This is showing you the contents of
$seen{}
at the end of processing a line from the file. Let's take the 2nd line of the file.And here's what my modified version shows that line as:
So this is saying that we've seen field # 6 (1 time), field # 4 (2 times), etc. and field # 5 (1 time). So when
grep{...}
returns the results it will only return results from this array if it was present in this line (4,5,6,3) and if we've seen it only 1 time (6,1,15,5). The intersection of these 2 lists is (5,6) and so that's what gets returned bygrep
.References