How to remove a specific, duplicated line within a file

Tags: duplicate, text-processing, uniq

I'm looking for a way to remove one specific line from a bunch of files, but only if it occurs more than once in that file. Other lines should be kept, even if they are duplicates.

For example, a file like this where I would like to remove the duplicates of AAA

AAA
BBB
AAA
BBB
CCC

should become

AAA
BBB
BBB
CCC

I guess I should use sed but I have no idea how to write the command.

Best Answer

With GNU sed:

sed '0,/^AAA$/b;//d'

That is, let everything through (b branches to the end of the script, like a continue) from line 0 up to the first line matching /^AAA$/. Starting the range at 0 (that is, even before the first line) means the range can close on the very first line, so the range ends at the first AAA wherever it occurs. For all the remaining lines, delete every occurrence of AAA (an empty // pattern reuses the last regular expression).

GNU sed is needed for the 0 address (and for the ability to have other commands after the b in the same expression, though the latter could easily be worked around in other implementations by using two -e expressions).
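As a quick sanity check, here is the sed command run against the sample input from the question (a sketch using a pipeline; GNU sed assumed):

```shell
# Feed the question's sample lines into the GNU sed command.
printf '%s\n' AAA BBB AAA BBB CCC | sed '0,/^AAA$/b;//d'
# Output:
# AAA
# BBB
# BBB
# CCC
```

Since the question mentions a bunch of files, note that GNU sed's -i option edits files in place and processes each file separately (it implies -s), so the 0,/^AAA$/ range restarts for every file: sed -i '0,/^AAA$/b;//d' file1 file2 ...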

With awk:

awk '$0 != "AAA" || !n++'

(or for a regexp pattern: awk '!/^AAA$/ || !n++')

a shorthand for:

awk '! ($0 == "AAA" && n > 0) {print}; $0 == "AAA" {n++}'
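The same sanity check for the awk approach (a sketch using the question's sample input; any POSIX awk should work here):

```shell
# Print every line except repeated occurrences of AAA.
# n++ is only evaluated when $0 == "AAA" (short-circuit ||),
# so n counts AAA lines and only the first one passes the test.
printf '%s\n' AAA BBB AAA BBB CCC | awk '$0 != "AAA" || !n++'
# Output:
# AAA
# BBB
# BBB
# CCC
```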