How can I remove non-duplicate lines from a text file using any Linux program like sed, awk or any other?
Example:
abc
bbc
abc
bbc
ccc
bbc
Result:
abc
bbc
abc
bbc
bbc
The second list has ccc removed because it did not have any duplicate lines.
Is it also possible to remove both the non-duplicate lines and the lines that have only 2 duplicates, leaving only the lines that have more than 2 duplicates?
Best Answer
The solutions posted by others do not work on my Debian Jessie: they keep a single copy of any duplicate line, while it is my understanding of the OP that all copies of the duplicate lines are to be kept. If I have understood the OP right, then ...
The following command removes all duplicate lines.
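Reading "removes all duplicate lines" as dropping every line that has a duplicate, and assuming the input is saved as file (an illustrative name), this can be done with GNU coreutils:

```shell
# Sort so that equal lines become adjacent, then keep only the
# lines that are not repeated, i.e. drop every duplicated line.
sort file | uniq -u
```

On the question's sample input this prints only ccc.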
The following command outputs all the duplicates, but not the original copy: i.e., if a line appears n times, it outputs the line n-1 times.
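This is the awk one-liner quoted at the end of the answer; seen[$0]++ is zero (false) the first time a line is seen, so printing starts from the second occurrence:

```shell
# Print each line only from its second occurrence onward:
# a line appearing n times is printed n-1 times.
awk 'seen[$0]++' file
```

On the sample this prints abc once and bbc twice.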
The following then solves your problem. The lines are not in the original order.
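A sketch of one such combination (temp is an illustrative file name; grep -Fxf treats each line of temp as a fixed, full-line pattern):

```shell
# Collect the duplicated lines (n-1 copies of each).
awk 'seen[$0]++' file > temp
# Print every copy of those lines from the original file, sorted.
sort file | grep -Fxf temp
```

On the sample this prints abc twice and bbc three times, in sorted order; if the file contains no duplicates at all, temp is empty and nothing is printed.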
If you want lines which have two or more duplicates, you can now iterate the above:
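Presumably the iteration means piping the same filter through itself (temp again an illustrative name):

```shell
# Each pass drops the first remaining occurrence of every line,
# so a line appearing n times in file survives n-2 times.
awk 'seen[$0]++' file | awk 'seen[$0]++' > temp
```

On the sample, temp ends up containing a single bbc.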
keeps n-2 copies of every line that appears n times, so only the lines with more than one duplicate survive. Now de-duplicate the temp file, and you can obtain what you wish (i.e. all copies of the lines with more than one duplicate) as follows:
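Continuing the sketch, assuming sort -u for the de-duplication and grep -Fxf to pull the matching lines (patterns is an illustrative name):

```shell
# One copy of each line that survived both passes, i.e. of each
# line occurring at least 3 times in the original file.
sort -u temp > patterns
# All copies of those lines, in their original order.
grep -Fxf patterns file
```

On the sample this prints bbc three times.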
If you need to do this for lines which appear N or more times, a single command is simpler than chaining N times the command awk 'seen[$0]++' file.
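A sketch of such a command; the threshold is passed in with awk's -v, and the variable name N is an assumption:

```shell
# seen[$0]++ yields the number of times the line was seen before,
# so this prints only the occurrences after the N-th one, exactly
# like piping through awk 'seen[$0]++' N times (here N=2).
awk -v N=2 'seen[$0]++ >= N' file
```

With N=2 on the sample, only the third bbc is printed; combine with sort -u and grep -Fxf as above to recover all copies from the original file.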