Linux – Remove non-duplicate lines in Linux

awk, linux, text manipulation, uniq

How can I remove non-duplicate lines from a text file using any Linux program like sed, awk, or any other?

Example:

abc
bbc
abc
bbc
ccc
bbc

Result:

abc
bbc
abc
bbc
bbc

The second list has ccc removed because it didn't have any duplicate lines.

Is it also possible to remove lines that are non-duplicate AND lines that have only 2 duplicates, and keep only those lines that have more than 2 duplicates?

Best Answer

The solutions posted by others did not work for me on Debian Jessie: they keep only a single copy of each duplicated line, whereas my understanding of the OP is that all copies of the duplicated lines are to be kept. If I have understood the OP right, then ...

  1. The following command

    awk '!seen[$0]++' file
    

    keeps only the first copy of each line, i.e. it removes every duplicate occurrence (see the first example after this list).

  2. The following command

    awk 'seen[$0]++' file 
    

    outputs all the duplicates, but not the original copy: i.e., if a line appears n times, it outputs the line n-1 times.

  3. Then the command

    awk 'seen[$0]++' file > temp && awk '!seen[$0]++' temp > temp1 && cat temp1 >> temp
    

    solves your problem: temp now contains every copy of the duplicated lines and none of the lines that appear only once (see the second example after this list). The lines are not in the original order.

  4. If you want lines which have two or more duplicates, you can now iterate the above:

    awk 'seen[$0]++' file | awk 'seen[$0]++' > temp
    

    keeps n-2 copies of every line that appears n times with n>2 (i.e. has more than one duplicate). Now

    awk '!seen[$0]++' temp > temp1 
    

    writes a single copy of each line from the temp file into temp1, and you can now obtain what you wish (i.e. all copies of the lines with more than one duplicate) as follows (the last example after this list shows the result):

    cat temp1 >> temp; cat temp1 >> temp
    
  5. If you need to do this for lines which appear more than N times, the following command

      awk 'seen[$0]++ && seen[$0] > N' file 
    

    is simpler than chaining the command awk 'seen[$0]++' file N times. Note that N is a placeholder: substitute an actual number, or pass it in with awk's -v option (e.g. awk -v N=2).
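
For reference, here is how the commands from points 1 and 2 should behave on the sample input from the question (the printf line and the name file are only an illustrative way to recreate the test data; the expected output is noted in the comments):

    printf '%s\n' abc bbc abc bbc ccc bbc > file

    awk '!seen[$0]++' file    # prints abc, bbc, ccc - one copy of each distinct line
    awk 'seen[$0]++' file     # prints abc, bbc, bbc - only the n-1 extra copies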
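
Likewise, the combined commands from point 3, run on the same sample file, should leave all copies of the duplicated lines in temp and drop ccc (again, the file names are only examples):

    awk 'seen[$0]++' file > temp      # temp:  abc, bbc, bbc
    awk '!seen[$0]++' temp > temp1    # temp1: abc, bbc
    cat temp1 >> temp                 # temp:  abc, bbc, bbc, abc, bbc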

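Finally, a sketch of points 4 and 5 on the same sample, where only bbc appears three times; passing the threshold with awk's -v option (here -v N=2) is one way to make N concrete:

    awk 'seen[$0]++' file | awk 'seen[$0]++' > temp    # temp:  bbc  (the n-2 copies)
    awk '!seen[$0]++' temp > temp1                     # temp1: bbc
    cat temp1 >> temp; cat temp1 >> temp               # temp:  bbc, bbc, bbc

    awk -v N=2 'seen[$0]++ && seen[$0] > N' file       # prints bbc, same as chaining twice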