I have a big txt file in which values are repeated many times. Is there some command I can use that will go through the file and remove the repeats, so that each value appears only once?
SO4
HOH
CL
BME
HOH
SO4
HOH
CL
BME
HOH
SO4
HOH
SO4
HOH
CL
BME
HOH
SO4
HOH
CL
BME
HOH
CL
So it should look something like this:
SO4
HOH
CL
BME
The thing is that I have a huge number of different values, so I can't do it manually like here.
Best Answer
You could use the sort command with the option --unique. If you want to write the result to FILE instead of standard output, use the option --output=FILE.
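A minimal sketch, assuming the input is in a file named file.txt (the file name here is only an example):

```shell
# Print each distinct line once; note that the output comes out sorted:
sort --unique file.txt        # short form: sort -u file.txt

# Write the result to result.txt instead of standard output:
sort --unique --output=result.txt file.txt
```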
The command uniq could also be applied. In this case the identical lines must be consecutive, so the input must be sorted first - thanks to @RonJohn for this note.

I like the sort command for cases like this because of its simplicity, but if you work with large files the awk approach from John1024's answer could be more powerful. Here is a time comparison between the mentioned approaches, applied to a file (based on the above example) with almost 5 million lines. Another significant difference is the one mentioned by @Ruslan.
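The commands involved can be sketched like this - sort piped to uniq, the awk one-liner, and a rough timing comparison (big.txt stands in for the large input file and is a hypothetical name):

```shell
# uniq removes only adjacent duplicates, so sort first:
sort big.txt | uniq

# The awk approach: print a line only the first time it is seen;
# unlike sort, it preserves the order of first appearance.
awk '!seen[$0]++' big.txt

# Rough timing comparison of the two approaches:
time sort --unique big.txt > /dev/null
time awk '!seen[$0]++' big.txt > /dev/null
```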
Here is an illustration: the loop shown below generates 500 random combinations, each three characters long, of the letters A-D. These combinations are piped to awk or sort.
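The original loop is not reproduced here; a minimal sketch of such a generator (the exact commands are an assumption) might look like:

```shell
# Generate 500 random three-character combinations of the letters A-D
# and pipe them through the awk de-duplication one-liner:
for i in $(seq 500); do
  tr -dc 'A-D' < /dev/urandom | head -c 3
  echo
done | awk '!seen[$0]++'
```

Since only 4^3 = 64 distinct combinations exist, most of the 500 input lines are duplicates and the de-duplicated output is far shorter than the input.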