Ubuntu – Command that will only print value once although it appears many times

bash, command line

I have a big txt file in which values repeat many times. Is there a command that will go through the file and print each value only once, no matter how many times it appears?

SO4
HOH
CL
BME
HOH
SO4
HOH
CL
BME
HOH
SO4
HOH
SO4
HOH
CL
BME
HOH
SO4
HOH
CL
BME
HOH
CL

So it should look something like this:

SO4
HOH
CL
BME

The thing is that I have a huge number of different values, so I can't do it manually like here.

Best Answer

You could use the command sort with the option --unique:

sort -u input-file

If you want to write the result to FILE instead of standard output, use the option --output=FILE (short form: -o FILE):

sort -u input-file -o output-file
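
One thing worth noting is that sort -u returns the unique values in sorted order, not in the order in which they first appear in the file. A minimal sketch, assuming a small temporary sample file built from the values in the question (the variable name sample is only for illustration):

```shell
# Hypothetical sample file holding a few of the values from the question.
sample=$(mktemp)
printf '%s\n' SO4 HOH CL BME HOH SO4 HOH CL > "$sample"

# LC_ALL=C pins the collation order so the result is the same on every system.
LC_ALL=C sort -u "$sample"
# prints BME, CL, HOH, SO4 (one per line): sorted, not first-seen order

rm -f "$sample"
```

If the original order matters, this is a reason to look at one of the other approaches.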

The command uniq can also be used. In this case the identical lines must be consecutive, so the input has to be sorted first (thanks to @RonJohn for this note):

sort input-file | uniq > output-file
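
To see why the sort step is needed, here is a small sketch on a temporary sample file (the file and its contents are made up for illustration). uniq on its own only collapses repeats that are directly next to each other:

```shell
# Hypothetical sample: HOH appears both adjacent and non-adjacent.
sample=$(mktemp)
printf '%s\n' HOH HOH SO4 HOH CL CL > "$sample"

# Without sorting, only adjacent repeats are merged, so HOH is printed twice.
uniq "$sample"                      # prints HOH, SO4, HOH, CL

# Sorting first puts all identical lines next to each other.
LC_ALL=C sort "$sample" | uniq      # prints CL, HOH, SO4

rm -f "$sample"
```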

I like the sort command for cases like this because of its simplicity, but if you work with large files the awk approach from John1024's answer could be faster. Here is a time comparison between the mentioned approaches, applied to a file (based on the above example) with almost 5 million lines:

$ cat input-file | wc -l
20000000

$ TIMEFORMAT=%R
$ time sort -u input-file | wc -l
64
7.495

$ time sort input-file | uniq | wc -l
64
7.703

$ time awk '!a[$0]++' input-file | wc -l      # from John1024's answer
64
1.271

$ time datamash rmdup 1 < input-file | wc -l  # from αғsнιη's answer
64
0.770
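
For readers who have not seen the awk one-liner timed above, here is a short commented sketch of how it works, again on a hypothetical temporary sample file. Unlike sort -u, it keeps the values in the order of their first appearance:

```shell
# Hypothetical sample file holding a few of the values from the question.
sample=$(mktemp)
printf '%s\n' SO4 HOH CL BME HOH SO4 HOH CL > "$sample"

# a[$0] counts how often the current line ($0) has been seen so far.
# The post-increment yields the old count, so !a[$0]++ is true only the
# first time a line occurs, and awk's default action then prints the line.
awk '!a[$0]++' "$sample"
# prints SO4, HOH, CL, BME (one per line): first-seen order is preserved

rm -f "$sample"
```

The array holds one entry per distinct line, so memory use grows with the number of different values rather than with the size of the file.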

Another significant difference is the one mentioned by @Ruslan:

sort -u will only print the result once the input has ended, while this awk command will print each new result line on the fly (this may be more important for piped input than for a file).

To illustrate, the loop below generates 500 random combinations, each three characters long, of the letters A-D. When these combinations are piped to awk or sort, awk prints each new combination as soon as it appears, whereas sort -u prints nothing until the loop has finished.

for i in {1..500}; do cat /dev/urandom | tr -dc A-D | head -c 3; echo; done
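
A possible way to try this yourself is to wrap the loop in a function and pipe it to each command in turn. This is only a sketch: the function name gen is made up, seq replaces the brace expansion so it also runs in plain sh, and LC_ALL=C is added to tr so that it accepts raw bytes on every system.

```shell
# Hypothetical helper wrapping the loop above so it can be reused.
gen() {
  for i in $(seq 500); do
    # Keep only the letters A-D from the random stream and take 3 of them.
    LC_ALL=C tr -dc A-D < /dev/urandom | head -c 3
    echo
  done
}

# awk prints each new combination while the loop is still running.
gen | awk '!a[$0]++'

# sort -u stays silent until the loop has finished, then prints everything.
gen | sort -u
```

With three positions and four letters there are at most 64 distinct combinations, which matches the 64 lines reported in the timing comparison.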