Ubuntu – How to find a single unique line in a file

command line, text processing

I'm trying to find a way to find and print only lines from a file that don't have duplicates. If this is my file:

A
A
B
B
C
C
Y
Z

I am trying to print out only

Y
Z

Unfortunately, I keep getting

A
B
C
Y
Z

I have tried sort -u, sort | uniq -u, and grep | sort | uniq -u with the same results. I was eventually able to achieve my goal of finding the unique line using uniq -c and looking for the line that only appears one time, but I would like to know how to do this properly in the future.

Best Answer

AWK solution

$ awk '{arr[$0]++};END{for(var in arr) if (arr[var] == 1) print var}' input.txt                                          
Y
Z

{arr[$0]++}; builds an associative array that maps each line to the number of times it occurs. If a line is unique in the file, its array entry will be 1; otherwise it will be greater than 1.
The END block runs once the end of the file has been reached. We iterate over the array with a for(var in arr) loop and print each key whose count equals 1. Note that awk does not guarantee the iteration order of for(var in arr), so the output lines may not come out in the same order as in the input.

Python 3

Same idea as the awk one. Here we use the OrderedDict class to build a dictionary of lines and their counts that preserves the order in which the lines were first seen.

#!/usr/bin/env python3
import sys
from collections import OrderedDict

if len(sys.argv) != 2:
    sys.stderr.write(">>> Script requires a file argument\n")
    sys.exit(1)

# Map each line to its number of occurrences, in first-seen order.
lines = OrderedDict()
with open(sys.argv[1]) as fd:
    for line in fd:
        tmp = line.strip()
        lines[tmp] = lines.get(tmp, 0) + 1

# Print only the lines that occurred exactly once.
for line, count in lines.items():
    if count == 1:
        print(line)

And here it is in action:

$ ./get_unique_lines.py  input.txt                                                                                       
Y
Z
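For comparison, here is a more compact sketch of the same counting idea using collections.Counter. It assumes Python 3.7 or later, where dictionaries (and therefore Counter) keep insertion order. The function name unique_lines is made up for illustration, and unlike the script above it strips only the trailing newline rather than all surrounding whitespace.

```python
from collections import Counter


def unique_lines(lines):
    """Return the lines that occur exactly once, in first-seen order."""
    # Drop only the trailing newline so lines compare by their content.
    stripped = [line.rstrip("\n") for line in lines]
    counts = Counter(stripped)
    return [line for line, n in counts.items() if n == 1]


if __name__ == "__main__":
    # Same data as the sample file in the question.
    sample = ["A\n", "A\n", "B\n", "B\n", "C\n", "C\n", "Y\n", "Z\n"]
    print("\n".join(unique_lines(sample)))
```

Passing an open file object instead of the list works the same way, since iterating over a file yields its lines.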

Perl

Again, the same idea as the Python script; here we use an ordered hash (see also the Tie::IxHash documentation).

#!/usr/bin/perl
use strict;
use warnings;
use Tie::IxHash;

tie my %linehash, "Tie::IxHash" or die $!;

open(my $fp,'<',$ARGV[0])  or die $!;
while(my $line = <$fp> ){
    chomp $line;
    $linehash{$line}++;
}
close($fp);

for my $key (keys %linehash) {
    printf("%s\n",$key) unless $linehash{$key} > 1;
}

Test run:

$ ./get_unique_lines.pl input.txt                                                                                        
Y
Z

sort and uniq variations

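These have been mentioned in the comments multiple times already. Note that uniq only compares adjacent lines, so plain uniq -u works here only because the duplicates in the sample file are already next to each other; for arbitrary input, sort first.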

$ sort input.txt | uniq -u                                                                                               
Y
Z

$ uniq -u input.txt                                                                                                      
Y
Z
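To see why the sort step matters for input that is not already grouped, here is a rough Python sketch of what uniq -u does. It is only a model of the behaviour (keep a line when its run of identical adjacent lines has length 1), not how uniq is implemented, and the name uniq_u is hypothetical.

```python
from itertools import groupby


def uniq_u(lines):
    """Model of `uniq -u`: keep lines whose run of adjacent repeats has length 1."""
    result = []
    for value, run in groupby(lines):
        if len(list(run)) == 1:
            result.append(value)
    return result


if __name__ == "__main__":
    unsorted_lines = ["A", "B", "A", "Y"]
    # No two equal lines are adjacent, so nothing is filtered out.
    print(uniq_u(unsorted_lines))
    # After sorting, the two "A" lines are adjacent and get dropped.
    print(uniq_u(sorted(unsorted_lines)))
```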

Related Solutions

Ubuntu – Find out if lines of a file is sorted

#!/usr/bin/perl -w
use strict;

unless ( @ARGV == 1 && -f -r $ARGV[0] ) {
    die "Expected single file argument!\n";
}

my %cols;
my $ind = 0;

while (<>) {
    chomp;
    next if /^\s*($|#)/;
    ( @{ $cols{col1} }[$ind], @{ $cols{col2} }[$ind], @{ $cols{col3} }[$ind] ) = split;
    $ind++;
}

my @sorted1 = map { ${ $cols{col1} }[$_] } sort {
    ${ $cols{col1} }[$a] <=> ${ $cols{col1} }[$b] or
    ${ $cols{col2} }[$a] <=> ${ $cols{col2} }[$b] or
    ${ $cols{col3} }[$a] <=> ${ $cols{col3} }[$b]
} keys @{ $cols{col1} };
my @sorted2 = map { ${ $cols{col2} }[$_] } sort {
    ${ $cols{col1} }[$a] <=> ${ $cols{col1} }[$b] or
    ${ $cols{col2} }[$a] <=> ${ $cols{col2} }[$b] or
    ${ $cols{col3} }[$a] <=> ${ $cols{col3} }[$b]
} keys @{ $cols{col2} };

if ( "@sorted1" eq "@{ $cols{col1} }" and "@sorted2" eq "@{ $cols{col2} }") {
    print "File is sorted!\n"
}
else { print "File is unsorted!\n" };
__END__

If the columns are:

X1 Y1 Z1  
X2 Y2 Z2

The sort order will be:

if (X1 > X2) then X1 Y1 Z1 > X2 Y2 Z2
if (X1 == X2) && (Y1 > Y2) then X1 Y1 Z1 > X2 Y2 Z2

To add more columns into the sort order, copy the pattern for the first two. I hope that's what you asked for.
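The same ordering rule can be sketched in Python, where tuples already compare column by column from left to right. This is only an illustration under a few assumptions: fields are whitespace-separated numbers, blank lines and lines starting with # are skipped (as in the Perl script), and the helper names parse_rows and is_sorted are made up. Unlike the Perl version, it is not limited to three columns.

```python
def parse_rows(text):
    """Turn whitespace-separated numeric columns into a list of tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        rows.append(tuple(float(field) for field in line.split()))
    return rows


def is_sorted(rows):
    """True if every row is <= the row after it (column-by-column comparison)."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    data = "1 2 3\n1 3 0\n# comment\n2 0 0\n"
    print("File is sorted!" if is_sorted(parse_rows(data)) else "File is unsorted!")
```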

Ubuntu – How to find all patterns between two characters

First of all, your grep -Po '"\K[^"]*' file idea fails because grep sees both "One" and ". the second is here" as being inside quotes. Personally, I'd probably just do

$ grep -oP '"[^"]+"' file | tr -d '"'
One
Two 
 Three 
Four

But that is two commands. To do it with a single command, you could use one of:

Perl
```
$ perl -lne '@F=/"\s*([^"]+)\s*"/g; print for @F' file 
One
Two 
Three 
Four
```
Here, the @F array holds all matches of the regex (a quote, optional whitespace, then as many non-" characters as possible up to the next "). The print for @F just means "print each element of @F".
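For readers more comfortable with Python, the same regex can be tried with re.findall, which returns the captured group for every match. The two sample lines below are an assumption: they are reconstructed from the field dump shown further down, not copied from the original input file, and the name quoted_matches is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: quote, optional whitespace,
# a run of non-quote characters (captured), optional whitespace, quote.
PATTERN = re.compile(r'"\s*([^"]+)\s*"')


def quoted_matches(line):
    """Return the captured text of every quoted section in the line."""
    return PATTERN.findall(line)


if __name__ == "__main__":
    # Reconstructed sample input (assumption, see the note above).
    sample = [
        'first matched is "One". the second is here"Two "',
        'and here are in second line" Three ""Four".',
    ]
    for line in sample:
        for match in quoted_matches(line):
            print(match)
```

Because [^"]+ is greedy, it also swallows any whitespace before the closing quote, so trailing spaces stay in the result; call .strip() on each match if they are unwanted.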

Perl

$ perl -F'"' -lne 'for($i=1;$i<=$#F;$i+=2){print $F[$i]}' file 
One
Two 
 Three 
Four

To remove leading/trailing spaces from each match, use this:

perl -F'"' -lne 'for($i=1;$i<=$#F;$i+=2){$F[$i]=~s/^\s+|\s+$//g; print $F[$i]}' file

Here, Perl is behaving like awk. The -a switch (implied by -F in Perl 5.20 and later) causes it to automatically split each input line into the @F array on the character given by -F. Since I have given it ", the fields are:

$ perl -F'"' -lne 'for($i=0;$i<=$#F;$i++){print "Field $i: $F[$i]"}' file 
Field 0: first matched is 
Field 1: One
Field 2: . the second is here
Field 3: Two 
Field 0: and here are in second line
Field 1:  Three 
Field 2: 
Field 3: Four
Field 4: .

Because we are looking for text between two consecutive field separators, we know we want every second field. So, for($i=1;$i<=$#F;$i+=2){print $F[$i]} will print the ones we care about.

The same idea but in awk:

$ awk -F'"' '{for(i=2;i<=NF;i+=2){print $(i)}}' file 
One
Two 
 Three 
Four
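The "split on the quote and take every second field" idea carries over directly to Python. This is a minimal sketch; the sample lines are reconstructed from the field dump above (an assumption, since the original input file is not shown), and between_quotes is a made-up name.

```python
def between_quotes(line):
    """Split on double quotes and return the fields at odd indices."""
    fields = line.split('"')
    # fields[0] is the text before the first quote, so fields[1], fields[3], ...
    # are the pieces that sit between consecutive quote characters.
    return fields[1::2]


if __name__ == "__main__":
    # Reconstructed sample input (assumption, see the note above).
    sample = [
        'first matched is "One". the second is here"Two "',
        'and here are in second line" Three ""Four".',
    ]
    for line in sample:
        for field in between_quotes(line):
            print(field)
```

As with the Perl and awk versions, surrounding whitespace is kept; use field.strip() to drop it.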
