Trying to sort two lists of numbers and use uniq to get the intersection

sort uniq

I have two files, A and B, and I used the following command:

(sort -n A B) | uniq -d

which should give me the numbers which occur in both files.

These are the numbers I get from sort -n A B but when I pipe it to uniq -d I only get 11 and not 2. What am I doing wrong?

Best Answer

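As the comments indicate, the likely cause is trailing blanks or carriage returns. uniq compares whole lines, so a 2 followed by an invisible character is not a duplicate of a plain 2. Either of the following should do the trick: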

$ (sort -n A B) | sed -E 's/[^[:alnum:]]+$//' | uniq -d
$ (sort -n A B) | tr -d '\r ' | uniq -d

Some versions of GNU sed use -r instead of -E to enable Extended Regular Expressions. tr is certainly simpler but also more brutal: it removes those characters wherever they occur in the line, not only at the end.
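To see the effect in isolation, here is a minimal sketch. The file contents are invented for illustration (the question's actual data is not shown), and B is assumed to carry a carriage return after its first number. LC_ALL=C is set only so that comparisons are byte-wise and the result does not depend on the locale.

```shell
# Hypothetical data: both files contain 2 and 11, but the 2 in B ends in a carriage return.
export LC_ALL=C                      # byte-wise comparisons, independent of the user's locale
tmp=$(mktemp -d)
printf '2\n11\n30\n'   > "$tmp/A"
printf '2\r\n11\n40\n' > "$tmp/B"

# Without cleanup, "2" and "2\r" are different lines, so only 11 is reported.
broken=$(sort -n "$tmp/A" "$tmp/B" | uniq -d)

# Deleting carriage returns before uniq makes the two 2s identical.
fixed=$(sort -n "$tmp/A" "$tmp/B" | tr -d '\r' | uniq -d)

printf 'without tr:\n%s\n' "$broken"
printf 'with tr:\n%s\n' "$fixed"
rm -rf "$tmp"
```

If you suspect this kind of problem in your own files, od -c A B prints each byte and makes a stray \r or trailing space visible.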

Related Solutions

Sort – How to Sort on Two Fields, Second Then First

A key specification like -k2 means to take all the fields from 2 to the end of the line into account. So Villamor 44 ends up before Villamor 50. Since these two are not equal, the first comparison in sort -k2 -k1 is enough to discriminate these two lines, and the second sort key -k1 is not invoked. If the two Villamors had had the same age, -k1 would have caused them to be sorted by first name.

To sort by a single column, use -k2,2 as the key specification. This means to use the fields from #2 to #2, i.e. only the second field.

sort -k2 -k3 <people.txt is redundant: it's equivalent to sort -k2 <people.txt. To sort by last names, then first names, then age, run the following command:

sort -k2,2 -k1,1 <people.txt

or equivalently sort -k2,2 -k1 <people.txt since there are only these three fields and the separators are the same. In fact, you will get the same effect from sort -k2,2 <people.txt, because sort uses the whole line as a last resort when all the keys in a subset of lines are identical.
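The difference between -k2 and -k2,2 can be checked with a small sketch. The three records below are made up to stand in for people.txt (first name, last name, age), and LC_ALL=C is an assumption added only to keep the ordering predictable.

```shell
# Hypothetical stand-in for people.txt: first name, last name, age.
export LC_ALL=C
people=$(printf '%s\n' 'Anna Villamor 50' 'Emily Bedford 24' 'Chris Villamor 44')

# -k2 compares from field 2 to the end of the line, so the age decides between the Villamors.
to_end=$(printf '%s\n' "$people" | sort -k2 -k1)

# -k2,2 compares the last name only; -k1,1 then breaks the tie by first name.
by_name=$(printf '%s\n' "$people" | sort -k2,2 -k1,1)

printf '%s\n' '-k2 -k1:' "$to_end" '-k2,2 -k1,1:' "$by_name"
```

With the first command Chris Villamor 44 comes before Anna Villamor 50; with the second, Anna comes before Chris.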

Also note that the default field separator is the transition between a non-blank and a blank, so the keys will include the leading blanks (in your example, for the first line, the first key will be "Emily", but the second key will be " Bedford"). Add the -b option to strip those blanks:

sort -b -k2,2 -k1,1

It can also be done on a per-key basis by adding the b flag at the end of the key start specification:

sort -k2b,2 -k1,1 <people.txt

But something to bear in mind: as soon as you add one such flag to a key specification, the global flags (like -n, -r...) no longer apply to that key, so it's better to avoid mixing per-key flags and global flags.
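Leading blanks only change the result when the spacing between fields is uneven. A minimal sketch, using two invented records (three blanks before one last name, one blank before the other) and assuming LC_ALL=C so that a blank sorts before any letter:

```shell
# Hypothetical input with uneven spacing between the first and second fields.
export LC_ALL=C
people=$(printf '%s\n' 'Emily   Bedford 24' 'Anna Adams 31')

# Without -b the keys are "   Bedford" and " Adams"; the second blank sorts before "A".
with_blanks=$(printf '%s\n' "$people" | sort -k2,2)

# The per-key b flag drops the leading blanks, so "Adams" and "Bedford" are compared.
per_key=$(printf '%s\n' "$people" | sort -k2b,2)

# The global -b option has the same effect here because the key carries no flags of its own.
global=$(printf '%s\n' "$people" | sort -b -k2,2)

printf '%s\n' 'no -b:' "$with_blanks" 'with b:' "$per_key"
```

Without -b the Bedford line sorts first; with either form of b, the Adams line does.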

Sort and Uniq in Awk – How to Use

To sort, you can also use a pipe inside an awk command, as in:

awk '{ print ... | "sort ..." }'

With this syntax, every line printed this way is sent to the same instance of sort.

Of course, you can do the same thing at the shell level:

awk '{ print ... }' | sort ...
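As a concrete sketch of the two forms, with made-up input and the first field chosen arbitrarily as the thing to print:

```shell
# Hypothetical input: a name and a count per line.
export LC_ALL=C
data=$(printf '%s\n' 'pear 3' 'apple 1' 'fig 2')

# Pipe opened inside awk: every print feeds one sort process,
# which writes its sorted output once awk exits and closes the pipe.
inside=$(printf '%s\n' "$data" | awk '{ print $1 | "sort" }')

# Same result with the pipe at the shell level.
outside=$(printf '%s\n' "$data" | awk '{ print $1 }' | sort)

printf '%s\n' "$inside"
```

The inside form is mostly useful when only part of awk's output should be sorted, for example when a header line is printed directly and the remaining lines go through the pipe.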

Or you can use GNU awk, which has a couple of built-in sort functions (asort and asorti).

In awk, uniq is typically implemented by saving each unique data element or key in an associative array and checking whether new data has already been seen. One example to illustrate:

awk '!a[$0]++'

This means: if the current line is not yet in the array, the condition is true and the default action, printing the line, is triggered. Subsequent lines with the same data make the condition false and are not printed.
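A short sketch with invented input shows how this differs from sort piped into uniq: the awk idiom keeps the first occurrence of each line in its original order and does not need sorted input.

```shell
# Hypothetical input with repeated lines in no particular order.
lines=$(printf '%s\n' b a b c a)

# Keeps the first occurrence of each line, preserving input order.
first_seen=$(printf '%s\n' "$lines" | awk '!a[$0]++')

# For comparison: sort followed by uniq also removes duplicates but reorders the lines.
sorted_uniq=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq)

printf '%s\n' 'awk:' "$first_seen" 'sort | uniq:' "$sorted_uniq"
```

The array holds every distinct line, so memory use grows with the number of unique lines.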
