Shell – How to get the unique count of a particular part of a string

Tags: grep, scripting, shell, sort, text processing

I have a set of data in a file.

psf7433-nlhrms
unit7433-nobody
unit7333-opera
bpx7333-operations
app7333-osm
unit7330-partners
psf7331-pdesmond
unit7333-projm
mnp7330-redirect
unit7333-retailbanking
cpq7333-rkarmer
unit6333-sales
ring7323-support
unit7133-telco
post7323-uadb
sun7335-ukhrms
burp7133-wfnmreply

How can I ignore the leading alphabetic characters on each line, and everything after the number, and count how many unique numbers there are?
Or, put another way:
How can I extract only the numeric value from each line and count the distinct values?

If I manage to extract only the numeric values, I get this:

7433
7433
7333
7333
7333
7330
7331
7333
7330
7333
7333
6333
7323
7133
7323
7335
7133
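A single sed substitution is one way to get from the raw lines to that intermediate list. This is a sketch that assumes every line has the shape shown above (letters, then digits, then -, then anything else). The data variable and the extract_numbers function are illustrative names, and sed -E is the extended-regex flag supported by GNU and BSD sed.

```shell
#!/bin/sh
# Sample input copied from the question (a stand-in for the real file).
data='psf7433-nlhrms
unit7433-nobody
unit7333-opera
bpx7333-operations
app7333-osm
unit7330-partners
psf7331-pdesmond
unit7333-projm
mnp7330-redirect
unit7333-retailbanking
cpq7333-rkarmer
unit6333-sales
ring7323-support
unit7133-telco
post7323-uadb
sun7335-ukhrms
burp7133-wfnmreply'

# Hypothetical helper: drop the leading letters and everything from the
# first '-' onward, keeping only the captured digits.
# Assumes the letters-digits-dash layout shown in the question.
extract_numbers() {
  sed -E 's/^[A-Za-z]*([0-9]+)-.*/\1/'
}

# Prints one number per input line (17 lines for the sample data).
printf '%s\n' "$data" | extract_numbers
```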

Now I want the count of unique values among the extracted numbers. Ignoring repetitions, the final output should be:

8

I haven't been able to do this with awk, sed, or even a simple grep | cut.

I do not want the list of extracted values, I want only the final count as the answer.

Help me!

Best Answer

Use grep to extract just the numbers, then count the unique ones:

grep -Eo '[0-9]+-' file | sort -u | wc -l
  • [0-9] Matches any character between 0 and 9 (any digit).
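  • + in extended regular expressions means "one or more of the preceding item" (that's why grep is given the -E option; in POSIX basic regular expressions a bare + is an ordinary character). So [0-9]+- matches one or more digits followed by -.
  • -o prints only the part of each line that matched the pattern, so given the input abcd23-gf56, grep prints just 23-. Requiring the trailing - means digits that are not followed by a dash, such as the 56, are not picked up.
  • sort -u sorts the matches and keeps only one copy of each (that's what -u does), and wc -l counts the lines of its input, which here is the number of unique entries.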
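To check the pipeline against the sample data from the question, the sketch below feeds that data in through a shell variable instead of reading file, and places the answer's pipeline next to an awk alternative. The awk version is not part of the original answer: it splits each line on -, strips the leading letters from the first field, and counts each number the first time it is seen. The names data, count_with_grep and count_with_awk are illustrative.

```shell
#!/bin/sh
# Sample data from the question; in practice you would read "file" directly.
data='psf7433-nlhrms
unit7433-nobody
unit7333-opera
bpx7333-operations
app7333-osm
unit7330-partners
psf7331-pdesmond
unit7333-projm
mnp7330-redirect
unit7333-retailbanking
cpq7333-rkarmer
unit6333-sales
ring7323-support
unit7133-telco
post7323-uadb
sun7335-ukhrms
burp7133-wfnmreply'

# The answer's pipeline: grep -Eo pulls "digits-" out of each line,
# sort -u keeps one copy of each, wc -l counts them.
# tr -d ' ' removes the padding some wc implementations (e.g. BSD) add.
count_with_grep() {
  grep -Eo '[0-9]+-' | sort -u | wc -l | tr -d ' '
}

# Alternative with awk (not from the original answer): split on '-',
# remove the leading letters from field 1, and increment n only the
# first time a given number is seen.
count_with_awk() {
  awk -F- '{ sub(/^[A-Za-z]+/, "", $1) } !seen[$1]++ { n++ } END { print n + 0 }'
}

printf '%s\n' "$data" | count_with_grep   # prints 8
printf '%s\n' "$data" | count_with_awk    # prints 8
```

Both functions read standard input, so with a real file you can run, for example, count_with_awk < file. The awk version avoids sorting entirely, which can matter for very large inputs.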