PostgreSQL – Show Only Duplicate Values

duplicationpostgresql

How to select rows with duplicate values (non-distinct rows) and summarize with a count of the number of duplicate values in order of the most frequent duplicates?

Given a table of:

  • person_
    • name_
    • favorite_color_

…with this data:

Bob       Brown
Lucy      Lavender
Orin      Orange
Liam      Lavender
Betty     Brown
Blair     Brown

…show:

3         Brown
2         Lavender

We omit “Orange” from the list as it is distinct, with no duplicate siblings.

I am most interested in Postgres, but I expect this can be done in standard SQL.

Best Answer

That can be done with a simple SELECT statement.

SELECT count(*) AS count_ , favorite_color_ 
FROM person_ 
GROUP BY favorite_color_
HAVING count(*) > 1
ORDER BY count_ DESC
;

The count(*) command is an aggregate function supported by Postgres and gives the number of rows with duplicate values in a column specified by the GROUP BY clause.

To omit the values without any duplicates, add the HAVING clause. A HAVING is similar to a WHERE but applies to each row being generated to represent the grouping.

To get results to sort with most frequent duplicate values at the top, use the ORDER BY clause. We reference the column alias count_ that we created in the first line with the AS clause.