Postgresql – List distinct column values where those rows share other column values

postgresql

I'm trying to write a query to detect possibly-invalid data in a PostgreSQL table. We have a table of city names like this:

# `city_names`
id    | name     |   language | dialect | city_id
------------------------------------------------
01    | London   |   A        | A1      | 1
02    | London   |   A        | A2      | 1
03    | London   |   B        | B1      | 2
04    | London   |   B        | B2      | 3

In our domain:

  • It's fine that rows 01 and 02 both map "London" to city 1; the dialects don't happen to differ
  • It's fine that row 03 maps "London" to city 2; in that language, the name may refer to a different city
  • It's suspicious that row 04 maps "London" to city 3, because we already have a mapping to city 2 in the same language

I want to write a query that selects only rows 03 and 04 so that a human can decide whether one of them points to the wrong city.

I can solve this problem procedurally, but I'm having trouble doing it in SQL. For example, if I GROUP BY language and name, I lose the city_id values from the individual rows.

Basically my goal is: "If there's more than one city_id for the same name and language, list those city_ids."

How can I do this?

Best Answer

This query will do. The trick is to use COUNT(DISTINCT city_id):

SQL Fiddle

PostgreSQL 8.3.20 Schema Setup:

CREATE TABLE Table1
    ("id" int, "name" varchar(6), "language" varchar(1), "dialect" varchar(2), "city_id" int)
;

INSERT INTO Table1
    ("id", "name", "language", "dialect", "city_id")
VALUES
    (01, 'London', 'A', 'A1', 1),
    (02, 'London', 'A', 'A2', 1),
    (03, 'London', 'B', 'B1', 2),
    (04, 'London', 'B', 'B2', 3)
;

Query 1:

select t.*, d.dups
FROM table1 t INNER JOIN
(
  select name, language, count(distinct city_id) as dups
  from table1
  group by name, language
  having count(distinct city_id) > 1
) d
ON t.name = d.name and t.language = d.language 

Results:

| ID |   NAME | LANGUAGE | DIALECT | CITY_ID | DUPS |
|----|--------|----------|---------|---------|------|
|  3 | London |        B |      B1 |       2 |    2 |
|  4 | London |        B |      B2 |       3 |    2 |