PostgreSQL – Create table as select with distinct on specific columns

postgresql

I need a script to create a table . The problem is my select for creating the table contains equal values in some rows. I need to use distinct for these columns. The other columns' values can come from any matching row.

My current result table has data like this:

|   CITY   |   STREET   |   STREET_NUM   |   VAL_X   |   VAL_Y   |
------------------------------------------------------------------
| CityA    | Street abc |   5            | 11.5      |   0.5     |
| CityA    | Street abc |   5            | 15.4      |   1.8     |
| CityA    | Street abc |   5            | 12.4      |   2.8     |
| CityB    | Street xyz |   18           |  5.4      |   1.9     |
| CityB    | Street xyz |   18           |  8.4      |   1.1     |
| CityC    | Street klm |   55           |  9.6      |   0.8     |

But I need data like this:

|   CITY   |   STREET   |   STREET_NUM   |   VAL_X   |   VAL_Y   |
------------------------------------------------------------------
| CityA    | Street abc |   5            | 11.5      |   0.5     |
| CityB    | Street xyz |   18           |  5.4      |   1.9     |
| CityC    | Street klm |   55           |  9.6      |   0.8     |

For columns city, street and street_num I need to apply distinct. val_x and val_y should be used anyone, for example first of that group with same city, street and street_num.

Can you give me advice how to edit this script?

Best Answer

"VAL_X" and "VAL_Y" chosen through some aggregate function

You should consider using GROUP BY for the columns whose values you consider that should be "distinct" (as a group), and, for the rest of columns, choose an appropriate aggregate function (for instance, MIN):

CREATE TABLE my_result AS 
SELECT
  city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
  tableA
WHERE
  true /* your condition goes here */ 
GROUP BY
  city, street, streetnum

If you need to put together values from several tables, UNION ALL of them before you GROUP BY:

CREATE TABLE my_result AS 
SELECT
  city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
  (
  SELECT city, street, streetnum, val_x, val_y FROM tableA
  UNION ALL
  SELECT city, street, streetnum, val_x, val_y FROM tableB
  UNION ALL
  SELECT city, street, streetnum, val_x, val_y FROM tableC
  ) AS s0
WHERE
  true /* your condition goes here */ 
GROUP BY
  city, street, streetnum ;

Using always "VAL_X" and "VAL_Y" from same row, using a WINDOW

If you need to make sure your values are always from the same row, the best way is to use a WINDOW in your query: PARTITION BY "CITY", "STREET", "STREET_NUM" and ORDER BY "VAL_X", "VAL_Y", and choose the first row of every partition.

You can do this with two steps:

1) Add the row_num() to every partition:

SELECT 
  *,   
  (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
FROM 
  table_a

  |  CITY |     STREET | STREET_NUM | VAL_X | VAL_Y | rn |
  |-------|------------|------------|-------|-------|----|
  | CityA | Street abc |          5 |  11.5 |   0.5 |  1 |
  | CityA | Street abc |          5 |  12.4 |   2.8 |  2 |
  | CityA | Street abc |          5 |  15.4 |   1.8 |  3 |
  | CityB | Street xyz |         18 |   5.4 |   1.9 |  1 |
  | CityB | Street xyz |         18 |   8.4 |   1.1 |  2 |
  | CityC | Street klm |         55 |   9.6 |   0.8 |  1 |

2) At this point, choose only the rows WHERE rn=1 (and ORDER them, if necessary):

SELECT
   "CITY", "STREET", "STREET_NUM", "VAL_X", "VAL_Y"
FROM
  (
  SELECT 
    *,   
    (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
  FROM 
    table_a
  ) AS table_a_grouped 
WHERE
  rn = 1
ORDER BY 
  "CITY", "STREET", "STREET_NUM" 

The result is:

|  CITY |     STREET | STREET_NUM | VAL_X | VAL_Y |
|-------|------------|------------|-------|-------|
| CityA | Street abc |          5 |  11.5 |   0.5 |
| CityB | Street xyz |         18 |   5.4 |   1.9 |
| CityC | Street klm |         55 |   9.6 |   0.8 |

You can see the example at SQLFiddle