PostgreSQL – Using DISTINCT and ORDER BY in json_agg

json, postgresql

I'm trying to get a sorted aggregate of unique values in PostgreSQL.

The result should be a JSON array of unique objects, sorted by the index field. The value field can serve as a unique key in this case if needed.

The structure and data are similar to this:

CREATE TABLE test (
  index INTEGER,
  value INTEGER,
  tag VARCHAR
);

-- In the real query, this table is the result of a join,
-- which explains the duplicated rows.
INSERT INTO test VALUES
  (1, 1, 'a'),
  (1, 1, 'a'),
  (1, 1, 'a'),
  (2, 1, 'a'),
  (3, 2, 'a')
;

Getting an aggregate of distinct rows works:

SELECT json_agg(DISTINCT test.*) FROM test GROUP BY tag;

[{"index":1,"value":1,"tag":"a"}, {"index":2,"value":1,"tag":"a"}, {"index":3,"value":2,"tag":"a"}]

ORDER BY without DISTINCT also works:

SELECT json_agg(test.* ORDER BY index) FROM test GROUP BY tag;

[{"index":1,"value":1,"tag":"a"}, {"index":1,"value":1,"tag":"a"}, {"index":1,"value":1,"tag":"a"}, {"index":2,"value":1,"tag":"a"}, {"index":3,"value":2,"tag":"a"}]

The problem appears when I use both:

SELECT json_agg(DISTINCT test.* ORDER BY index) FROM test GROUP BY tag;

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list Position: 42

If I try to add the ORDER BY column to the argument through a subquery, I get a different error, since a subquery used as an expression may return only one column:

SELECT json_agg(DISTINCT (SELECT test.index, test.*) ORDER BY index) FROM test GROUP BY tag;

ERROR: subquery must return only one column Position: 28

Best Answer

WITH cte AS (SELECT DISTINCT * FROM test)
SELECT json_agg(cte.* ORDER BY index) FROM cte GROUP BY tag;
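
This works because the deduplication happens before the aggregation, so json_agg only needs ORDER BY and never combines it with DISTINCT. When an aggregate does use DISTINCT, PostgreSQL requires every ORDER BY expression to appear in the argument list, which is why ordering by index alone is rejected when the argument is the whole row. As a hedged sketch of two other ways to write this, the block below uses a scratch table named test_demo (a hypothetical name, loaded with the question's sample rows): first the same idea as a derived table, then a variant that orders by the aggregated row itself. The second variant sorts by whole-row comparison, so it matches "sorted by index" only under the assumption that index is the first column of the row.

```sql
-- Scratch copy of the question's data; test_demo is a hypothetical name.
CREATE TEMP TABLE test_demo (
  index INTEGER,
  value INTEGER,
  tag VARCHAR
);

INSERT INTO test_demo VALUES
  (1, 1, 'a'),
  (1, 1, 'a'),
  (1, 1, 'a'),
  (2, 1, 'a'),
  (3, 2, 'a');

-- Variant 1: deduplicate in a derived table, then aggregate with ORDER BY only.
SELECT json_agg(d ORDER BY d.index) AS items
FROM (SELECT DISTINCT * FROM test_demo) AS d
GROUP BY d.tag;

-- Variant 2: keep DISTINCT inside the aggregate and order by the argument itself.
-- Rows are compared column by column from left to right, so this is
-- index order only because index is the first column (an assumption).
SELECT json_agg(DISTINCT t ORDER BY t) AS items
FROM test_demo AS t
GROUP BY t.tag;
```

For the sample data, both queries should return the same three objects as the DISTINCT-only query in the question, in index order.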

Related Solutions

PostgreSQL – Create Table as Select with DISTINCT on Specific Columns

"VAL_X" and "VAL_Y" chosen through some aggregate function

Use GROUP BY on the columns whose combined values should be "distinct" (as a group), and choose an appropriate aggregate function (for instance, MIN) for the remaining columns:

CREATE TABLE my_result AS 
SELECT
  city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
  tableA
WHERE
  true /* your condition goes here */ 
GROUP BY
  city, street, streetnum

If you need to combine values from several tables, UNION ALL them before you GROUP BY:

CREATE TABLE my_result AS 
SELECT
  city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
  (
  SELECT city, street, streetnum, val_x, val_y FROM tableA
  UNION ALL
  SELECT city, street, streetnum, val_x, val_y FROM tableB
  UNION ALL
  SELECT city, street, streetnum, val_x, val_y FROM tableC
  ) AS s0
WHERE
  true /* your condition goes here */ 
GROUP BY
  city, street, streetnum ;
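
One thing to keep in mind with this approach: each MIN is computed independently, so the val_x and val_y in a result row may come from different source rows. The sketch below illustrates this on a scratch table (tablea_demo is a hypothetical name, and the column types are assumptions) loaded with the sample rows shown later in this answer.

```sql
-- Scratch table; the name and column types are assumptions for illustration.
CREATE TEMP TABLE tablea_demo (
  city TEXT,
  street TEXT,
  streetnum INTEGER,
  val_x NUMERIC,
  val_y NUMERIC
);

INSERT INTO tablea_demo VALUES
  ('CityA', 'Street abc', 5, 11.5, 0.5),
  ('CityA', 'Street abc', 5, 12.4, 2.8),
  ('CityA', 'Street abc', 5, 15.4, 1.8),
  ('CityB', 'Street xyz', 18, 5.4, 1.9),
  ('CityB', 'Street xyz', 18, 8.4, 1.1),
  ('CityC', 'Street klm', 55, 9.6, 0.8);

-- Each aggregate picks its own minimum within the group.
SELECT
  city, street, streetnum, min(val_x) AS val_x, min(val_y) AS val_y
FROM
  tablea_demo
GROUP BY
  city, street, streetnum
ORDER BY
  city, street, streetnum;

-- For CityB this gives val_x = 5.4 (from the first CityB row) and
-- val_y = 1.1 (from the second), a pair that exists in no single row.
```

If that mixing is acceptable, the GROUP BY form is the simplest option; otherwise the values need to be picked from one row per group.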

Always taking "VAL_X" and "VAL_Y" from the same row, using a WINDOW

If you need to make sure both values always come from the same row, the best way is to use a WINDOW in your query: PARTITION BY "CITY", "STREET", "STREET_NUM", ORDER BY "VAL_X", "VAL_Y", and choose the first row of every partition.

You can do this in two steps:

1) Add a row_number() to every row within its partition:

SELECT 
  *,   
  (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
FROM 
  table_a

  |  CITY |     STREET | STREET_NUM | VAL_X | VAL_Y | rn |
  |-------|------------|------------|-------|-------|----|
  | CityA | Street abc |          5 |  11.5 |   0.5 |  1 |
  | CityA | Street abc |          5 |  12.4 |   2.8 |  2 |
  | CityA | Street abc |          5 |  15.4 |   1.8 |  3 |
  | CityB | Street xyz |         18 |   5.4 |   1.9 |  1 |
  | CityB | Street xyz |         18 |   8.4 |   1.1 |  2 |
  | CityC | Street klm |         55 |   9.6 |   0.8 |  1 |

2) At this point, choose only the rows WHERE rn=1 (and ORDER them, if necessary):

SELECT
   "CITY", "STREET", "STREET_NUM", "VAL_X", "VAL_Y"
FROM
  (
  SELECT 
    *,   
    (row_number() OVER (PARTITION BY "CITY", "STREET", "STREET_NUM" ORDER BY "VAL_X", "VAL_Y")) AS rn
  FROM 
    table_a
  ) AS table_a_grouped 
WHERE
  rn = 1
ORDER BY 
  "CITY", "STREET", "STREET_NUM"

The result is:

|  CITY |     STREET | STREET_NUM | VAL_X | VAL_Y |
|-------|------------|------------|-------|-------|
| CityA | Street abc |          5 |  11.5 |   0.5 |
| CityB | Street xyz |         18 |   5.4 |   1.9 |
| CityC | Street klm |         55 |   9.6 |   0.8 |

You can see the example at SQLFiddle
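
As an alternative sketch, PostgreSQL's DISTINCT ON clause (a PostgreSQL extension, not standard SQL) can express the same "first row per group" selection in a single step. The block below is an illustration under stated assumptions: table_a_demo is a hypothetical scratch table, and its column types are guesses based on the sample output above.

```sql
-- Scratch table; the name and column types are assumptions for illustration.
CREATE TEMP TABLE table_a_demo (
  "CITY" TEXT,
  "STREET" TEXT,
  "STREET_NUM" INTEGER,
  "VAL_X" NUMERIC,
  "VAL_Y" NUMERIC
);

INSERT INTO table_a_demo VALUES
  ('CityA', 'Street abc', 5, 11.5, 0.5),
  ('CityA', 'Street abc', 5, 12.4, 2.8),
  ('CityA', 'Street abc', 5, 15.4, 1.8),
  ('CityB', 'Street xyz', 18, 5.4, 1.9),
  ('CityB', 'Street xyz', 18, 8.4, 1.1),
  ('CityC', 'Street klm', 55, 9.6, 0.8);

-- DISTINCT ON keeps the first row of each ("CITY", "STREET", "STREET_NUM")
-- group, where "first" is decided by the ORDER BY that follows.
-- The ORDER BY must start with the DISTINCT ON expressions.
SELECT DISTINCT ON ("CITY", "STREET", "STREET_NUM")
  "CITY", "STREET", "STREET_NUM", "VAL_X", "VAL_Y"
FROM
  table_a_demo
ORDER BY
  "CITY", "STREET", "STREET_NUM", "VAL_X", "VAL_Y";
```

On this sample data the query should return the same three rows as the row_number() version. The window form is more portable across databases, while DISTINCT ON is shorter when only PostgreSQL is targeted.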
