DISTINCT ON()
Just as a side note, this is precisely what DISTINCT ON () does (not to be confused with plain DISTINCT). From the PostgreSQL documentation:

SELECT DISTINCT ON ( expression [, ...] )

keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
So if you were to write,
SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;
It's effectively
SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;
in that it takes the first z from each (x,y) group. There are two important differences.
First, you can also select other columns, at no cost of further aggregation:
SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;
Second, because there is no GROUP BY, you cannot use (real) aggregates with it:
CREATE TABLE foo AS
SELECT * FROM ( VALUES
(1,2,3),
(1,2,4),
(1,2,5)
) AS t(x,y,z);
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;
-- fails, as you should expect.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;
-- would not fail.
Don't forget ORDER BY
Also, while I didn't bold it then, I will now:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

Always use an ORDER BY with DISTINCT ON.
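For example, on the foo table defined above, adding an ORDER BY (which must start with the DISTINCT ON expressions) pins down which z is kept:

```sql
SELECT DISTINCT ON (x,y) z
FROM foo
ORDER BY x, y, z;
-- deterministically keeps the smallest z (here 3) for the (1,2) group
```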
Using an Ordered-Set Aggregate Function
I imagine a lot of people are looking for first_value among the Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:
SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
But, alas, there is no first_value ordered-set aggregate. Instead, you can do this:
SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
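For example, against the foo table from above (using its x,y in place of a,b), percentile_disc(0) picks the lowest z in each group:

```sql
SELECT x, y, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY x, y;
-- one row: (1, 2, 3), since 3 is the smallest z in the (1,2) group
```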
You can accomplish what you're asking for by doing something like this.
I created a table and data to help better illustrate what I'm doing.
CREATE TABLE logs (id serial NOT NULL PRIMARY KEY, lognum int, flightnum int);
INSERT INTO logs (lognum, flightnum) VALUES (1,6);
INSERT INTO logs (lognum, flightnum) VALUES (1,7);
INSERT INTO logs (lognum, flightnum) VALUES (1,8);
INSERT INTO logs (lognum, flightnum) VALUES (2,80);
INSERT INTO logs (lognum, flightnum) VALUES (3,12);
INSERT INTO logs (lognum, flightnum) VALUES (4,8008);
postgres@[local]:5432:postgres:=# SELECT * FROM logs;
 id | lognum | flightnum
----+--------+-----------
 13 |      1 |         6
 14 |      1 |         7
 15 |      1 |         8
 16 |      2 |        80
 17 |      3 |        12
 18 |      4 |      8008
(6 rows)
Time: 0.188 ms
postgres@[local]:5432:postgres:=#
Note that lognums 1, 2, and 4 have flightnums beginning with 8 (8, 80, and 8008).
Now, use this query to find the lognums whose flightnum starts with 8. It takes your original query with an array_agg, wraps another query around it to generate subscripts for each member of the array (which can be arbitrarily large), and finally wraps that in an outer query which uses the generated subscript to compare each member of the flightnums array against LIKE '8%'.
SELECT lognum FROM (
SELECT lognum,
flightnums,
generate_subscripts(flightnums, 1) AS s
FROM (
SELECT lognum,
array_agg(flightnum) AS flightnums
FROM logs
GROUP BY lognum
) AS t1
) AS t2
WHERE flightnums[s]::text LIKE '8%' ORDER BY lognum;
Which gives you the following output
postgres@[local]:5432:postgres:=# SELECT lognum FROM (SELECT lognum, flightnums, generate_subscripts(flightnums, 1) AS s FROM ( SELECT lognum, array_agg(flightnum) AS flightnums FROM logs GROUP BY lognum) AS t1) AS t2 WHERE flightnums[s]::text LIKE '8%' ORDER BY lognum;
lognum
--------
1
2
4
(3 rows)
Time: 0.338 ms
postgres@[local]:5432:postgres:=#
As would be expected from the data above.
For further array manipulation needs, I would recommend reading the arrays chapter in the PostgreSQL documentation.
Best Answer
While there is no function to accomplish that, you can use
- unnest() to convert an array of elements to a table of one-column rows,
- DISTINCT to remove duplicates, and
- ARRAY(query) to recreate the array.

That idiom looks like,
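A sketch of that idiom, where arr stands for any array expression (the alias a(e) is just an illustrative name):

```sql
ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e))
```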
And in practice it is applied like this,
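For instance, assuming a hypothetical table mytable with an integer-array column arr, deduplicating each row's array might look like:

```sql
SELECT ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e)) AS deduped
FROM mytable;
```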
If you want it sorted you can do,
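With an ORDER BY inside the subquery, the rebuilt array comes back sorted (same assumed mytable as above):

```sql
SELECT ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e) ORDER BY e) AS deduped
FROM mytable;
```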
And all of that can be written with CROSS JOIN LATERAL, which is much cleaner,
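A sketch of the LATERAL form, again against the assumed mytable (with a primary key id to group on):

```sql
SELECT id, array_agg(DISTINCT e ORDER BY e) AS deduped
FROM mytable
CROSS JOIN LATERAL unnest(arr) AS a(e)
GROUP BY id;
```

Here unnest runs once per row via the lateral join, and array_agg with DISTINCT and ORDER BY rebuilds each array deduplicated and sorted in a single pass.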