DISTINCT ON()
Just as a side note, this is precisely what DISTINCT ON () does (not to be confused with plain DISTINCT). From the PostgreSQL documentation:

SELECT DISTINCT ON ( expression [, ...] )

keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
So if you were to write,
SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;
It's effectively
SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;
in that it takes the first z from each (x,y) group. There are two important differences.
First, you can also select other columns, at no cost of further aggregation:
SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;
Second, because there is no GROUP BY, you cannot use (real) aggregates with it:
CREATE TABLE foo AS
SELECT * FROM ( VALUES
(1,2,3),
(1,2,4),
(1,2,5)
) AS t(x,y,z);
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;
-- fails, as you should expect.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;
-- would not fail.
Don't forget ORDER BY
Also, while I didn't bold it then, I will now:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

Always use an ORDER BY with DISTINCT ON.
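For example, on the foo table defined above, adding an ORDER BY (which must start with the DISTINCT ON expressions) pins down which z is kept:

```sql
SELECT DISTINCT ON (x,y) z
FROM foo
ORDER BY x, y, z;
-- deterministically keeps the smallest z (here 3) for the (1,2) group
```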
Using an Ordered-Set Aggregate Function
I imagine a lot of people are looking for first_value among the Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:
SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
But, alas, there is no first_value ordered-set aggregate. Instead, you can do this:
SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
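For example, against the foo table from above (using its x,y in place of a,b), percentile_disc(0) picks the lowest z in each group:

```sql
SELECT x, y, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY x, y;
-- one row: (1, 2, 3), since 3 is the smallest z in the (1,2) group
```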
You can accomplish what you're asking for by doing something like this.
I created a table and data to help better illustrate what I'm doing.
CREATE TABLE logs (id serial NOT NULL PRIMARY KEY, lognum int, flightnum int);
INSERT INTO logs (lognum, flightnum) VALUES (1,6);
INSERT INTO logs (lognum, flightnum) VALUES (1,7);
INSERT INTO logs (lognum, flightnum) VALUES (1,8);
INSERT INTO logs (lognum, flightnum) VALUES (2,80);
INSERT INTO logs (lognum, flightnum) VALUES (3,12);
INSERT INTO logs (lognum, flightnum) VALUES (4,8008);
postgres@[local]:5432:postgres:=# SELECT * FROM logs;
 id | lognum | flightnum
----+--------+-----------
 13 |      1 |         6
 14 |      1 |         7
 15 |      1 |         8
 16 |      2 |        80
 17 |      3 |        12
 18 |      4 |      8008
(6 rows)
Time: 0.188 ms
postgres@[local]:5432:postgres:=#
Note that lognums 1, 2, and 4 have flightnums beginning with 8 (8, 80, and 8008).
Now, use this query to find the lognums whose flightnum starts with 8. It takes your original query with an array_agg, wraps another query around it to generate subscripts for each member of the array (which can be arbitrarily large), and finally wraps that in an outer query which uses the generated subscript to compare each member of the flightnums array against LIKE '8%'.
SELECT lognum FROM (
SELECT lognum,
flightnums,
generate_subscripts(flightnums, 1) AS s
FROM (
SELECT lognum,
array_agg(flightnum) AS flightnums
FROM logs
GROUP BY lognum
) AS t1
) AS t2
WHERE flightnums[s]::text LIKE '8%' ORDER BY lognum;
Which gives you the following output
postgres@[local]:5432:postgres:=# SELECT lognum FROM (SELECT lognum, flightnums, generate_subscripts(flightnums, 1) AS s FROM ( SELECT lognum, array_agg(flightnum) AS flightnums FROM logs GROUP BY lognum) AS t1) AS t2 WHERE flightnums[s]::text LIKE '8%' ORDER BY lognum;
lognum
--------
1
2
4
(3 rows)
Time: 0.338 ms
postgres@[local]:5432:postgres:=#
As would be expected from the data above.
For further array manipulation needs, I would recommend reading the arrays chapter in the PostgreSQL documentation.
Best Answer
While there is no function to accomplish that, you can use
- unnest() to convert an array of elements to a table of one-column rows,
- DISTINCT to remove duplicates, and
- ARRAY(query) to recreate the array.

That idiom looks like,
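A sketch of that idiom, where arr stands for any array expression (the alias a(e) is just an illustrative name):

```sql
ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e))
```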
And in practice it is applied like this,
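For instance, assuming a hypothetical table mytable with an integer-array column arr, deduplicating each row's array might look like:

```sql
SELECT ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e)) AS deduped
FROM mytable;
```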
If you want it sorted you can do,
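With an ORDER BY inside the subquery, the rebuilt array comes back sorted (same assumed mytable as above):

```sql
SELECT ARRAY(SELECT DISTINCT e FROM unnest(arr) AS a(e) ORDER BY e) AS deduped
FROM mytable;
```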
And all of that can be written with CROSS JOIN LATERAL, which is much cleaner,
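A sketch of the LATERAL form, again against the assumed mytable (with a primary key id to group on):

```sql
SELECT id, array_agg(DISTINCT e ORDER BY e) AS deduped
FROM mytable
CROSS JOIN LATERAL unnest(arr) AS a(e)
GROUP BY id;
```

Here unnest runs once per row via the lateral join, and array_agg with DISTINCT and ORDER BY rebuilds each array deduplicated and sorted in a single pass.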