I was just reviewing some old code written for pre-8.4 PostgreSQL, and I saw something really nifty. I remember having a custom function do some of this back in the day, but I forgot what pre-array_agg()
looked like. For review, modern aggregation is written like this.
SELECT array_agg(x ORDER BY x DESC) FROM foobar;
However, once upon a time, it was written like this,
SELECT ARRAY(SELECT x FROM foobar ORDER BY x DESC);
So, I tried it with some test data..
CREATE TEMP TABLE foobar AS
SELECT * FROM generate_series(1,1e7)
AS t(x);
The results were surprising.. The #OldSchoolCool way was massively faster: a 25% speedup. Moreover, simplifying it without the ORDER, showed the same slowness.
# EXPLAIN ANALYZE SELECT ARRAY(SELECT x FROM foobar);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Result (cost=104425.28..104425.29 rows=1 width=0) (actual time=1665.948..1665.949 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on foobar (cost=0.00..104425.28 rows=6017728 width=32) (actual time=0.032..716.793 rows=10000000 loops=1)
Planning time: 0.068 ms
Execution time: 1671.482 ms
(5 rows)
test=# EXPLAIN ANALYZE SELECT array_agg(x) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=119469.60..119469.61 rows=1 width=32) (actual time=2155.154..2155.154 rows=1 loops=1)
-> Seq Scan on foobar (cost=0.00..104425.28 rows=6017728 width=32) (actual time=0.031..717.831 rows=10000000 loops=1)
Planning time: 0.054 ms
Execution time: 2174.753 ms
(4 rows)
So, what's going on here. Why is array_agg, an internal function so much slower than the planner's SQL voodoo?
Using "PostgreSQL 9.5.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005, 64-bit"
Best Answer
There is nothing "old school" or "outdated" about an ARRAY constructor (That's what
ARRAY(SELECT x FROM foobar)
is). It's modern as ever. Use it for simple array aggregation.The manual:
The aggregate function
array_agg()
is more versatile in that it can be integrated in aSELECT
list with more columns, possibly more aggregations in the sameSELECT
, and arbitrary groups can be formed withGROUP BY
. While an ARRAY constructor can only return a single array from aSELECT
returning a single column.I did not study the source code, but it would seem obvious that a much more versatile tool is also more expensive.
One notable difference: the ARRAY constructor returns an empty array (
{}
) if no rows qualify.array_agg()
returnsNULL
for the same.