EDIT: docker image for testing with the same data (obfuscated)
$ docker run --rm --name pg -d homer5439/pgagg
$ docker exec -ti pg bash
# createdb -U postgres test; zcat /tmp/corr.sql.gz | psql -U postgres test
# psql -U postgres test
and you can run the examples below.
I was wondering why the following query takes more than 30 seconds on PostgreSQL 12.2:
SELECT
    contract_id,
    array_agg(corr) AS corr
FROM
    corr
GROUP BY contract_id;
More info: the corr table has about 150000 records. Each record has the following structure:
gse=# \d corr ;
Table "public.corr"
Column | Type | Collation | Nullable | Default
-------------+---------+-----------+----------+-----------------------------------------------------------------
corr_id | integer | | not null | nextval('corr_corr_id_seq'::regclass)
contract_id | integer | | |
start_date | date | | |
corr_type | text | | |
descr | text | | |
credit | numeric | | |
detail | text | | |
Indexes:
"corr_pkey" PRIMARY KEY, btree (corr_id)
"corr_contract_id_idx" btree (contract_id)
The detail field contains text up to 2/3 MB in length (about 10% of the records have detail that long; the others have some (~10-20) kilobytes).
The number of distinct contract_id values is currently 2317.
Following other suggestions found on this site, I've tried increasing work_mem up to 10GB without noticeable changes.
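For reference, a per-session change like the one described can be made without touching the server configuration; the sketch below assumes the value is set just for the current connection (the 10GB figure is the one mentioned above, not a recommendation):

```
-- applies only to the current session; no restart required
SET work_mem = '10GB';

-- verify the effective value
SHOW work_mem;
```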
Here's the query plan:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=9883.29..9911.57 rows=2263 width=36) (actual time=1184.971..1357.309 rows=2317 loops=1)
Output: contract_id, array_agg(corr.*)
Group Key: corr.contract_id
Buffers: shared hit=78012 read=49899
  -> Seq Scan on public.corr (cost=0.00..9320.19 rows=112619 width=571) (actual time=0.057..959.359 rows=112619 loops=1)
Output: contract_id, corr.*
Buffers: shared hit=78012 read=49899
Planning Time: 0.131 ms
Execution Time: 1357.747 ms
Best Answer
When you run it "alone" — if you mean running it in "psql" just without the EXPLAIN ANALYZE — then "psql" reads the entire result set into memory and scans it to find the longest entry in each column, so that it can format the output to that width. That can be slow, though I would not expect 30 seconds for 2317 rows!
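To separate server time from psql's formatting time, you can redirect the output away from the terminal and/or switch to unaligned format, which skips the per-column width scan (these are standard psql meta-commands; the exact savings will vary):

```
-- show client-side elapsed time for each statement
\timing on

-- unaligned output: psql no longer measures column widths
\pset format unaligned

-- or discard the result entirely so only transfer/processing time remains
\o /dev/null
SELECT contract_id, array_agg(corr) AS corr FROM corr GROUP BY contract_id;
\o
```

If the query is fast with `\o /dev/null` but slow on screen, the time is going into formatting and display, not the aggregation itself.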
Except I see you don't have a "corr" column in your "corr" table, so array_agg(corr) is aggregating the entire row records, which you already said include a column that can be very wide. So you are slinging a huge amount of data at psql, and it takes a long time to process it, perhaps with some swapping on top.
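If the wide detail column isn't actually needed in the aggregated result, a narrower aggregation avoids shipping it to the client at all. This is a sketch of one possible variant (which columns you really need is an assumption here):

```
-- aggregate only the primary key instead of whole rows;
-- the multi-megabyte detail column never leaves the server
SELECT
    contract_id,
    array_agg(corr_id) AS corr_ids
FROM
    corr
GROUP BY contract_id;

-- or build a row type from selected narrow columns only
SELECT
    contract_id,
    array_agg(ROW(corr_id, start_date, credit)) AS corr
FROM
    corr
GROUP BY contract_id;
```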