I have a query where I want to fetch the first few rows from the datasets table, ordered by the date_added column. The sort column is indexed, so the basic version of this query is very fast:
SELECT datasets.id FROM datasets ORDER BY date_added LIMIT 25
"Limit (cost=0.28..6.48 rows=25 width=12) (actual time=0.040..0.092 rows=25 loops=1)"
" -> Index Scan using datasets_date_added_idx2 on datasets (cost=0.28..1244.19 rows=5016 width=12) (actual time=0.037..0.086 rows=25 loops=1)"
"Planning time: 0.484 ms"
"Execution time: 0.139 ms"
But I have an issue once I make the query a bit more complicated. I want to join another table representing a many-to-many relationship and aggregate the results in an array column. To do that I need to add a GROUP BY id clause:
SELECT datasets.id FROM datasets GROUP BY datasets.id ORDER BY date_added LIMIT 25
"Limit (cost=551.41..551.47 rows=25 width=12) (actual time=9.926..9.931 rows=25 loops=1)"
" -> Sort (cost=551.41..563.95 rows=5016 width=12) (actual time=9.924..9.926 rows=25 loops=1)"
" Sort Key: date_added"
" Sort Method: top-N heapsort Memory: 26kB"
" -> HashAggregate (cost=359.70..409.86 rows=5016 width=12) (actual time=7.016..8.604 rows=5016 loops=1)"
" Group Key: datasets.id"
" -> Seq Scan on datasets (cost=0.00..347.16 rows=5016 width=12) (actual time=0.009..1.574 rows=5016 loops=1)"
"Planning time: 0.502 ms"
"Execution time: 10.235 ms"
Just by adding the GROUP BY clause, the query now does a full scan of the datasets table instead of using the index on the date_added column as before.
A simplified version of the actual query I want to do is the following:
SELECT
datasets.id,
array_remove(array_agg(other_table.some_column), NULL) AS other_table
FROM datasets
LEFT JOIN other_table
ON other_table.id = datasets.id
GROUP BY datasets.id
ORDER BY date_added
LIMIT 25
Why does the GROUP BY clause cause the index to be ignored and forces a full table scan? And is there a way to rewrite this query to get it to use the index on the column it is sorted by?
I'm using Postgres 9.5.4 on Windows. The table in question currently has 5,000 rows, but it could grow to a few hundred thousand. I ran ANALYZE manually on both tables before the EXPLAIN ANALYZE.
Table definitions:
CREATE TABLE public.datasets
(
id integer NOT NULL DEFAULT nextval('datasets_id_seq'::regclass),
date_added timestamp with time zone,
...
CONSTRAINT datasets_pkey PRIMARY KEY (id)
)
CREATE TABLE public.other_table
(
id integer NOT NULL,
some_column integer NOT NULL,
CONSTRAINT other_table_pkey PRIMARY KEY (id, some_column)
)
The output of \d datasets with irrelevant columns anonymized:
Table "public.datasets"
Column | Type | Modifiers
---------------------------------+--------------------------+------------------------------------------------------
id | integer | not null default nextval('datasets_id_seq'::regclass)
key | text |
date_added | timestamp with time zone |
date_last_modified | timestamp with time zone |
***** | integer |
******** | boolean | default false
***** | boolean | default false
*************** | integer |
********************* | integer |
********* | boolean | default false
******** | integer |
************ | integer |
************ | integer |
**************** | timestamp with time zone |
************ | text | default ''::text
***** | text |
******* | integer |
********* | integer |
********************** | text | default ''::text
******************* | text |
**************** | integer |
********************** | text | default ''::text
******************* | text | default ''::text
********** | integer |
*********** | text |
*********** | text |
********************** | integer |
******************************* | text | default ''::text
************************ | text | default ''::text
*********** | integer | default 0
************* | text |
******************* | integer |
**************** | integer | default 0
*************** | text |
************** | text |
Indexes:
"datasets_pkey" PRIMARY KEY, btree (id)
"datasets_date_added_idx" btree (date_added)
"datasets_*_idx" btree (*)
"datasets_*_idx" btree (*)
"datasets_*_idx" btree (*)
"datasets_*_idx" btree (*)
"datasets_*_idx" btree (*)
"datasets_*_idx1" btree (*)
"datasets_*_idx" btree (*)
Best Answer
The problem is that your second query:
SELECT datasets.id FROM datasets GROUP BY datasets.id ORDER BY date_added LIMIT 25
does not mean what you expect. It gives you the first 25 rows ordered by date_added only because id is the primary key of the table, so the GROUP BY can be removed without changing the result.
It seems, however, that the optimizer does not always remove the redundant GROUP BY, and thus it produces a different plan. I'm not sure why - the various features of the optimizer that perform these simplifications are far from covering all cases.
You might get a better plan if you change the query to have matching GROUP BY and ORDER BY clauses. But in any case, my advice would be: "don't use redundant / complicated syntax when there is a simpler one".
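A sketch of that rewrite (hypothetical; the idea is to group and order by the same expression list so the planner can feed both from an index scan on date_added):

```sql
-- Sketch: GROUP BY and ORDER BY use the same expressions, so a
-- single index scan on date_added can satisfy both the grouping
-- and the sort before the LIMIT stops the scan early.
SELECT datasets.id
FROM datasets
GROUP BY date_added, datasets.id
ORDER BY date_added, datasets.id
LIMIT 25;
```

Whether the planner actually picks the index scan still depends on statistics and the available indexes, so verify with EXPLAIN ANALYZE.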
Now for the third query, with the join: while the GROUP BY method works, you can rewrite it by using standard SQL window functions (ROW_NUMBER()) or Postgres' DISTINCT ON, or by joining to a derived table (which uses your very first query, with minor details changed).
We could also avoid GROUP BY completely (well, it's hidden in the inline subquery).
Both rewrites are designed so that the produced plan performs the (fast) limit subquery first and then the join, avoiding a full table scan of either table.
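A sketch of the derived-table rewrite (assuming the table definitions from the question): the inner query picks the 25 rows via the date_added index, and the join and aggregation then run over only those rows.

```sql
-- The inner LIMIT query can use the index on date_added; the
-- LEFT JOIN and array_agg then only touch the 25 selected rows.
SELECT
    d.id,
    array_remove(array_agg(o.some_column), NULL) AS other_table
FROM (
    SELECT id, date_added
    FROM datasets
    ORDER BY date_added
    LIMIT 25
) AS d
LEFT JOIN other_table AS o ON o.id = d.id
GROUP BY d.id, d.date_added
ORDER BY d.date_added;
```

Note that d.date_added must appear in the GROUP BY here: functional dependency on a primary key is not tracked through a derived table.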
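And a sketch of the version without a top-level GROUP BY, where the aggregation is hidden in an inline (correlated scalar) subquery:

```sql
-- No GROUP BY at the outer level: the aggregate runs once per
-- selected dataset row, inside the scalar subquery.
SELECT
    d.id,
    (SELECT array_remove(array_agg(o.some_column), NULL)
     FROM other_table AS o
     WHERE o.id = d.id) AS other_table
FROM (
    SELECT id, date_added
    FROM datasets
    ORDER BY date_added
    LIMIT 25
) AS d
ORDER BY d.date_added;
```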
If you need to aggregate more columns, a third method combines both of the above, using a correlated (LATERAL) subquery in the FROM clause.
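A sketch of the LATERAL variant (available since Postgres 9.3, so fine on 9.5.4): the FROM-clause subquery may reference d, and it can return several aggregated columns at once.

```sql
-- The LATERAL subquery is evaluated once per row of d; an
-- aggregate without GROUP BY always yields exactly one row,
-- so the CROSS JOIN never drops any dataset rows.
SELECT
    d.id,
    agg.other_table
FROM (
    SELECT id, date_added
    FROM datasets
    ORDER BY date_added
    LIMIT 25
) AS d
CROSS JOIN LATERAL (
    SELECT array_remove(array_agg(o.some_column), NULL) AS other_table
    FROM other_table AS o
    WHERE o.id = d.id
) AS agg
ORDER BY d.date_added;
```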