I have a SQL optimization question.
Table Name: charts
Total Records: +250 million
Columns: timestamp(string), imported(boolean), ….
Indexes:
CREATE INDEX index_charts_on_timestamp ON public.charts
USING btree ("timestamp") WHERE (imported = false)
CREATE UNIQUE INDEX charts_pkey ON public.charts USING btree (id)
QUERY EXPLAIN
SELECT "charts"."id" FROM
"charts" WHERE
"charts"."imported" = FALSE AND
"charts"."timestamp" = '2018-01-03' ORDER BY
"charts"."id" ASC LIMIT 1000;
OUTPUT
Limit (cost=1000.59..237865.48 rows=1000 width=8)
-> Gather Merge (cost=1000.59..33262989.10 rows=140426 width=8)
Workers Planned: 2
-> Parallel Index Scan using charts_pkey on charts_pkey (cost=0.57..33245780.43 rows=58511 width=8)
Filter: ((NOT imported) AND (("timestamp")::text = '2018-01-03'::text))
(5 rows)
How can i make postgres to use index_charts_on_timestamp
instead?
I cannot present EXPLAIN ANALYZE
results, the command simply freezes without any results.
Best Answer
I suggest to improve the index to:
so it fits the query better, for both the
WHERE
andORDER BY
clauses.The plan will only need an index seek and a partial index scan (until it finds the required 1000 rows).
The current index does not hold the
id
values, so the whole index has to be read and then the id values retrieved from the table and sorted to get the first 1000.If the rows matching the
WHERE
conditions are not many (less than 1000 or less than 10k), then the sort operation is not very expensive but the values still have to be read as they are not in the index. If the matching rows are a lot more (say 100K or more), then both fetching and sorting them will be much more expensive.It might be cheaper in these cases to do a full table scan. The optimizer decided in your case to do something similar: a parallel index scan of the PK index. This avoids the need of sorting but it still has to read the rows from the table until it finds 1000 rows that match the
WHERE
condition.Both cases however would have no problem with the suggested index above, as the table would not have to be read at all and no sorting needed as the index order matches the
ORDER BY
.