I have a table with 30 million records and I want to retrieve rows matching a certain query very fast. The table has these fields: id (int), field1 (int), field2 (int), field3 (int). I want to query the data with this condition:
SELECT *
FROM mytable
WHERE field3 = 1
AND (field1 < field2 OR field2 < 0)
LIMIT 1
This is the query plan I get:
explain (analyze, buffers) select * from mytable mt where (mt.field1 < mt.field2 OR mt.field2 < 0) and mt.field3 = 1 limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.37 rows=1 width=13) (actual time=525.050..525.051 rows=1 loops=1)
   Buffers: shared hit=245 read=35166
   ->  Seq Scan on mytable mt  (cost=0.00..359373.97 rows=983846 width=13) (actual time=525.048..525.048 rows=1 loops=1)
         Filter: ((field3 = 1) AND ((field1 < field2) OR (field2 < 0)))
         Rows Removed by Filter: 4006175
         Buffers: shared hit=245 read=35166
 Planning time: 0.163 ms
 Execution time: 525.077 ms
I have an index on field3, but Postgres only uses it with a query like
SELECT *
FROM mytable
WHERE field3 = 1
LIMIT 1
I tried an index on (field3, field1, field2), then on (field3, field2, field1), but either one is used only when the field1 < field2 condition is dropped.
Then I tried indexes aimed only at the field1 < field2 condition: (field1, field2) and (field2, field1) don't work either.
What index do I need so that it gets used together with the condition field1 < field2?
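For reference, the composite indexes described above can be created as follows (the index names are illustrative, not from the original attempts):

```sql
-- Composite indexes starting with field3; used only when the
-- OR condition on field1/field2 is dropped from the query:
CREATE INDEX idx_mytable_f3_f1_f2 ON mytable (field3, field1, field2);
CREATE INDEX idx_mytable_f3_f2_f1 ON mytable (field3, field2, field1);

-- Indexes aimed at the field1 < field2 condition alone; also unused:
CREATE INDEX idx_mytable_f1_f2 ON mytable (field1, field2);
CREATE INDEX idx_mytable_f2_f1 ON mytable (field2, field1);
```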
Update: I added a partial index matching the condition:
CREATE INDEX idx_mytable_condition_1 ON mytable (field3) WHERE (field1 < field2 OR field2 < 0);
but the index is still not used; the plan is still a sequential scan:
explain (analyze, buffers) select * from mytable mt where mt.field3 = 1 and (mt.field1 < mt.field2 OR mt.field2 < 0) limit 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.37 rows=1 width=37) (actual time=492.787..492.787 rows=1 loops=1)
   Buffers: shared hit=277 read=35134
   ->  Seq Scan on mytable mt  (cost=0.00..359373.97 rows=983846 width=37) (actual time=492.785..492.785 rows=1 loops=1)
         Filter: ((field3 = 1) AND ((field1 < field2) OR (field2 < 0)))
         Rows Removed by Filter: 4006175
         Buffers: shared hit=277 read=35134
 Planning time: 0.110 ms
 Execution time: 492.806 ms
Best Answer
You have a problem with correlated conditions. The rows where field3 = 1 are selectively depleted of rows where ((field1 < field2) OR (field2 < 0)). PostgreSQL assumes the two conditions are independent, but they aren't, so it estimates that a seq scan will find a qualifying row very quickly. Instead it had to dig through 4 million rows before finding the first one.

A quirk of PostgreSQL is that it collects special statistics on expression (functional) indexes, but not on plain partial indexes. So if you add a dummy function to your partial index, you may get a better plan:
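A minimal sketch of such an index, assuming abs() as the dummy function and reusing the partial condition from the question (the index name is illustrative):

```sql
-- Wrapping field3 in a dummy function makes this an expression index,
-- so PostgreSQL gathers statistics on it:
CREATE INDEX idx_mytable_condition_2 ON mytable (abs(field3))
  WHERE (field1 < field2 OR field2 < 0);

-- Refresh statistics so the planner can use the new ones:
ANALYZE mytable;
```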
Your query would then have to use abs(field3) = 1 in its condition so the planner can match the indexed expression. Of course you can only do that if field3 is never negative; otherwise pick some other dummy function to use.
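Under that assumption, the rewritten query would look like this; the abs(field3) = 1 form lets the planner match the expression index and its statistics:

```sql
SELECT *
FROM mytable mt
WHERE abs(mt.field3) = 1                        -- matches the indexed expression
  AND (mt.field1 < mt.field2 OR mt.field2 < 0) -- matches the partial-index predicate
LIMIT 1;
```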