PostgreSQL – How to get a table index used when filtering on 3 conditions

index | performance | postgresql | postgresql-9.6 | postgresql-performance

I have a table with 30 million records and I want to query it very fast. It has these fields: id (int), field1 (int), field2 (int), field3 (int). I want to query the data with this condition:

SELECT * 
FROM mytable 
WHERE field3 = 1 
AND (field1 < field2 OR field2 < 0) 
LIMIT 1

This is the query plan I get:

explain (analyze, buffers) select * from mytable mt where (mt.field1 < mt.field2 OR mt.field2 < 0) and mt.field3 = 1 limit 1;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.37 rows=1 width=13) (actual time=525.050..525.051 rows=1 loops=1)
   Buffers: shared hit=245 read=35166
   ->  Seq Scan on mytable mt  (cost=0.00..359373.97 rows=983846 width=13) (actual time=525.048..525.048 rows=1 loops=1)
         Filter: ((field3 = 1) AND ((field1 < field2) OR (field2 < 0)))
         Rows Removed by Filter: 4006175
         Buffers: shared hit=245 read=35166
 Planning time: 0.163 ms
 Execution time: 525.077 ms

I have an index on field3, but Postgres only uses it with

SELECT * 
FROM mytable 
WHERE field3 = 1 
LIMIT 1

I tried creating an index on (field3, field1, field2), then on (field3, field2, field1), but they are only used when the query omits field1 < field2.

Then I tried creating an index only for the condition field1 < field2. Indexes on (field1, field2) and (field2, field1) don't work either.
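For reference, these are roughly the statements I tried (the index names here are just examples):

-- Composite indexes leading with field3:
CREATE INDEX idx_mytable_f3_f1_f2 ON mytable (field3, field1, field2);
CREATE INDEX idx_mytable_f3_f2_f1 ON mytable (field3, field2, field1);

-- Indexes aimed only at the field1 < field2 part:
CREATE INDEX idx_mytable_f1_f2 ON mytable (field1, field2);
CREATE INDEX idx_mytable_f2_f1 ON mytable (field2, field1);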

What index do I need so that it gets used with the condition field1 < field2?

Update:
I added a partial index matching the condition

CREATE INDEX idx_mytable_condition_1 ON mytable (field3) WHERE (field1 < field2 OR field2 < 0);

but the index is still not used; I only get a seq scan:

explain (analyze, buffers) select * from mytable mt where mt.field3 = 1 and (mt.field1 < mt.field2 OR mt.field2 < 0) limit 1;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.37 rows=1 width=37) (actual time=492.787..492.787 rows=1 loops=1)
   Buffers: shared hit=277 read=35134
   ->  Seq Scan on mytable mt  (cost=0.00..359373.97 rows=983846 width=37) (actual time=492.785..492.785 rows=1 loops=1)
         Filter: ((field3 = 1) AND ((field1 < field2) OR (field2 < 0)))
         Rows Removed by Filter: 4006175
         Buffers: shared hit=277 read=35134
 Planning time: 0.110 ms
 Execution time: 492.806 ms

Best Answer

You have a problem with correlation between the expressions. The rows where field3 = 1 are selectively depleted of rows where (field1 < field2 OR field2 < 0). PostgreSQL assumes those conditions are independent, but they aren't. So it thinks it will very quickly find a qualifying row with a seq scan, but instead it had to dig through 4 million rows before finding the first one.
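You can see that correlation for yourself with a quick count of how the two conditions combine (this check is just an illustration, not required for the fix):

SELECT (field3 = 1)                    AS is_field3_1,
       (field1 < field2 OR field2 < 0) AS matches_or_condition,
       count(*)
FROM   mytable
GROUP  BY 1, 2;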

A quirk of PostgreSQL is that it collects special statistics on expression (functional) indexes, but not on plain partial indexes. So if you wrap the column in a dummy function in your partial index, you may get a better plan:

CREATE INDEX ON mytable (abs(field3)) WHERE (field1 < field2 OR field2 < 0);

Your query would then have to use abs(field3) = 1 in its condition. Of course you can only do that if field3 is never negative; otherwise pick some other dummy function to use.
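A minimal sketch of the rewritten query, assuming field3 is never negative (run ANALYZE first so statistics for the new expression index are gathered):

ANALYZE mytable;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   mytable mt
WHERE  abs(mt.field3) = 1
  AND  (mt.field1 < mt.field2 OR mt.field2 < 0)
LIMIT  1;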