Postgresql – Take advantage of monotonic columns in PostreSQL

execution-planperformancepostgresqlquery-performance

I have a PostgreSQL table with, among the other, two columns named col1 and col2, both of integer type (there are around 10M rows in the table). I want to perform SQL queries like:

SELECT * FROM table WHERE col1 >= val1 AND col2 <= val2;

(for certain val1 and val2 that I know a query time).

If I put btree indices on col1 and col2 PostgreSQL tries to execute the query performing an index scan on one of the two columns and then filtering on the other. This means that in most cases it has to sweep through around half of the table, even when the number of matching rows is very little. Adding a multicolumn index is useless, because PostgreSQL can effectively use it only when at least one of the two columns is tested for equality.

One important assumption that I can make on the values, though, is that the two columns are monotonic one respect to the other. This means that if in a row col1 is greater then or equal two col1 in another row, then the same relation is valid between the two corresponding col2 entries.

This means that in line of principle the query execution could be sped up by performing an index scan on one of the two columns, filtering on the other and stopping the execution as soon as a non matching value is found on the second column. In this case the query would read just exactly the rows to be returned.

Is there any way to setup indices or whatever other invariant in PostgreSQL so that the query planner is able to detect this?

(of course the problem can be easily solved performing two queries, the first one to translate the inequality on col2 to an inequality on col1; I am asking if there is a way to avoid this workaround and let PostgreSQL manage the mess by itself)

Best Answer

if in a row col1 is greater then or equal two col1 in another row, then the same relation is valid between the two corresponding col2 entries

In which case you can reformulate your query to look like:

SELECT * FROM table WHERE col2 >= val1 AND col2 <= val2;

because you can find the lower bound for col2 from the lower bound for col1, like this:

schema:

create schema stack;
set search_path=stack;
--
create table t(foo integer, bar integer);
insert into t(foo,bar) select 10*g, 20*g from generate_series(1,100000) g;
create index on t(foo);
create index on t(bar);

method:

explain analyse select min(foo) from t where foo>500; -- assuming val1=500
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY PLAN                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Result  (cost=0.38..0.39 rows=1 width=0) (actual time=0.063..0.064 rows=1 loops=1)                                                    │
│   InitPlan 1 (returns $0)                                                                                                             │
│     ->  Limit  (cost=0.29..0.38 rows=1 width=4) (actual time=0.059..0.060 rows=1 loops=1)                                             │
│           ->  Index Only Scan using t_foo_idx on t  (cost=0.29..2803.63 rows=33167 width=4) (actual time=0.058..0.058 rows=1 loops=1) │
│                 Index Cond: ((foo IS NOT NULL) AND (foo > 500))                                                                       │
│                 Heap Fetches: 1                                                                                                       │
│ Total runtime: 0.087 ms                                                                                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
select min(foo) from t where foo>500;
┌─────┐
│ min │
├─────┤
│ 510 │
└─────┘
select min(bar) from t where foo=510;
┌──────┐
│ min  │
├──────┤
│ 1020 │
└──────┘
explain analyse select * from t where bar>=1020 and bar<= 1100; -- assuming val2=1100
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                     QUERY PLAN                                                      │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Bitmap Heap Scan on t  (cost=13.42..485.00 rows=500 width=8) (actual time=0.011..0.013 rows=5 loops=1)              │
│   Recheck Cond: ((bar >= 1020) AND (bar <= 1100))                                                                   │
│   ->  Bitmap Index Scan on t_bar_idx  (cost=0.00..13.29 rows=500 width=0) (actual time=0.008..0.008 rows=5 loops=1) │
│         Index Cond: ((bar >= 1020) AND (bar <= 1100))                                                               │
│ Total runtime: 0.030 ms                                                                                             │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
select * from t where bar>=1020 and bar<= 1100;
┌─────┬──────┐
│ foo │ bar  │
├─────┼──────┤
│ 510 │ 1020 │
│ 520 │ 1040 │
│ 530 │ 1060 │
│ 540 │ 1080 │
│ 550 │ 1100 │
└─────┴──────┘

clean up:

drop schema stack cascade;