My question is why a multi-column index is not used for a index only scan, when a partial index with equivalent information (i think) is.
The table:
CREATE TABLE test
(
id INT,
descr TEXT,
flag BOOLEAN
);
INSERT INTO test
SELECT GENERATE_SERIES(1,100000) AS id,
MD5(RANDOM()::TEXT) AS descr,
(RANDOM() < 0.1) AS flag;
SELECT *
FROM test LIMIT 10;
A content sample:
id descr flag
1 81978ceb5514461fbad9af1152ad78f6 true
2 cc0aee68ba3e0095cc74d53e8da55fef false
3 689a76e5897d565638f8ddd2d2019b7a true
4 9df03bc2969a6af88cd1d6e0423d0f4c true
5 318983766d11f831e9f0df34606dc908 false
6 198102bb71640a16f28263b7fb56ba2e false
7 9bef7320389db46a8ad88ffa611e81b5 false
8 c1f0d637ee0a985aa7d768a78d2d97b1 false
9 781b4064f721ae3879d95579264b0aba false
10 c4582890bb1e9af430e0f36b50f5e88c false
The query I need to run is:
SELECT id
FROM test
WHERE flag;
Now if I use a partial index. The query (eventually) gets executed as a index only scan:
CREATE INDEX i1
ON test (id) WHERE flag;
QUERY PLAN
Index Only Scan using i1 on test (cost=0.29..354.95 rows=9911 width=4) (actual time=0.120..6.268 rows=9911 loops=1)
Heap Fetches: 9911
Buffers: shared hit=834 read=29
Planning time: 0.806 ms
Execution time: 6.922 ms
What I do not understand is, why a multi column index of the following form is never used for a index-only scan?
CREATE INDEX i2
ON test (flag, id);
QUERY PLAN
Bitmap Heap Scan on test (cost=189.10..1122.21 rows=9911 width=4) (actual time=0.767..5.986 rows=9911 loops=1)
Filter: flag
Heap Blocks: exact=834
Buffers: shared hit=863
-> Bitmap Index Scan on i2 (cost=0.00..186.62 rows=9911 width=0) (actual time=0.669..0.669 rows=9911 loops=1)
Index Cond: (flag = true)
Buffers: shared hit=29
Planning time: 0.090 ms
Execution time: 6.677 ms
Can't the query visit all consecutive leaves of the btree where flag is True
to determine all ids?
(Note that tuple visibility probably is not the problem since the index-only scan is used with the partial index i1
.)
My Postgres version is: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit
Best Answer
Tuple visibility is the problem. It uses the bitmap index scan in the unvacuumed case because it thinks it will be faster, not because it is incapable of using the IOS. Once I do a
VACUUM test
on your exact example, it starts using the index only scan on indexi2
, because now it thinks that the IOS will be faster.As for why it doesn't use the bitmap scan for the partial index when the table is not vacuumed, that is a bit more complicated and is not about visibility. One of the benefits of bitmap scans (and in your case, the only benefit of them) is that it reads the table in physical order and visits each table block only once, regardless of the order of the index. PostgreSQL sees a perfect correlation between
id
and the order of the physical table rows, so for the index withid
as a leading column (the partial index) it think this benefit does not really matter, as it will read the table in physical order anyway even with a regular index scan. Since bitmap scans have slightly higher CPU cost, it won't do one when it offers no predicted IO advantage. The correlation between theflag
column and the physical table order is lower, and so it thinks bitmap index scans will get a benefit when using the index which hasflag
as its leading column, which happens to be the full index. It thinks this benefit outweighs the CPU cost. If the table was marked all-visible, then the benefits of an index-only-scan would tip the scales back again, but without vacuuming, that doesn't happen.Now in the case of the index on
(flag, id)
, the leading column is being tested for equality and the only rows selected will have the same value for flag. So what really matters is the correlation with the next column in the index,id
, not with theflag
column. But PostgreSQL isn't (apparently) clever enough to understand this. Now I am kind of curious how hard it would be to make it understand that.