Postgres version
Using PostgreSQL 10.3.
Table definition
CREATE TABLE tickets (
id bigserial primary key,
state character varying,
closed timestamp
);
CREATE INDEX "state_index" ON "tickets" ("state")
WHERE ((state)::text = 'open'::text);
Cardinality
The table contains 1027616 rows, of which 51533 have state = 'open' and closed IS NULL, or about 5%.
A query with a condition on state performs well, using an index scan as expected:
explain analyze select * from tickets where state = 'open';
Index Scan using state_index on tickets (cost=0.29..16093.57 rows=36599 width=212) (actual time=0.025..22.875 rows=37346 loops=1)
Planning time: 0.212 ms
Execution time: 25.697 ms
I am trying to achieve the same or better performance for a query with the condition closed IS NULL, so that I can drop the state column and rely on the closed column to fetch the same rows. closed is NULL for exactly the rows where state = 'open', hence the state column is redundant.
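A quick sanity check of that equivalence could look like this (a sketch, not from the original question; IS DISTINCT FROM handles a NULL state safely):

```sql
-- Should return 0 if closed IS NULL and state = 'open' mark the same rows
SELECT count(*)
FROM tickets
WHERE (state = 'open') IS DISTINCT FROM (closed IS NULL);
```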
select * from tickets where closed IS NULL;
However, none of the indexes I've tried results in a single index scan like the first query. Below are the indexes I've tried, along with the EXPLAIN ANALYZE results.
A partial index:
CREATE INDEX "closed_index" ON "tickets" ("closed") WHERE (closed IS NULL);
explain analyze select * from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=604.22..38697.91 rows=36559 width=212) (actual time=12.879..48.780 rows=37348 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=14757
-> Bitmap Index Scan on closed_index (cost=0.00..595.09 rows=36559 width=0) (actual time=7.585..7.585 rows=37348 loops=1)
Planning time: 4.831 ms
Execution time: 52.068 ms
An expression index:
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
explain analyze select * from tickets where closed IS NULL;
Seq Scan on tickets (cost=0.00..45228.26 rows=36559 width=212) (actual time=0.025..271.418 rows=37348 loops=1)
Filter: (closed IS NULL)
Rows Removed by Filter: 836578
Planning time: 7.992 ms
Execution time: 274.504 ms
A partial expression index:
CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL))
WHERE (closed IS NULL);
explain analyze select * from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=604.22..38697.91 rows=36559 width=212) (actual time=177.109..238.008 rows=37348 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=14757
-> Bitmap Index Scan on "closed_index" (cost=0.00..595.09 rows=36559 width=0) (actual time=174.598..174.598 rows=37348 loops=1)
Planning time: 23.063 ms
Execution time: 241.292 ms
UPDATED
Expanded table definition:
CREATE TABLE tickets (
id bigserial primary key,
state character varying,
closed timestamp,
created timestamp,
updated timestamp,
title character varying,
size integer NOT NULL,
comment_count integer NOT NULL
);
CREATE INDEX "state_index" ON "tickets" ("state")
WHERE ((state)::text = 'open'::text);
Cardinality:
The table contains 1027616 rows, with 51533 of them having state = 'open' and closed IS NULL, or about 5%. As mentioned above, I am trying to drop the state column, so I want to fetch the same rows using a condition on the closed column instead.
A query with a condition on the state column uses an index scan:
explain analyze select id, title, created, closed, updated from tickets where state = 'open';
Index Scan using state_index on tickets (cost=0.29..22901.58 rows=49356 width=72) (actual time=0.107..49.599 rows=51533 loops=1)
Planning time: 0.511 ms
Execution time: 54.366 ms
I'd like the same performance (an index scan, ideally) when switching to querying on the closed column. I tried a partial index on id with a closed IS NULL predicate:
CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;
VACUUM ANALYZE tickets;
explain analyze select id, title, created, closed, updated from tickets where closed IS NULL;
Bitmap Heap Scan on tickets (cost=811.96..33999.42 rows=49461 width=72) (actual time=7.868..47.080 rows=51537 loops=1)
Recheck Cond: (closed IS NULL)
Heap Blocks: exact=17479
-> Bitmap Index Scan on closed_index (cost=0.00..799.60 rows=49461 width=0) (actual time=4.868..4.868 rows=51537 loops=1)
Planning time: 0.222 ms
Execution time: 51.028 ms
Best Answer
Assuming the central piece of information is supposed to mean that the same 15 % of all 1031584 rows meet both these conditions (all details matter!), both conditions should perform equally, returning around 155k rows (!).
Your query plans show 37346 qualifying rows, ~ 3.6 %, not 15 %. Something's still not right in your question.
With 3.6 %, indexes only start to make sense. With your tiny row size of ~ 52 bytes per row, around 155 rows fit per page, which would mean 5-6 hits per page for a completely random distribution. Postgres has to read all pages anyway, so a sequential scan should be the fastest plan; filtering out the misses should be cheaper than involving indexes in any way.
Typically, qualifying rows are more or less clustered, and the fewer data pages are involved, the more sense it makes to involve an index. Bitmap index scans at best, though; I see hardly any case for a plain index scan, and even far less (as far as there can be much "less" than "hardly any") for the 15 % you claim.
For your updated numbers (~ 5 % of all rows match) I would still rather expect bitmap index scans than index scans. A possible explanation: table bloat with lots of dead tuples. You mentioned a high write load. That would result in fewer and fewer live tuples per data page and favor index scans (as compared to bitmap index scans). You might re-test your initial query after a VACUUM FULL ANALYZE (if you can afford an exclusive lock on the table!). If my hypothesis holds, the physical table size would shrink substantially and you would then see a bitmap index scan instead of the index scan (and a faster one, too). You may need more aggressive autovacuum settings.
Partial index
Your "expression index" and "partial expression index" are not useful. We don't need closed IS NULL as the actual index expression (it is always true here); the expression only adds cost and no gain.
The first, plain partial index is the more useful variant. But don't use closed as the index column (again, always NULL here). Instead, use any column possibly useful to other queries, and ideally one that is never updated, to avoid additional cost and index bloat. The primary key column id is the natural candidate in the absence of other useful applications.
Or, if id is not useful, consider a constant instead. That makes the actual index column as useless as in the other dismissed variants, but it avoids all additional costs and dependencies.
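The two suggested variants could be sketched like this (index names are illustrative, not from the original answer):

```sql
-- Variant 1: partial index on the primary key column
CREATE INDEX tickets_open_id_idx ON tickets (id)
WHERE closed IS NULL;

-- Variant 2: partial index on a constant, if id is not useful elsewhere
CREATE INDEX tickets_open_const_idx ON tickets ((true))
WHERE closed IS NULL;
```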
What I might try:
Updated for your updated question. This only makes sense if you don't have many other writes on the rows in question (the added columns updated and comment_count make me doubt it): create a partial index with id and the other relevant columns (few & small) as index columns, and capitalize on it with a suitable query to get index-only scans.
We don't need SELECT *, and closed IS NULL is given by the WHERE clause, so we can serve the query from the tiny partial index in a fast index-only scan - assuming you meet the preconditions (that's why I threw in VACUUM to update the visibility map above). This is the rare case where a query reading more than ~ 5 % of all rows still happily uses the index (even up to reading the whole table). There also seems to be redundancy in your design; simplifications should be possible.
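A sketch of such a covering partial index and a matching query (the index name and column choice are assumptions based on the question's expanded table definition; closed can be reconstructed as NULL in the target list because the predicate guarantees it):

```sql
-- Covering partial index over the open tickets only
CREATE INDEX tickets_open_covering_idx
ON tickets (id, created, updated, title)
WHERE closed IS NULL;

-- Query that can be answered by an index-only scan
SELECT id, title, created, NULL::timestamp AS closed, updated
FROM tickets
WHERE closed IS NULL;
```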
This works since Postgres 9.6, which added support for index-only scans on partial indexes whose predicate involves columns not stored in the index (see the release notes). Or the information in your question is misleading.
If you don't see index-only scans with this, even immediately after running VACUUM, then a high write load may be in the way and the visibility map just never reaches a state that allows index-only scans (the manual covers the preconditions). Or you have another problem in your DB keeping VACUUM from doing its job.
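One way to check whether VACUUM/autovacuum is keeping up on this table is the standard pg_stat_user_tables view (a diagnostic sketch, not from the original answer):

```sql
-- High n_dead_tup or stale last_autovacuum suggests vacuum is falling behind
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'tickets';
```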