Background
We're running this query:
SELECT COUNT(*) AS "__count"
FROM "xfiler_document"
INNER JOIN "xfiler_tx" ON ("xfiler_document"."tx_id" = "xfiler_tx"."id")
WHERE (
"xfiler_document"."added_on" <= '2015-06-28T23:59:59.999999-04:00'::timestamptz
AND "xfiler_tx"."company_id" = 1
AND "xfiler_document"."added_on" >= '2015-06-01T00:00:00-04:00'::timestamptz
)
The result of EXPLAIN ANALYZE
is @ https://explain.depesz.com/s/voMo
The table is large (+30 million records), so we've run into a performance problem with a report we're running (this can be seen at the URL above; +25 second query).
Question
It seems to me, based on the analyze results, that the performance issue here is due to the fact that Postgres is comparing the added_on
values (step 4), and then re-checking them for some reason (step 3) for every row from the query's added_on
time frame, rather than only checking the 18,515 rows that apparently match the company_id
comparison in step 7.
Firstly, am I understanding / explaining the performance issue correctly?
Secondly, is there a way to solve this (other than denormalizing company_id
)?
Database
- Postgres 9.6.1
xfiler_document
Column | Type | Modifiers
---------------------+-----------------------------+--------------------------------------------------------------
id | integer | not null default nextval('xfiler_document_id_seq'::regclass)
tx_id | integer | not null
doc | character varying(255) |
added_on | timestamp with time zone | not null
Indexes:
"xfiler_document_pkey" PRIMARY KEY, btree (id)
"xfiler_document_tx_id" btree (tx_id)
"xfiler_document_added_on" btree (added_on)
Foreign-key constraints:
"xfiler_document_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES xfiler_tx(id) DEFERRABLE INITIALLY DEFERRED
xfiler_tx
Column | Type | Modifiers
----------------------------+--------------------------+--------------------------------------------------------
id | integer | not null default nextval('xfiler_tx_id_seq'::regclass)
name | character varying(255) | not null
company_id | integer | not null
Indexes:
"xfiler_tx_pkey" PRIMARY KEY, btree (id)
"xfiler_tx_company_id" btree (company_id)
"xfiler_tx_name" btree (name)
Foreign-key constraints:
"xfiler_tx_company_id_fkey" FOREIGN KEY (company_id) REFERENCES company_company(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "xfiler_document" CONSTRAINT "xfiler_document_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES xfiler_tx(id) DEFERRABLE INITIALLY DEFERRED
Best Answer
The plan entry
Recheck Cond:
indicates what conditions will be used for the recheck, should a recheck be necessary. This will always be listed, even if no recheck is actually necessary at run time. (You can tell this because theRecheck Cond
shows up even if you do a regular EXPLAIN, not an EXPLAIN ANALYZE, so this is part of the planning process, not the execution process.)In your situation, all Heap Blocks were exact, meaning no recheck was necessary.
The real explanation for the slowness is probably that the Bitmap Heap Scan needs to read a lot of data from disk in order to find the rows it needs. You could check this by turning
track_io_timing
on and running EXPLAIN (ANALYZE, BUFFERS).If that is indeed the issue, one solution would be to
CLUSTER xfiler_document ON xfiler_document_added_on
so that the data for a given date range are closer together on disk.If your data is on a RAID, then increasing
effective_io_concurrency
could help.