Postgresql – Postgres – how to avoid unnecessary comparisons inside a WHERE with a JOIN

countjoin;performancepostgresqlpostgresql-9.6query-performance

Background

We're running this query:

SELECT COUNT(*) AS "__count"
FROM "xfiler_document"
INNER JOIN "xfiler_tx" ON ("xfiler_document"."tx_id" = "xfiler_tx"."id")
WHERE (
    "xfiler_document"."added_on" <= '2015-06-28T23:59:59.999999-04:00'::timestamptz
    AND "xfiler_tx"."company_id" = 1
    AND "xfiler_document"."added_on" >= '2015-06-01T00:00:00-04:00'::timestamptz
)

The result of EXPLAIN ANALYZE is @ https://explain.depesz.com/s/voMo

The table is large (+30 million records), so we've run into a performance problem with a report we're running (this can be seen at the URL above; +25 second query).

Question

It seems to me, based on the analyze results, that the performance issue here is due to the fact that Postgres is comparing the added_on values (step 4), and then re-checking them for some reason (step 3) for every row from the query's added_on time frame, rather than only checking the 18,515 rows that apparently match the company_id comparison in step 7.

Firstly, am I understanding / explaining the performance issue correctly?

Secondly, is there a way to solve this (other than denormalizing company_id)?

Database

  • Postgres 9.6.1

xfiler_document

       Column        |            Type             |                          Modifiers                           
---------------------+-----------------------------+--------------------------------------------------------------
 id                  | integer                     | not null default nextval('xfiler_document_id_seq'::regclass)
 tx_id               | integer                     | not null
 doc                 | character varying(255)      | 
 added_on            | timestamp with time zone    | not null

Indexes:
    "xfiler_document_pkey" PRIMARY KEY, btree (id)
    "xfiler_document_tx_id" btree (tx_id)
    "xfiler_document_added_on" btree (added_on)
Foreign-key constraints:
    "xfiler_document_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES xfiler_tx(id) DEFERRABLE INITIALLY DEFERRED

xfiler_tx

           Column           |           Type           |                       Modifiers                        
----------------------------+--------------------------+--------------------------------------------------------
 id                         | integer                  | not null default nextval('xfiler_tx_id_seq'::regclass)
 name                       | character varying(255)   | not null
 company_id                 | integer                  | not null

Indexes:
    "xfiler_tx_pkey" PRIMARY KEY, btree (id)
    "xfiler_tx_company_id" btree (company_id)
    "xfiler_tx_name" btree (name)
Foreign-key constraints:
    "xfiler_tx_company_id_fkey" FOREIGN KEY (company_id) REFERENCES company_company(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "xfiler_document" CONSTRAINT "xfiler_document_tx_id_fkey" FOREIGN KEY (tx_id) REFERENCES xfiler_tx(id) DEFERRABLE INITIALLY DEFERRED

Best Answer

The plan entry Recheck Cond: indicates what conditions will be used for the recheck, should a recheck be necessary. This will always be listed, even if no recheck is actually necessary at run time. (You can tell this because the Recheck Cond shows up even if you do a regular EXPLAIN, not an EXPLAIN ANALYZE, so this is part of the planning process, not the execution process.)

In your situation, all Heap Blocks were exact, meaning no recheck was necessary.

The real explanation for the slowness is probably that the Bitmap Heap Scan needs to read a lot of data from disk in order to find the rows it needs. You could check this by turning track_io_timing on and running EXPLAIN (ANALYZE, BUFFERS).

If that is indeed the issue, one solution would be to CLUSTER xfiler_document ON xfiler_document_added_on so that the data for a given date range are closer together on disk.

If your data is on a RAID, then increasing effective_io_concurrency could help.