Postgresql – Which index shall I create in Postgres to speedup join of two big (several million rows) tables connected one-to-many

join;performancepostgresql

I have 2 tables, first one contains dated items (something like log records), second one – multiple attributes of those items.

First table:

CREATE TABLE items
(
  date date NOT NULL,
  point_id integer NOT NULL,
  /* several more columns */
);

Second table:

CREATE TABLE attributes
(
  attribute_id integer,
  value double precision NOT NULL,
  point_id integer,
  CONSTRAINT fk_point_id FOREIGN KEY (point_id)
      REFERENCES items (point_id) MATCH SIMPLE
      ON UPDATE RESTRICT ON DELETE CASCADE
)

So for single record in items there are 1..N records in attributes. For now able items has about 35M of rows, for each row there are 2 rows in attributes table – one with attribute_id = 0 and second with attribute_id = 1

I have separate btree indexes on all listed fields. And what is unclear to me is why following query leads to Seq Scan of attributes table:

EXPLAIN SELECT s.date, a1.value AS value
    FROM items s
    INNER JOIN attributes a1
        ON s.point_id = a1.point_id
    WHERE a1.attribute_id = 0
    AND s.date >= '2016-12-01'::DATE

Here is explain result:

Hash Join  (cost=14457.15..1529906.57 rows=310075 width=12)
  Hash Cond: (a1.point_id = s.point_id)
  ->  Seq Scan on attributes a1  (cost=0.00..1376027.10 rows=36352418 width=12)
        Filter: (attribute_id = 0)
  ->  Hash  (cost=10569.55..10569.55 rows=311008 width=8)
        ->  Index Scan using items_date_idx on items s  (cost=0.56..10569.55 rows=311008 width=8)
              Index Cond: (date >= '2016-12-01'::date)

I see that Postgres uses index to filter items table and get only needed rows – there are about 300K of them. But then why it iterates over whole 70M attributes table to compose hash map? As I have indexes on 'point_id' column in both tables, why not to use them to perform merge join of just 300K?

I tried to enable/disable different planner options, but of course it get worse. I understand that planner does his job very well, there is just something wrong in my schema and I need to help planner somehow. I've tried to solve the issue with adding composite index like 'date,point_id' – at least it helped in Mongo (after filtering by date you have sorted list of point_id and may perform merge join), but Postgres ignored those indexes and used simple ones.

So which index shall I create (or how shall I redesign my schema) to get provided query work faster and prevent seq scan?

Not sure how can it help, but in items table each row has larger point_id than previous and same or larger date. Also, I do not need exact date/time – only date part.

In the query above if I remove date check condition I get the merge join as expected:

Merge Join  (cost=73.17..4229922.45 rows=36352418 width=12)
  Merge Cond: (s.point_id = a1.point_id)
  ->  Index Scan using items_point_id_idx on items s  (cost=0.44..1143957.50 rows=36461804 width=8)
  ->  Index Scan using attributes_point_id_idx on attributes a1  (cost=0.57..2540470.32 rows=36352418 width=12)
        Filter: (attribute_id = 0)

I use postgres 9.5

Result of explain(analyze,verbose): https://explain.depesz.com/s/P1D

In theory there could be custom amount of attributes, it is defined in runtime.

There is an index on attribute_id. I think it is not used because half of table has attribute_id = 0 and another half has attribute_id = 1.

Best Answer

It is quite reasonable for the planner to think that scanning 70,000,000 rows in sequential order is going to be faster than reading 310,075 in a random order. If you tried disabling enable_seqscan and the query performance got worse, then it seems likely the planner is making the correct decision there. Could you post an EXPLAIN (ANALYZE) of what happens in that case?

If it did a merge join on the index attributes_point_id_idx like you suggest, than for each row it returns it would have to jump to some random part of the attributes table to check that that row has not yet been deleted, to check the attribute_id to see if it is 0, and to fetch value so it can be returned.

If you make an index on (point_id, attribute_id, value) and you keep the table vacuumed, then it might choose to use an index only scan (as the input to the merge join) as now it can get all the data it needs from the index, no randomly ordered visit to the table is needed.

It could be a challenge keeping the attributes table adequately vacuumed. The autovacuum system is only designed to reduce bloat. Keeping index-only-scans efficient is not yet part of the autovacuum design.