Postgresql – SELECT very slow when JOIN and LIMIT are specified but no/few results found

execution-planindexperformancepostgresqlpostgresql-performance

I am running into an issue where the query planner does not take into account how a JOIN effects the frequency of particular values.

For some context: I am storing sessions in an append only log where each session will have multiple entries, (with all data of the session) one for each update.

create table session(
  update_id                                     bigint                   not null
    constraint "PK_gameSession_update"
    primary key,
  event_id                                      bigint                   not null,
  event_type                                    varchar(255)             not null,
  time_in_millis                                timestamp with time zone not null,
  game_code                                     varchar(100)             not null,
  account_id                                    bigint,
  session_id                                    bigint                   not null,
  game_id                                       bigint                   not null,
  session_client_ref                            varchar(255)             not null,
  session_external_ref                          varchar(255)             not null,
  player_external_ref                           varchar(255),
  community_id                                  bigint                   not null,
  community_code                                varchar(255)             not null,
  device                                        text,
  type                                          varchar(50)              not null,
  country_code                                  varchar(10),
  game_after_state                              jsonb,
  accept_language                               char(5)                  not null,
  currency                                      varchar(255),
  status                                        varchar(50)              not null,
  provider_metadata                             jsonb,
  operator_code                                 varchar(255),
  jurisdiction                                  varchar(255),
  pending_round_id                              bigint,
  provider_managed_pending_round                boolean,
  time_started                                  timestamp with time zone not null,
  last_time_updated                             timestamp with time zone not null,
  action                                        varchar(30)              not null,
  seconds_to_idle                               bigint,
  provider_maximum_concurrent_sessions_per_game integer,
  referrer                                      text
);

create index "IDX_session_accept_language" on session (accept_language);
create index "IDX_session_account_id" on session (account_id);
create index "IDX_session_action" on session (action);
create index "IDX_session_community_code" on session (community_code);
create index "IDX_session_community_id" on session (community_id);
create index "IDX_session_country_code" on session (country_code);
create index "IDX_session_currency" on session (currency);
create index "IDX_session_game_after_state" on session (game_after_state);
create index "IDX_session_game_code" on session (game_code);
create index "IDX_session_game_id" on session (game_id);
create index "IDX_session_jurisdiction" on session (jurisdiction);
create index "IDX_session_last_time_updated" on session (last_time_updated);
create index "IDX_session_max_concurrent_sessions_per_game" on session (provider_maximum_concurrent_sessions_per_game);
create index "IDX_session_operator_code" on session (operator_code);
create index "IDX_session_pending_round_id" on session (pending_round_id);
create index "IDX_session_player_external_ref" on session (player_external_ref);
create index "IDX_session_provider_managed_pending_round" on session (provider_managed_pending_round);
create index "IDX_session_provider_metadata" on session (provider_metadata);
create index "IDX_session_referrer" on session (referrer);
create index "IDX_session_seconds_to_idle" on session (seconds_to_idle);
create index "IDX_session_session_client_ref" on session (session_client_ref);
create index "IDX_session_session_external_ref" on session (session_external_ref);
create index "IDX_session_status" on session (status);
create index "IDX_session_time_started" on session (time_started);
create index "IDX_session_type" on session (type);
create index "IDX_session_update_community_id" on session (update_id, community_id);
create index "IDX_session_update_partial_status_started" on session (update_id) where ((status) :: text = 'STARTED' :: text);
create index "IDX_session_session_id_player_external_ref"  on session (session_id, player_external_ref);
create index "IDX_session_session_id" on session (session_id);

To keep track of the "freshest" info I have a second table that keeps track of this.

create table sessionlatest
(
  session_id bigint                   not null
    constraint "PK_session_id_latest"
    primary key,
  update_id       bigint                   not null,
  event_id        bigint                   not null,
  time_in_millis  timestamp with time zone not null
);

create index "IDX_sessionlatest_update_id" on sessionlatest (update_id);

EXAMPLE
Given a session that was created, then updated twice and finally completed, the following data is generated where update_id shows the order in which the changes to the event occurred:

session table

+-----------+------------+-----------+
| update_id | session_id |  status   |
+-----------+------------+-----------+
|         1 |          1 | STARTED   |
|         2 |          1 | STARTED   |
|         3 |          1 | STARTED   |
|         4 |          1 | COMPLETED |
+-----------+------------+-----------+

seesionlatest table:

+------------+-----------+--+
| session_id | update_id |  |
+------------+-----------+--+
|          1 |         4 |  |
+------------+-----------+--+

This way I know that if I lookup update_id 4 of session 1, that will be the latest (correct) view of the session. But I can still lookup the history of its changes.

Cardinalities

~5 million rows in session

~97% of rows in session have status "STARTED"

~50K rows in sessionlatest

~10-50 rows on join session and sessionlatest (i.e. most sessions are no longer started)

My objective is to get the list of sessions that are currently "STARTED", i.e. their latest update has status "STARTED"

Postgres Version 9.6.10

Queries

Problem query:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
  on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
order by gs."session_id" desc
LIMIT 500;

Limit  (cost=0.85..37681.96 rows=500 width=909) (actual time=0.137..45960.666 rows=29 loops=1)
  ->  Nested Loop  (cost=0.85..2996779.86 rows=39765 width=909) (actual time=0.135..45960.644 rows=29 loops=1)
        ->  Index Scan Backward using "IDX_session_session_id" on session gs  (cost=0.43..808987.56 rows=4884873 width=869) (actual time=0.038..32928.897 rows=4848024 loops=1)
              Filter: ((status)::text = 'STARTED'::text)
              Rows Removed by Filter: 115172
        ->  Index Scan using "IDX_sessionlatest_update_id" on sessionlatest gsl  (cost=0.41..0.44 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=4848024)
              Index Cond: (update_id = gs.update_id)
Planning time: 1.108 ms
Execution time: 45960.839 ms

Query with no limit:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
  on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
order by gs."session_id" desc;

Sort  (cost=249561.70..249661.11 rows=39765 width=909) (actual time=145.306..145.317 rows=28 loops=1)
  Sort Key: gs.session_id DESC
  Sort Method: quicksort  Memory: 55kB
  ->  Nested Loop  (cost=0.43..230619.81 rows=39765 width=909) (actual time=69.713..145.205 rows=28 loops=1)
        ->  Seq Scan on sessionlatest gsl  (cost=0.00..743.23 rows=40723 width=32) (actual time=0.010..5.386 rows=40732 loops=1)
        ->  Index Scan using "IDX_session_update_partial_status_started" on session gs  (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732)
              Index Cond: (update_id = gsl.update_id)
Planning time: 0.955 ms
Execution time: 145.431 ms

Query with no order:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
  on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
LIMIT 500;

Limit  (cost=0.43..2900.21 rows=500 width=901) (actual time=70.971..149.729 rows=28 loops=1)
  ->  Nested Loop  (cost=0.43..230619.81 rows=39765 width=901) (actual time=70.970..149.721 rows=28 loops=1)
        ->  Seq Scan on sessionlatest gsl  (cost=0.00..743.23 rows=40723 width=32) (actual time=0.011..5.655 rows=40732 loops=1)
        ->  Index Scan using "IDX_session_update_partial_status_started" on session gs  (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732)
              Index Cond: (update_id = gsl.update_id)
Planning time: 0.874 ms
Execution time: 149.820 ms

In my use case I need both the ordering and the limit. After some research I am thinking that since the status "STARTED" is very common, the query planner is preferring to do a sequential scan while there is a limit (since it assumes that it will satisfy the limit within the first few rows). Which is not correct since after joining with the sessionlatest table the limit will never be satisfied.

I have tried adding a multi-column index but it was ignored.

CREATE INDEX idx_session_id_status ON public.session(session_id, status);

I have also tried a partial index but the improvement in performance was minimal.

CREATE INDEX idx_session_id_partial_status ON public.session(session_id) WHERE public.session.status='STARTED';

Limit  (cost=0.72..34319.23 rows=500 width=903)
  ->  Nested Loop  (cost=0.72..1145415.36 rows=16688 width=903)
        ->  Index Scan Backward using idx_session_id_partial_status on session gs  (cost=0.43..435253.15 rows=2252261 width=863)
        ->  Index Scan using "IDX_Sessionlatest_update_id" on sessionlatest gsl  (cost=0.29..0.31 rows=1 width=32)
              Index Cond: (update_id = gs.update_id)

Finally I have seen a similar question SELECT very slow when no results and a LIMIT is specified with the main difference being that my corner case is being created by joins rather than multiple filters (and I obviously cannot make an index across different tables)

Best Answer

There are several things that you can try:

  • Add the status in sessionlatest table. Since this table is like a "cache" of the latest session, it makes absolute sense to do so (@jjanus suggestion). The WHERE - ORDER BY - LIMIT would all involve columns from the same (and smaller) table, so a suitable index could be used effectively (and then the join to the big table would only have to find the relevant 500 rows).
  • I don't understand why you are joining on update_id and not session_id. This has not a good effect on the execution plan. You want to ORDER BY and LIMIT by session_id but the join is on another column. Perhaps you do that in order to get a single entry per session. The query could be rewritten to use session_id instead.
  • I would try to use LATERAL JOIN starting from the smaller table.

See this (using update_id or session_id or both):

select gs.*, gsl.*
from sessionlatest gsl
  join lateral
  ( select s.*
    from session s
    where s.session_id = gsl.session_id
      and s.update_id = gsl.update_id
    order by s.update_id desc
    LIMIT 1
  ) as gs
  on gs.status = 'STARTED'
order by gsl.session_id desc
limit 500 ;

I think the above would result in a better execution plan, if you had an index on session (session_id, update_id) but it's better if you tested with your tables (sizes, distribution, several things matter for performance).

See the dbfiddle.uk.