Postgresql – postgres explain plan with giant gaps between operations

explainoptimizationpostgresql

I have a postgresql explain plan that seems to have large gaps between the join, sort & filter processes and can't explain (sorry) it. The query is taking an inordinate amount of time and the planned version is very different from the actual version. I tried upping the work_mem to 1500MB for this session but it is still swapping to disk (is this because there's 2 huge(1GB) sorts)?

explain analyze SELECT DISTINCT ac.player_id, gvt.game_type,
 first_value(platform) over (partition by acs.start_ts, acs.account_id
 order by start_ts desc rows between unbounded preceding and unbounded following) platform,
 date_trunc('hour', ac.audit_ts),
 first_value(audit_ts) over
   (partition by platform, game_type, acs.account_id
  order by audit_ts desc rows between unbounded preceding and unbounded following)
 FROM audit_command ac
 JOIN table_definition td ON ac.table_id = td.table_id
 JOIN game_variation_template gvt ON td.game_variation_template_id = gvt.game_variation_template_id
 LEFT JOIN player_definition p ON ac.player_id = p.player_id
 LEFT JOIN account_session acs ON p.account_id = acs.account_id
 WHERE ac.command_type::text <> 'Leave'::text AND ac.command_type::text <> 'GetStatus'::text
 and start_ts < audit_ts and start_ts >= audit_ts - interval '1 day'
 AND ac.audit_ts between '2014-06-26T02:10:01.017+01:00' AND '2014-06-27T02:10:01.017+01:00'
;
                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=4781863.47..4824152.70 rows=3383139 width=47) (actual time=1783393.417..1783412.946 rows=34251 loops=1)
 ->  WindowAgg  (cost=4646537.91..4739574.23 rows=3383139 width=47) (actual time=1735317.999..1772580.060 rows=14568155 loops=1)
   ->  Sort  (cost=4646537.91..4654995.75 rows=3383139 width=47) (actual time=1735317.969..1756145.084 rows=14568155 loops=1)
     Sort Key: acs.platform, gvt.game_type, acs.account_id, ac.audit_ts
     Sort Method: external merge  Disk: 1108544kB
     ->  WindowAgg  (cost=4211974.87..4279637.65 rows=3383139 width=47) (actual time=1125155.357..1140058.023 rows=14568155 loops=1)
       ->  Sort  (cost=4211974.87..4220432.72 rows=3383139 width=47) (actual time=1125141.693..1128195.716 rows=14568155 loops=1)
         Sort Key: acs.start_ts, acs.account_id
         Sort Method: external sort  Disk: 975392kB
         ->  Hash Join  (cost=1274416.28..3845074.61 rows=3383139 width=47) (actual time=1034359.154..1098242.110 rows=14568155 loops=1)
           Hash Cond: (td.game_variation_template_id = gvt.game_variation_template_id)
           ->  Hash Join  (cost=1274374.79..3798514.96 rows=3383139 width=42) (actual time=1034359.024..1091131.468 rows=14568155 loops=1)
             Hash Cond: (ac.table_id = td.table_id)
             ->  Hash Join  (cost=1217034.45..3665053.99 rows=3383139 width=46) (actual time=1033746.587..1082753.263 rows=14568157 loops=1)
               Hash Cond: (acs.account_id = lobby_user.account_id)
               Join Filter: ((acs.start_ts < ac.audit_ts) AND (acs.start_ts >= (ac.audit_ts - '1 day'::interval)))
               Rows Removed by Join Filter: 1953618923
               ->  Seq Scan on account_session acs  (cost=0.00..1507571.65 rows=46583265 width=22) (actual time=0.004..56952.205 rows=43264444 loops=1)
               ->  Hash  (cost=1150844.59..1150844.59 rows=5295189 width=31) (actual time=15666.924..15666.924 rows=3835505 loops=1)
                 Buckets: 1048576  Batches: 1  Memory Usage: 253959kB
                 ->  Hash Join  (cost=691441.16..1150844.59 rows=5295189 width=31) (actual time=7817.384..13276.084 rows=3835524 loops=1)
                   Hash Cond: (ac.player_id = lobby_user.player_id)
                   ->  Index Scan using audit_command_ts_idx on audit_command ac  (cost=0.57..340262.25 rows=5295189 width=24) (actual time=0.025..2697.823 rows=3835524 loops=1)
                     Index Cond: ((audit_ts >= '2014-06-26 02:10:01.017'::timestamp without time zone) AND (audit_ts <= '2014-06-27 02:10:01.017'::timestamp without time zone))
                     Filter: (((command_type)::text <> 'Leave'::text) AND ((command_type)::text <> 'GetStatus'::text))
                     Rows Removed by Filter: 16517
                   ->  Hash  (cost=590189.15..590189.15 rows=8100115 width=14) (actual time=7813.606..7813.606 rows=8300768 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 377469kB
                     ->  Seq Scan on lobby_user  (cost=0.00..590189.15 rows=8100115 width=14) (actual time=0.020..3667.951 rows=8300768 loops=1)
             ->  Hash  (cost=32174.04..32174.04 rows=2013304 width=12) (actual time=611.500..611.500 rows=1952805 loops=1)
               Buckets: 262144  Batches: 1  Memory Usage: 84599kB
               ->  Seq Scan on table_definition td  (cost=0.00..32174.04 rows=2013304 width=12) (actual time=0.007..198.591 rows=1952805 loops=1)
           ->  Hash  (cost=39.55..39.55 rows=155 width=15) (actual time=0.116..0.116 rows=155 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 8kB
             ->  Seq Scan on game_variation_template gvt  (cost=0.00..39.55 rows=155 width=15) (actual time=0.009..0.067 rows=155 loops=1)

I had a feeling it was an un-vacuumed audit_command table, but the auto-vacuum ran over the weekend and the table is kicking out:

analyze verbose audit_command;
INFO:  analyzing "public.audit_command"
INFO:  "audit_command": scanned 30000 of 15429435 pages, containing 861531 live rows and 39590 dead rows; 30000 rows in sample, 586507527 estimated total rows

can anyone tell me what the problem is and why it is so slow to execute (it is double the time it was previously)?

Best Answer

I'm not sure exactly what is causing your "gap", but I think your biggest performance bottleneck is:

Join Filter: ((acs.start_ts < ac.audit_ts) AND (acs.start_ts >= (ac.audit_ts - '1 day'::interval)))

leading to:

Rows Removed by Join Filter: 1953618923

Iterating over 2 billion rows is bound to take some time. That particular join is killing your query because it is joining everything based on:

Hash Cond: (acs.account_id = lobby_user.account_id)

regardless of date range, then excluding everything outside the date range in the join filter.

Since you mentioned that it was faster at some point in the past, it might also be worth checking for bloated indexes, if you haven't already; however I suspect this is mostly a side-effect of ever-increasing amounts of data in account_session.

Also, the fact that your 'distinct' is taking 14M rows down to 35K rows makes me think there might be a better way to get at the information you want (can any of your window functions be replaced by a group by with an aggregate function?).