Postgresql – How to make this query more efficient

index-tuningoptimizationpostgresqlpostgresql-9.3

This is the query:

SELECT 
  races.*, tmptimers.last_start_time, tmplaps.updated_at AS last_updated_at
FROM 
  races
LEFT JOIN 
  (SELECT * FROM timers WHERE timers.user_id = 1) AS tmptimers ON tmptimers.race_id = races.id
LEFT JOIN
  (
  SELECT 
    race_id, 
    MAX(updated_at) AS updated_at 
  FROM
    (SELECT 
      race_id, updated_at
    FROM 
      laps 
    WHERE 
      user_id = 148
      AND updated_at > '2014-06-13'
    ) AS tmp
    GROUP BY 
    race_id
  ) AS tmplaps ON tmplaps.race_id = races.id
WHERE
  races.account_id = 5
  AND races.id IN (29, 30, 31, 32, 3, 2, 33, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 
    22, 23, 24, 25, 26, 27, 28, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 
    55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 119, 154, 122, 82, 123, 156, 76, 73, 138, 70, 150, 135, 92, 143, 
    75, 72, 126, 113, 80, 83, 77, 94, 87, 129, 117, 68, 104, 134, 148, 152, 111, 120, 141, 69, 158, 96, 116, 114, 
    147, 78, 130, 139, 131, 90, 118, 132, 136, 67, 102, 84, 105, 101, 81, 153, 112, 137, 144, 71, 88, 89, 157, 107, 
    109, 79, 140, 97, 110, 106, 93, 142, 128, 108, 151, 95, 98, 121, 103, 149, 85, 124, 145, 99, 125, 146, 115, 133,
    100, 86, 91, 127, 155, 74)
ORDER BY
  last_updated_at DESC;

And here's the explain analyze for my (rather small) dataset:

   Sort Key: tmplaps.updated_at
   Sort Method: quicksort  Memory: 47kB
   ->  Nested Loop Left Join  (cost=211.16..286.25 rows=158 width=130) (actual time=3.854..4.248 rows=158 loops=1)
         Join Filter: (timers.race_id = races.id)
         ->  Hash Left Join  (cost=211.16..265.50 rows=158 width=122) (actual time=3.852..4.186 rows=158 loops=1)
               Hash Cond: (races.id = tmplaps.race_id)
               ->  Seq Scan on races  (cost=0.00..52.81 rows=158 width=114) (actual time=0.012..0.254 rows=158 loops=1)
                     Filter: ((account_id = 5) AND (id = ANY ('{29,30,31,32,3,2,33,1,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,119,154,122,82,123,156,76,73,138,70,150,135,92,143,75,72,126,113,80,83,77,94,87,129,117,68,104,134,148,152,111,120,141,69,158,96,116,114,147,78,130,139,131,90,118,132,136,67,102,84,105,101,81,153,112,137,144,71,88,89,157,107,109,79,140,97,110,106,93,142,128,108,151,95,98,121,103,149,85,124,145,99,125,146,115,133,100,86,91,127,155,74}'::integer[])))
                     Rows Removed by Filter: 3
               ->  Hash  (cost=209.96..209.96 rows=96 width=12) (actual time=3.833..3.833 rows=96 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 5kB
                     ->  Subquery Scan on tmplaps  (cost=208.03..209.96 rows=96 width=12) (actual time=3.768..3.807 rows=96 loops=1)
                           ->  HashAggregate  (cost=208.03..209.00 rows=96 width=12) (actual time=3.766..3.786 rows=96 loops=1)
                                 ->  Seq Scan on laps  (cost=0.00..182.03 rows=5201 width=12) (actual time=0.005..2.075 rows=5202 loops=1)
                                       Filter: ((updated_at > '2014-06-13 00:00:00'::timestamp without time zone) AND (user_id = 148))
         ->  Materialize  (cost=0.00..18.38 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=158)
               ->  Seq Scan on timers  (cost=0.00..18.38 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                     Filter: (user_id = 1)
 Total runtime: 4.494 ms

It seems to me that the most critical piece is the SeqScan on laps. Is there anything that can be done to minimize the time it takes?

Best Answer

You could try simplifying and using LATERAL for joining the laps table:

SELECT 
  races.*, tmptimers.last_start_time, tmplaps. last_updated_at
FROM 
  races
LEFT JOIN 
  timers AS tmptimers 
    ON  tmptimers.user_id = 1 
    AND tmptimers.race_id = races.id
LEFT JOIN LATERAL
  ( SELECT 
      updated_at AS last_updated_at
    FROM 
      laps 
    WHERE 
      user_id = 148
      AND updated_at > '2014-06-13'
      AND race_id = races.id
    ORDER BY 
      updated_at DESC
    LIMIT 1
 ) AS tmplaps
     ON TRUE  
WHERE
  races.account_id = 5
  AND races.id IN (29, 30, ..., 74)
ORDER BY
  last_updated_at DESC ;

I'm not sure what index will be best for the laps table though. Perhaps an index on (user_id, race_id, updated_at). I tested a similar query on SQL-Fiddle and it used the index. The plan showed:

Index Only Scan Backward using "index_name" on laps