Postgresql – Optimize slow psql query

performancepostgresqlquery-performance

I have a psql query which is quite slow. It uses from 0.7 up to 5s to execute, which is way too long. I've tried to optimize using explain analyze and adding some indexes, but it hasn't helped enough (but without the index the query was exceptionally slow!) The DB is vacuumed on a regular basis, and totally the DB contains about 3 million rows, with several rows per id.

Here is the sql query:

WITH min_match AS(
    SELECT DISTINCT ON ("id") *
    FROM cars
    WHERE
      "publicationDate" >= NOW() - interval '250 days' AND
      "mileage" BETWEEN({mileage} - 30000) AND ({mileage} + 30000) AND 
      "model" = '{model}' AND
      "manufacturer" = '{manufacturer}' AND
      "modelYear" = {modelYear} AND
      "driveWheels" = '{driveWheels}' AND
      "fuelType" = '{fuelType}' AND
      "chassisType" = '{chassisType}' AND
      "registrationClass" = '{registrationClass}' AND
      "publisherType" = 'private' AND
      "vehicleTransmission" = '{vehicleTransmission}' AND
      "power" <= ({power} * 1.3) AND
      "power" >= ({power} * 0.7) AND
      "engineVolume" <= 5.0 AND
      "power" <= 500
    ORDER BY "id", "lastUpdateDate" DESC, "eventTime" DESC
),
full_match AS (
    SELECT * FROM min_match
    WHERE
     "engineVolume" = {engineVolume} AND
     "power" = {power} AND
     "numberOfSeats" = {numberOfSeats}
)
SELECT *
FROM
    full_match
        RIGHT JOIN min_match
        ON full_match."id" = min_match."id"
ORDER BY
  CASE WHEN (min_match."state" = 'activated') THEN CURRENT_DATE - min_match."publicationDate"
       WHEN (min_match."unPublicationDate" is null) THEN age(CURRENT_DATE) - interval '1 year'
       ELSE CURRENT_DATE - min_match."unPublicationDate"
  END ASC,
  (abs(min_match."power" - {power}) + abs(min_match."engineVolume" - {engineVolume}) + abs(min_match."numberOfSeats" - {numberOfSeats})),
  abs(min_match."mileage" - {mileage}),
  min_match."lastUpdateDate" DESC
LIMIT 30

I have tried several indexes, right now I have this one:

CREATE INDEX car_list_query_index
ON car_list_view ("publicationDate",
                  "mileage",
                  "model",
                  "manufacturer",
                  "modelYear",
                  "driveWheels",
                  "fuelType",
                  "chassisType",
                  "registrationClass",
                  "publisherType",
                  "power",
                  "engineVolume",
                  "vehicleTransmission",
                  "numberOfSeats");

From what I see in explain analyze this index is indeed used. It's the min_match part which is the slowest, but I'm not really sure how to be able to optimize this more. I've tried with a materialized view which also helped a little bit, but this requires some maintaning so I'd prefer to avoid using it.

Here's the output from explain analyze:

 Limit  (cost=417930.63..417930.64 rows=1 width=268) (actual time=706.714..706.714 rows=0 loops=1)
   CTE min_match
     ->  Unique  (cost=417930.46..417930.47 rows=1 width=1235) (actual time=706.684..706.684 rows=0 loops=1)
           ->  Sort  (cost=417930.46..417930.47 rows=1 width=1235) (actual time=706.683..706.683 rows=0 loops=1)
                 Sort Key: cars."id", cars."lastUpdateDate" DESC, cars."eventTime" DESC
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using car_list_query_index on cars  (cost=0.56..417930.45 rows=1 width=1235) (actual time=706.665..706.665 rows=0 loops=1)
                       Index Cond: (("publicationDate" >= (now() - '250 days'::interval)) AND (mileage >= '122000'::double precision) AND (mileage <= '182000'::double precision) AND (model = 'qashqai +2'::text) AND (manufacturer = 'nissan'::text) AND ("modelYear" = 2011) AND ("driveWheels" = 'fwd'::drive_wheels) AND ("fuelType" = 'diesel'::fuel_type) AND ("chassisType" = 'mpv'::chassis_type) AND ("registrationClass" = 'passenger'::registration_class) AND ("publisherType" = 'private'::publisher_type) AND (power <= '143'::double precision) AND (power >= '77'::double precision) AND (power <= '500'::double precision) AND ("engineVolume" <= '5'::double precision) AND ("vehicleTransmission" = 'manual'::text))
   CTE full_match
     ->  CTE Scan on min_match min_match_1  (cost=0.00..0.03 rows=1 width=616) (never executed)
           Filter: (("engineVolume" = '1.5'::double precision) AND (power = '110'::double precision) AND ("numberOfSeats" = 7))
   ->  Sort  (cost=0.13..0.14 rows=1 width=268) (actual time=706.711..706.711 rows=0 loops=1)
         Sort Key: (CASE WHEN (min_match.state = 'activated'::ad_state) THEN ((('now'::cstring)::date)::timestamp without time zone - min_match."publicationDate") WHEN (min_match."unPublicationDate" IS NULL) THEN (age((('now'::cstring)::date)::timestamp with time zone, (('now'::cstring)::date)::timestamp with time zone) - '1 year'::interval) ELSE ((('now'::cstring)::date)::timestamp without time zone - min_match."unPublicationDate") END), (((abs((min_match.power - '110'::double precision)) + abs((min_match."engineVolume" - '1.5'::double precision))) + (abs((min_match."numberOfSeats" - 7)))::double precision)), (abs((min_match.mileage - '152000'::double precision))), min_match."lastUpdateDate" DESC
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=0.00..0.12 rows=1 width=268) (actual time=706.688..706.688 rows=0 loops=1)
               Join Filter: (full_match."id" = min_match."id")
               ->  CTE Scan on min_match  (cost=0.00..0.02 rows=1 width=240) (actual time=706.686..706.686 rows=0 loops=1)
               ->  CTE Scan on full_match  (cost=0.00..0.02 rows=1 width=32) (never executed)
 Planning time: 1.777 ms
 Execution time: 706.913 ms

Best Answer

Right now you have index scans. It's better than a table scan but a seek would be more ideal. I would move publicationdate and mileage down on the index list. You want the exact matches to seek first.

CREATE INDEX car_list_query_index
ON car_list_view (
                  "model",
                  "manufacturer",
                  "modelYear",
                  "driveWheels",
                  "fuelType",
                  "chassisType",
                  "registrationClass",
                  "publisherType",
                  "vehicleTransmission",
                 "publicationDate",
                  "mileage",
                  "power",
                  "engineVolume",
                  "numberOfSeats");