Postgresql – Improve performance in slow query in PostgreSQL

optimizationperformancepostgresqlquery-performance

How can I improve the following query, since (in my opinion) it takes more time than it should, and I think I'm missing something in terms of query optimization(the query takes around 1.0-1.4 seconds):

SELECT  
    DISTINCT a."adId",ct.distance AS distance,a.title,a.link,a.excerpt,a.employer,a."createdAt",a.location, count(*) OVER() AS counted 
FROM  
    ad a 
JOIN 
    ads_industries ai ON ai."adId" = a."adId"  
JOIN( 
    SELECT  
        ci."cityId", 
        earth_box( ll_to_earth( 33.97, -118.24 ), 32186.8) @> ll_to_earth(ci.lat, ci.lon) AS distance 
    FROM 
        city ci 
    WHERE  
        earth_box( ll_to_earth( 33.97, -118.24 ), 32186.8) @> ll_to_earth(ci.lat, ci.lon) 
) AS ct ON ct."cityId" = a."cityId" 
WHERE 
    ai."industryId" in( 28,18,31,33,24,2,29,32,22,14,25 ) AND 
    a."createdAt" BETWEEN '2017-11-14 00:00:00' AND '2017-11-18 22:24:51' 
ORDER BY distance DESC,a."createdAt" DESC  
LIMIT  10 OFFSET 0;

This is what explain analyze shows: https://explain.depesz.com/s/YRfO
enter image description here

ad table:

     Column     |              Type              |                      Modifiers                      | Storage  | Stats target | Description 
----------------+--------------------------------+-----------------------------------------------------+----------+--------------+-------------
 adId           | integer                        | not null default nextval('"ad_adId_seq"'::regclass) | plain    |              | 
 cityId         | integer                        | not null                                            | plain    |              | 
 title          | character varying(255)         | not null                                            | extended |              | 
 description    | text                           | not null                                            | extended |              | 
 link           | character varying(255)         | not null                                            | extended |              | 
 employer       | character varying(100)         | not null                                            | extended |              | 
 location       | character varying(100)         | not null                                            | extended |              | 
 excerpt        | character(191)                 | not null                                            | extended |              | 
 expirationDate | date                           | not null                                            | plain    |              | 
 createdAt      | timestamp(0) without time zone | not null                                            | plain    |              | 
 updatedAt      | timestamp(0) without time zone | not null                                            | plain    |              | 
 lexemsvector   | tsvector                       |                                                     | extended |              | 
Indexes:
    "ad_pkey" PRIMARY KEY, btree ("adId")
    "linkIdx" UNIQUE CONSTRAINT, btree (link)
    "cityIdIdx" btree ("cityId")
    "textsearchidx" gin ("cityId", lexemsvector)
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON ad FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('lexemsvector', 'pg_catalog.english', 'title', 'description')

Table has 1578966 records( 10 GB in size ).

ads_industries table:

**ads_industries** table:
   Column   |   Type   | Modifiers | Storage | Stats target | Description 
------------+----------+-----------+---------+--------------+-------------
 adId       | integer  | not null  | plain   |              | 
 industryId | smallint | not null  | plain   |              | 
Indexes:
    "adToIndustryIdx" btree ("industryId", "adId")

The table has 1782162 records( 62MB in size ).

city table:

  Column   |              Type              |                        Modifiers                        | Storage  | Stats target | Description 
-----------+--------------------------------+---------------------------------------------------------+----------+--------------+-------------
 cityId    | integer                        | not null default nextval('"city_cityId_seq"'::regclass) | plain    |              | 
 stateId   | smallint                       | not null                                                | plain    |              | 
 name      | character varying(50)          | not null                                                | extended |              | 
 zipCode   | character(5)                   | not null                                                | extended |              | 
 lat       | double precision               | not null default '0'::double precision                  | plain    |              | 
 lon       | double precision               | not null default '0'::double precision                  | plain    |              | 
 visible   | boolean                        | not null                                                | plain    |              | 
 createdAt | timestamp(0) without time zone | not null                                                | plain    |              | 
 updatedAt | timestamp(0) without time zone | not null                                                | plain    |              | 
Indexes:
    "city_pkey" PRIMARY KEY, btree ("cityId")
    "cityGeolocIdx" gist (ll_to_earth(lat, lon))
    "zipCodeIdx" btree ("zipCode")

The city table has 50203 records(5 MB in size).

As I understands from the EXPLAIN, the problem seems to be in the ads_industries join, despite using "Index Only Scan".

OS: Ubuntu 16.04.3 LTS
CPUs: 2
RAM: 4GB
PostgreSQL version 9.6.4 with default settings.

Best Answer

Based on this line from you plan:

Index Cond: ((ai."industryId" = ANY ('{28,18,31,33,24,2,29,32,22,14,25}'::integer[])) AND (ai."adId" = a."adId"))

I think having an index on ads_industries ("adId", "industryId") might help. Either in addition to, or instead of, the one with those columns reversed.

It is generally best to have the more specific column first, and a simple equality test is likely to be more specific than a =ANY test.