Postgresql – Fastest way to choose distinct rows and a different order by without using a subquery

postgresqlpostgresql-performance

PostgresSQL – Join and get distinct left table rows only without gigantic subquery sort

We have a midsized database that has 50k rows of products and 33m rows of product prices at various store locations.

We join the products table to the access table with various where conditions / order by the suit the end user the best.

A simple explanation of a typical query would be 100 distinct products, sorted by highest price first, sorted by in stock first.

Seems simple enough… Everything I read online says to join the tables with a LEFT INNER JOIN (perhaps this is the wrong type of join to get unique rows?) on products_displayproductaccess in a sub query, then order the outer query by the order we're looking for. Problem is the performance is brutal… We partitioned the tables and it did help, but not significantly due the large sort size.

I'm surprised this is such an issue in Postgres (or perhaps other DB's as well) or the solution has just alluded me and the other upwork DBA's we've paid. It's just a join to get unique rows from the left table…

products_displayproduct

Table "public.products_displayproduct"
        Column         |           Type           | Collation | Nullable |                       Default                       | Storage  | Stats target | Description 
-----------------------+--------------------------+-----------+----------+-----------------------------------------------------+----------+--------------+-------------
 id                    | integer                  |           | not null | nextval('products_displayproduct_id_seq'::regclass) | plain    |              | 
 date_created          | timestamp with time zone |           | not null |                                                     | plain    |              | 
 date_updated          | timestamp with time zone |           | not null |                                                     | plain    |              | 
 name                  | character varying(255)   |           | not null |                                                     | extended |              | 
 sub_title             | character varying(255)   |           |          |                                                     | extended |              | 
 tags                  | character varying(255)   |           |          |                                                     | extended |              | 
 has_multiple_variants | boolean                  |           | not null |                                                     | plain    |              | 
 rating                | numeric(3,2)             |           | not null |                                                     | main     |              | 
 reviews               | integer                  |           | not null |                                                     | plain    |              | 
 is_toppick            | boolean                  |           | not null |                                                     | plain    |              | 
 search_index          | tsvector                 |           | not null |                                                     | extended |              | 
 brand_id              | integer                  |           |          |                                                     | plain    |              | 
 placement_id          | integer                  |           | not null |                                                     | plain    |              | 
 poster_image_id       | integer                  |           |          |                                                     | plain    |              | 
 last_amalgamation     | timestamp with time zone |           |          |                                                     | plain    |              | 
Indexes:
    "products_displayproduct_pkey" PRIMARY KEY, btree (id)
    "products_di_id_f3063a_idx" btree (id, placement_id)
    "products_di_search__a06b26_gist" gist (search_index)
    "products_displayproduct_brand_id_c215ec11" btree (brand_id)
    "products_displayproduct_is_toppick_3901660d" btree (is_toppick)
    "products_displayproduct_last_amalgamation_bcfdab2f" btree (last_amalgamation)
    "products_displayproduct_name_6e5686c5" btree (name)
    "products_displayproduct_name_6e5686c5_like" btree (name varchar_pattern_ops)
    "products_displayproduct_name_gist" gist (name gist_trgm_ops)
    "products_displayproduct_placement_id_90f0d4a2" btree (placement_id)
    "products_displayproduct_poster_image_id_7c188e4d" btree (poster_image_id)
    "products_displayproduct_rating_98794c10" btree (rating)
    "products_displayproduct_reviews_4d5513bb" btree (reviews)
    "products_displayproduct_sub_title_72354226" btree (sub_title)
    "products_displayproduct_sub_title_72354226_like" btree (sub_title varchar_pattern_ops)
    "products_displayproduct_tags_ab900c6b" btree (tags)
    "products_displayproduct_tags_ab900c6b_like" btree (tags varchar_pattern_ops)
Foreign-key constraints:
    "products_displayprod_brand_id_c215ec11_fk_products_" FOREIGN KEY (brand_id) REFERENCES products_displaybrand(id) DEFERRABLE INITIALLY DEFERRED
    "products_displayprod_placement_id_90f0d4a2_fk_retailers" FOREIGN KEY (placement_id) REFERENCES retailers_displaysubcategory(id) DEFERRABLE INITIALLY DEFERRED
    "products_displayprod_poster_image_id_7c188e4d_fk_products_" FOREIGN KEY (poster_image_id) REFERENCES products_displayproductimage(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "intake_intakeproduct" CONSTRAINT "intake_intakeproduct_displayed_as_id_479d7ed0_fk_products_" FOREIGN KEY (displayed_as_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "products_amalgamateddisplayproduct" CONSTRAINT "products_amalgamated_orig_product_id_df68bcd8_fk_products_" FOREIGN KEY (orig_product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "products_displayproductvariant" CONSTRAINT "products_displayprod_product_id_210696e5_fk_products_" FOREIGN KEY (product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "products_displayproductaccess" CONSTRAINT "products_displayprod_product_id_2a2032ac_fk_products_" FOREIGN KEY (product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "reviews_displayproductreview" CONSTRAINT "reviews_displayprodu_product_id_1d59703e_fk_products_" FOREIGN KEY (product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "search_searchquery" CONSTRAINT "search_searchquery_from_product_id_8f9dfd22_fk_products_" FOREIGN KEY (from_product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED

products_displayproductaccess

       Column       |           Type           | Collation | Nullable |                          Default                          | Storage  | Stats target | Description 
--------------------+--------------------------+-----------+----------+-----------------------------------------------------------+----------+--------------+-------------
 id                 | integer                  |           | not null | nextval('products_displayproductaccess_id_seq'::regclass) | plain    |              | 
 date_created       | timestamp with time zone |           | not null |                                                           | plain    |              | 
 date_updated       | timestamp with time zone |           | not null |                                                           | plain    |              | 
 price              | numeric(16,2)            |           | not null |                                                           | main     |              | 
 quantity           | smallint                 |           | not null |                                                           | plain    |              | 
 department         | character varying(64)    |           |          |                                                           | extended |              | 
 aisle              | character varying(64)    |           |          |                                                           | extended |              | 
 original_price     | numeric(16,2)            |           |          |                                                           | main     |              | 
 is_onsale          | boolean                  |           | not null |                                                           | plain    |              | 
 is_instock         | boolean                  |           | not null |                                                           | plain    |              | 
 sale_until         | date                     |           |          |                                                           | plain    |              | 
 sale_percent       | numeric(5,2)             |           | not null |                                                           | main     |              | 
 sale_amount        | numeric(16,2)            |           | not null |                                                           | main     |              | 
 location_id        | integer                  |           | not null |                                                           | plain    |              | 
 product_variant_id | integer                  |           | not null |                                                           | plain    |              | 
 retailer_id        | integer                  |           | not null |                                                           | plain    |              | 
 product_id         | integer                  |           |          |                                                           | plain    |              | 
 location_intspace  | integer                  |           | not null |                                                           | plain    |              | 
Indexes:
    "products_displayproductaccess_pkey" PRIMARY KEY, btree (id)
    "pmtest" btree (location_intspace, location_id) INCLUDE (product_id, price, is_instock)
    "products_displayproductaccess_aisle_61457682" btree (aisle)
    "products_displayproductaccess_aisle_61457682_like" btree (aisle varchar_pattern_ops)
    "products_displayproductaccess_date_created_d38d9b4e" btree (date_created)
    "products_displayproductaccess_date_updated_0c689801" btree (date_updated)
    "products_displayproductaccess_department_a7031553" btree (department)
    "products_displayproductaccess_department_a7031553_like" btree (department varchar_pattern_ops)
    "products_displayproductaccess_idx_highest_priced" btree (product_id, price DESC, is_instock DESC)
    "products_displayproductaccess_is_instock_d5c2eef0" btree (is_instock)
    "products_displayproductaccess_is_onsale_bdc85585" btree (is_onsale)
    "products_displayproductaccess_location_id_fe5aeb43" btree (location_id)
    "products_displayproductaccess_original_price_904da60f" btree (original_price)
    "products_displayproductaccess_price_0e2f3286" btree (price)
    "products_displayproductaccess_product_id_2a2032ac" btree (product_id)
    "products_displayproductaccess_product_variant_id_ab7b9896" btree (product_variant_id)
    "products_displayproductaccess_quantity_72b1a17c" btree (quantity)
    "products_displayproductaccess_retailer_id_70041027" btree (retailer_id)
    "products_displayproductaccess_sale_amount_6cf01b14" btree (sale_amount)
    "products_displayproductaccess_sale_percent_5f469f10" btree (sale_percent)
    "products_displayproductaccess_sale_until_afb569bf" btree (sale_until)
Foreign-key constraints:
    "products_displayprod_location_id_fe5aeb43_fk_retailers" FOREIGN KEY (location_id) REFERENCES retailers_baselocation(id) DEFERRABLE INITIALLY DEFERRED
    "products_displayprod_product_id_2a2032ac_fk_products_" FOREIGN KEY (product_id) REFERENCES products_displayproduct(id) DEFERRABLE INITIALLY DEFERRED
    "products_displayprod_product_variant_id_ab7b9896_fk_products_" FOREIGN KEY (product_variant_id) REFERENCES products_displayproductvariant(id) DEFERRABLE INITIALLY DEFERRED
    "products_displayprod_retailer_id_70041027_fk_retailers" FOREIGN KEY (retailer_id) REFERENCES retailers_retailer(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
    after_insert_products_displayproductaccess_trigger AFTER INSERT ON products_displayproductaccess FOR EACH ROW EXECUTE PROCEDURE products_displayproductaccess_delete_master()
    before_insert_products_displayproductaccess_trigger BEFORE INSERT ON products_displayproductaccess FOR EACH ROW EXECUTE PROCEDURE products_displayproductaccess_insert_child()
Child tables: products_displayproductaccess_745_745,
              products_displayproductaccess_746_746,
              products_displayproductaccess_747_747,
              products_displayproductaccess_748_748,
              products_displayproductaccess_749_749,
              products_displayproductaccess_750_750,
              products_displayproductaccess_751_751,
              products_displayproductaccess_752_752,
              products_displayproductaccess_753_753,
              products_displayproductaccess_754_754,
              products_displayproductaccess_755_755,
              products_displayproductaccess_756_756,
              products_displayproductaccess_757_757,
              products_displayproductaccess_758_758,
              products_displayproductaccess_759_759,
              products_displayproductaccess_760_760,
              products_displayproductaccess_761_761,
              products_displayproductaccess_762_762,
              products_displayproductaccess_763_763,
              products_displayproductaccess_764_764,
              products_displayproductaccess_765_765,
              products_displayproductaccess_766_766,
              products_displayproductaccess_767_767,
              products_displayproductaccess_768_768,
              products_displayproductaccess_769_769,
              products_displayproductaccess_770_770,
              products_displayproductaccess_771_771,
              products_displayproductaccess_772_772,
              products_displayproductaccess_773_773,
              products_displayproductaccess_774_774,
              products_displayproductaccess_775_775,
              products_displayproductaccess_776_776,
              products_displayproductaccess_777_777,
              products_displayproductaccess_778_778,
              products_displayproductaccess_779_779,
              products_displayproductaccess_780_780,
              products_displayproductaccess_781_781,
              products_displayproductaccess_782_782,
              products_displayproductaccess_783_783,
              products_displayproductaccess_784_784,
              products_displayproductaccess_785_785,
              products_displayproductaccess_786_786,
              products_displayproductaccess_787_787,
              products_displayproductaccess_788_788,
              products_displayproductaccess_789_789,
              products_displayproductaccess_790_790,
              products_displayproductaccess_791_791,
              products_displayproductaccess_792_792,
              products_displayproductaccess_793_793,
              products_displayproductaccess_794_794,
              products_displayproductaccess_795_795,
              products_displayproductaccess_796_796,
              products_displayproductaccess_797_797,
              products_displayproductaccess_798_798,
              products_displayproductaccess_799_799,
              products_displayproductaccess_801_801,
              products_displayproductaccess_802_802,
              products_displayproductaccess_803_803,
              products_displayproductaccess_805_805,
              products_displayproductaccess_806_806,
              products_displayproductaccess_807_807,
              products_displayproductaccess_808_808,
              products_displayproductaccess_809_809,
              products_displayproductaccess_810_810,
              products_displayproductaccess_811_811,
              products_displayproductaccess_812_812,
              products_displayproductaccess_814_814,
              products_displayproductaccess_816_816,
              products_displayproductaccess_818_818,
              products_displayproductaccess_819_819,
              products_displayproductaccess_821_821,
              products_displayproductaccess_822_822,
              products_displayproductaccess_824_824,
              products_displayproductaccess_827_827,
              products_displayproductaccess_837_837,
              products_displayproductaccess_849_849,
              products_displayproductaccess_859_859

Attempt #1 (right results – wrong execution time) – using subquery and distinct. The performance is brutal because postgres will find all the matching rows from the index first, then do a sort on those entries which is can be millions of rows

SELECT * FROM (
    SELECT 
      DISTINCT ON (
          "products_displayproductaccess"."product_id"
      ) 
      "products_displayproduct"."id", 
      "products_displayproduct"."date_created", 
      "products_displayproduct"."date_updated", 
      "products_displayproduct"."name", 
      "products_displayproduct"."sub_title", 
      "products_displayproduct"."tags", 
      "products_displayproduct"."has_multiple_variants", 
      "products_displayproduct"."placement_id", 
      "products_displayproduct"."brand_id", 
      "products_displayproduct"."poster_image_id", 
      "products_displayproduct"."rating", 
      "products_displayproduct"."reviews", 
      "products_displayproduct"."is_toppick", 
      "products_displayproduct"."last_amalgamation", 
      "products_displayproduct"."search_index",
      "products_displayproductaccess"."price",
      "products_displayproductaccess"."is_instock"
    FROM 
      "products_displayproduct" 
      INNER JOIN "products_displayproductaccess" ON (
        "products_displayproduct"."id" = "products_displayproductaccess"."product_id"
      ) 
    WHERE 
      (
        --"products_displayproduct"."placement_id" = 853
        "products_displayproductaccess"."location_intspace" IN (755)
        AND "products_displayproductaccess"."location_id" IN (
         55,
         60,
         61,
         62,
         64,
         65,
         68,
         69,
         72,
         78,
         79,
         81,
         84,
         85,
         89,
         92,
         97,
         99,
         101,
         102,
         113,
         120,
         121,
         130,
         149,
         187,
         210,
         234,
         597,
         599,
         601,
         602,
         603,
         605,
         606,
         607,
         608,
         610,
         611,
         613,
         632,
         633,
         634,
         635,
         636,
         637,
         865,
         870,
         900,
         910,
         950,
         951,
         952,
         953,
         954,
         955,
         999,
         1045,
         1046,
         1521,
         1522,
         1525,
         1526,
         1531,
         1532
        )
      ) 
    ORDER BY 
      "products_displayproductaccess"."product_id" ASC NULLS LAST,
      "products_displayproductaccess"."price" DESC NULLS FIRST,
      "products_displayproductaccess"."is_instock" DESC NULLS FIRST
) as subquery
ORDER BY subquery.price DESC NULLS FIRST, subquery.is_instock DESC NULLS FIRST
LIMIT 100

RESULTS – Subquery + Distinct

"Limit  (cost=272865.00..272865.25 rows=100 width=719) (actual time=1087.171..1087.184 rows=100 loops=1)"
"  ->  Sort  (cost=272865.00..272936.34 rows=28534 width=719) (actual time=1087.170..1087.175 rows=100 loops=1)"
"        Sort Key: subquery.price DESC, subquery.is_instock DESC"
"        Sort Method: top-N heapsort  Memory: 95kB"
"        ->  Subquery Scan on subquery  (cost=23.32..271774.45 rows=28534 width=719) (actual time=0.031..1067.073 rows=51147 loops=1)"
"              ->  Unique  (cost=23.32..271489.11 rows=28534 width=723) (actual time=0.029..1057.363 rows=51147 loops=1)"
"                    ->  Merge Join  (cost=23.32..266823.74 rows=1866147 width=723) (actual time=0.029..976.558 rows=1866146 loops=1)"
"                          Merge Cond: (products_displayproductaccess.product_id = products_displayproduct.id)"
"                          ->  Merge Append  (cost=0.56..240512.03 rows=1866147 width=11) (actual time=0.016..714.280 rows=1866146 loops=1)"
"                                Sort Key: products_displayproductaccess.product_id, products_displayproductaccess.price DESC, products_displayproductaccess.is_instock DESC"
"                                ->  Index Scan using products_displayproductaccess_idx_highest_priced on products_displayproductaccess  (cost=0.12..2.29 rows=1 width=11) (actual time=0.003..0.003 rows=0 loops=1)"
"                                      Filter: ((location_intspace = 755) AND (location_id = ANY ('{55,60,61,62,64,65,68,69,72,78,79,81,84,85,89,92,97,99,101,102,113,120,121,130,149,187,210,234,597,599,601,602,603,605,606,607,608,610,611,613,632,633,634,635,636,637,865,870,900,910,950,951,952,953,954,955,999,1045,1046,1521,1522,1525,1526,1531,1532}'::integer[])))"
"                                ->  Index Scan using test_1_idx on products_displayproductaccess_755_755  (cost=0.43..221848.26 rows=1866146 width=11) (actual time=0.013..627.617 rows=1866146 loops=1)"
"                                      Filter: ((location_intspace = 755) AND (location_id = ANY ('{55,60,61,62,64,65,68,69,72,78,79,81,84,85,89,92,97,99,101,102,113,120,121,130,149,187,210,234,597,599,601,602,603,605,606,607,608,610,611,613,632,633,634,635,636,637,865,870,900,910,950,951,952,953,954,955,999,1045,1046,1521,1522,1525,1526,1531,1532}'::integer[])))"
"                          ->  Index Scan using products_displayproduct_pkey on products_displayproduct  (cost=0.29..2877.23 rows=52045 width=712) (actual time=0.009..36.709 rows=52045 loops=1)"
"Planning Time: 17.100 ms"
"Execution Time: 1087.231 ms"

Attempt #2 (wrong results – excellent execution time) – no subquery with distinct. This is very close to the performance and results we're hoping for. Postgres can simply walk the combined index because it's in the exact same order as the order by statement. The problem is there will be duplicate rows when the price on the products_displayproductaccess table varies for example. I simply don't know how to wrap this in say a function that can take 50 rows at a time squash the duplicates and take another 50 until it satisfies the limit and offset properly.

    SELECT 
      DISTINCT ON (
          "products_displayproductaccess"."price",
          "products_displayproductaccess"."is_instock",
          "products_displayproductaccess"."product_id"
        ) 
      "products_displayproduct"."id", 
      "products_displayproduct"."date_created", 
      "products_displayproduct"."date_updated", 
      "products_displayproduct"."name", 
      "products_displayproduct"."sub_title", 
      "products_displayproduct"."tags", 
      "products_displayproduct"."has_multiple_variants", 
      "products_displayproduct"."placement_id", 
      "products_displayproduct"."brand_id", 
      "products_displayproduct"."poster_image_id", 
      "products_displayproduct"."rating", 
      "products_displayproduct"."reviews", 
      "products_displayproduct"."is_toppick", 
      "products_displayproduct"."last_amalgamation", 
      "products_displayproduct"."search_index" 
    FROM 
      "products_displayproduct" 
      INNER JOIN "products_displayproductaccess" ON (
        "products_displayproduct"."id" = "products_displayproductaccess"."product_id"
      ) 
    WHERE 
      (
        --"products_displayproduct"."placement_id" = 853
        "products_displayproductaccess"."location_intspace" IN (755) 
        AND "products_displayproductaccess"."location_id" IN (
         55,
         60,
         61,
         62,
         64,
         65,
         68,
         69,
         72,
         78,
         79,
         81,
         84,
         85,
         89,
         92,
         97,
         99,
         101,
         102,
         113,
         120,
         121,
         130,
         149,
         187,
         210,
         234,
         597,
         599,
         601,
         602,
         603,
         605,
         606,
         607,
         608,
         610,
         611,
         613,
         632,
         633,
         634,
         635,
         636,
         637,
         865,
         870,
         900,
         910,
         950,
         951,
         952,
         953,
         954,
         955,
         999,
         1045,
         1046,
         1521,
         1522,
         1525,
         1526,
         1531,
         1532
        )
      ) 
    ORDER BY 
      "products_displayproductaccess"."price" DESC NULLS FIRST,
      "products_displayproductaccess"."is_instock" DESC NULLS FIRST,
      "products_displayproductaccess"."product_id" ASC NULLS LAST
LIMIT 100

RESULTS – Walking the Index

"Limit  (cost=2.77..225.27 rows=100 width=723) (actual time=0.069..2.854 rows=100 loops=1)"
"  ->  Unique  (cost=2.77..830440.99 rows=373230 width=723) (actual time=0.069..2.845 rows=100 loops=1)"
"        ->  Nested Loop  (cost=2.77..816444.89 rows=1866147 width=723) (actual time=0.067..2.647 rows=1043 loops=1)"
"              ->  Merge Append  (cost=2.48..240507.76 rows=1866147 width=11) (actual time=0.055..1.105 rows=1043 loops=1)"
"                    Sort Key: products_displayproductaccess.price DESC, products_displayproductaccess.is_instock DESC, products_displayproductaccess.product_id"
"                    ->  Sort  (cost=2.04..2.05 rows=1 width=11) (actual time=0.005..0.006 rows=0 loops=1)"
"                          Sort Key: products_displayproductaccess.price DESC, products_displayproductaccess.is_instock DESC, products_displayproductaccess.product_id"
"                          Sort Method: quicksort  Memory: 25kB"
"                          ->  Index Scan using products_displayproductaccess_product_id_2a2032ac on products_displayproductaccess  (cost=0.12..2.03 rows=1 width=11) (actual time=0.003..0.003 rows=0 loops=1)"
"                                Filter: ((location_intspace = 755) AND (location_id = ANY ('{55,60,61,62,64,65,68,69,72,78,79,81,84,85,89,92,97,99,101,102,113,120,121,130,149,187,210,234,597,599,601,602,603,605,606,607,608,610,611,613,632,633,634,635,636,637,865,870,900,910,950,951,952,953,954,955,999,1045,1046,1521,1522,1525,1526,1531,1532}'::integer[])))"
"                    ->  Index Scan using test_2_idx on products_displayproductaccess_755_755  (cost=0.43..221844.23 rows=1866146 width=11) (actual time=0.049..1.037 rows=1043 loops=1)"
"                          Filter: ((location_intspace = 755) AND (location_id = ANY ('{55,60,61,62,64,65,68,69,72,78,79,81,84,85,89,92,97,99,101,102,113,120,121,130,149,187,210,234,597,599,601,602,603,605,606,607,608,610,611,613,632,633,634,635,636,637,865,870,900,910,950,951,952,953,954,955,999,1045,1046,1521,1522,1525,1526,1531,1532}'::integer[])))"
"              ->  Index Scan using products_displayproduct_pkey on products_displayproduct  (cost=0.29..0.31 rows=1 width=712) (actual time=0.001..0.001 rows=1 loops=1043)"
"                    Index Cond: (id = products_displayproductaccess.product_id)"
"Planning Time: 14.706 ms"
"Execution Time: 2.915 ms"

NIKITA's ANSWER (30k stack exchange char limit exceeded)
https://pastebin.com/SP5fQq9F

EDIT:

NIKITAS'S ANSWER #2 (updated)
Query: https://pastebin.com/raw/hGRVcPJb
Explain Analyze: https://pastebin.com/raw/iS1KxDsH

Best Answer

products_displayproduct doesn't seem to be so big, so I would aim for a hash join with an index only scan on products_displayproductaccess_755_755:

CREATE INDEX ON products_displayproductaccess (location_intspace, location_id)
   INCLUDE (price, is_instock, product_id);

VACUUM products_displayproductaccess_755_755;

For older versions of PostgreSQL that don't support the INCLUDE clause, add the columns to the index keys. For older PostgreSQL versions that don't support partitioned indexes, index the partition.