PostgreSQL 12 – Using both Similarity an ts_query at the same time, better performance

postgresqlpostgresql-performance

Running a query with similarity or ts_query only or running them AND'ded together, the performance is good but the results aren't optimal for real time search.

We use both similarity and ts_query in combination to produce desirable results as with ts_query one character off and you get zero results. Similarity is good but produces odd results when words are simply present in others.

There are only 60k rows in the table, so scaling this up seems terrifying for a product where search is important.

Query in question (with OR)

    SELECT 
  ts_rank("products_displayproduct"."search_index", to_tsquery('pg_catalog.english', 'WowWee:*'), 8) AS "rank", 
  ("products_displayproduct"."name" <-> 'WowWee') AS "similarity",
  "products_displayproduct"."id", 
  "products_displayproduct"."date_created", 
  "products_displayproduct"."date_updated", 
  "products_displayproduct"."name", 
  "products_displayproduct"."sub_title", 
  "products_displayproduct"."tags", 
  "products_displayproduct"."has_multiple_variants", 
  "products_displayproduct"."placement_id", 
  "products_displayproduct"."brand_id", 
  "products_displayproduct"."poster_image_id", 
  "products_displayproduct"."rating", 
  "products_displayproduct"."reviews", 
  "products_displayproduct"."is_toppick", 
  "products_displayproduct"."last_amalgamation", 
  "products_displayproduct"."search_index"
FROM 
  "products_displayproduct" 
  INNER JOIN "products_displayproductaccess" ON (
    "products_displayproduct"."id" = "products_displayproductaccess"."product_id"
  ) 
WHERE 
 "products_displayproductaccess"."location_intspace" IN (758, 759) 
  AND "products_displayproductaccess"."location_id" IN (1, 2, 3, 4, 5, 1511, 940, 931, 660, 661) 
  AND (
    ("products_displayproduct"."search_index") @@ (to_tsquery('pg_catalog.english', 'WowWee:*'))
    OR "products_displayproduct"."name" % 'WowWee'
  )
ORDER BY 
  "rank" DESC, 
  "products_displayproduct"."id" ASC
LIMIT 100

EXPLAIN ANALYZE (with OR)

Limit  (cost=19376.51..19401.51 rows=100 width=720) (actual time=586.733..586.741 rows=51 loops=1)
  ->  Sort  (cost=19376.51..19535.76 rows=637 width=720) (actual time=586.730..586.734 rows=51 loops=1)
        Sort Key: (ts_rank(products_displayproduct.search_index, '''wowwe'':*'::tsquery, 8)) DESC, products_displayproduct.id
        Sort Method: quicksort  Memory: 39kB
        ->  Nested Loop  (cost=132.72..16941.94 rows=637 width=720) (actual time=141.764..586.618 rows=51 loops=1)
              ->  Bitmap Heap Scan on products_displayproduct  (cost=90.22..250.21 rows=103 width=712) (actual time=141.562..585.014 rows=51 loops=1)
                    Recheck Cond: ((search_index @@ '''wowwe'':*'::tsquery) OR ((name)::text % 'WowWee'::text))
                    Rows Removed by Index Recheck: 51994
                    Heap Blocks: exact=1554
                    ->  BitmapOr  (cost=90.22..90.22 rows=103 width=0) (actual time=49.036..49.036 rows=0 loops=1)
                          ->  Bitmap Index Scan on products_di_search__a06b26_gist  (cost=0.00..41.86 rows=51 width=0) (actual time=12.531..12.531 rows=52055 loops=1)
                                Index Cond: (search_index @@ '''wowwe'':*'::tsquery)
                          ->  Bitmap Index Scan on products_displayproduct_name_gist  (cost=0.00..43.21 rows=52 width=0) (actual time=36.502..36.502 rows=2 loops=1)
                                Index Cond: ((name)::text % 'WowWee'::text)
              ->  Append  (cost=42.50..158.89 rows=7 width=4) (actual time=0.015..0.020 rows=1 loops=51)
                    ->  Index Scan using products_displayproductaccess_758_product_id_idx on products_displayproductaccess_758  (cost=42.50..85.14 rows=5 width=4) (actual time=0.014..0.015 rows=1 loops=51)
                          Index Cond: (product_id = products_displayproduct.id)
                          Filter: ((location_intspace = ANY ('{758,759}'::integer[])) AND (location_id = ANY ('{1,2,3,4,5,1511,940,931,660,661}'::integer[])))
                          Rows Removed by Filter: 3
                    ->  Index Scan using products_displayproductaccess_759_product_id_idx on products_displayproductaccess_759  (cost=42.25..73.72 rows=2 width=4) (actual time=0.003..0.003 rows=0 loops=51)
                          Index Cond: (product_id = products_displayproduct.id)
                          Filter: ((location_intspace = ANY ('{758,759}'::integer[])) AND (location_id = ANY ('{1,2,3,4,5,1511,940,931,660,661}'::integer[])))
Planning Time: 3.988 ms
Execution Time: 586.878 ms

Query in question (with AND)

    SELECT 
  ts_rank("products_displayproduct"."search_index", to_tsquery('pg_catalog.english', 'WowWee:*'), 8) AS "rank", 
  ("products_displayproduct"."name" <-> 'WowWee') AS "similarity",
  "products_displayproduct"."id", 
  "products_displayproduct"."date_created", 
  "products_displayproduct"."date_updated", 
  "products_displayproduct"."name", 
  "products_displayproduct"."sub_title", 
  "products_displayproduct"."tags", 
  "products_displayproduct"."has_multiple_variants", 
  "products_displayproduct"."placement_id", 
  "products_displayproduct"."brand_id", 
  "products_displayproduct"."poster_image_id", 
  "products_displayproduct"."rating", 
  "products_displayproduct"."reviews", 
  "products_displayproduct"."is_toppick", 
  "products_displayproduct"."last_amalgamation", 
  "products_displayproduct"."search_index"
FROM 
  "products_displayproduct" 
  INNER JOIN "products_displayproductaccess" ON (
    "products_displayproduct"."id" = "products_displayproductaccess"."product_id"
  ) 
WHERE 
 "products_displayproductaccess"."location_intspace" IN (758, 759) 
  AND "products_displayproductaccess"."location_id" IN (1, 2, 3, 4, 5, 1511, 940, 931, 660, 661) 
  AND (
    ("products_displayproduct"."search_index") @@ (to_tsquery('pg_catalog.english', 'WowWee:*'))
    AND "products_displayproduct"."name" % 'WowWee'
  )
ORDER BY 
  "rank" DESC, 
  "products_displayproduct"."id" ASC
LIMIT 100

EXPLAIN ANALYZE (with AND)

 Limit  (cost=283.32..284.82 rows=6 width=720) (actual time=43.936..43.937 rows=2 loops=1)
      ->  Sort  (cost=283.32..284.82 rows=6 width=720) (actual time=43.935..43.935 rows=2 loops=1)
            Sort Key: (ts_rank(products_displayproduct.search_index, '''wowwe'':*'::tsquery, 8)) DESC, products_displayproduct.id
            Sort Method: quicksort  Memory: 25kB
            ->  Nested Loop  (cost=70.25..275.57 rows=6 width=720) (actual time=4.057..43.921 rows=2 loops=1)
                  ->  Index Scan using products_di_search__a06b26_gist on products_displayproduct  (cost=27.75..111.22 rows=1 width=712) (actual time=4.025..43.858 rows=2 loops=1)
                        Index Cond: (search_index @@ '''wowwe'':*'::tsquery)
                        Rows Removed by Index Recheck: 51994
                        Filter: ((name)::text % 'WowWee'::text)
                        Rows Removed by Filter: 49
                  ->  Append  (cost=42.50..161.28 rows=7 width=4) (actual time=0.015..0.022 rows=1 loops=2)
                        ->  Index Scan using products_displayproductaccess_758_product_id_idx on products_displayproductaccess_758  (cost=42.50..86.39 rows=5 width=4) (actual time=0.014..0.015 rows=1 loops=2)
                              Index Cond: (product_id = products_displayproduct.id)
                              Filter: ((location_intspace = ANY ('{758,759}'::integer[])) AND (location_id = ANY ('{1,2,3,4,5,1511,940,931,660,661}'::integer[])))
                              Rows Removed by Filter: 3
                        ->  Index Scan using products_displayproductaccess_759_product_id_idx on products_displayproductaccess_759  (cost=42.25..74.86 rows=2 width=4) (actual time=0.005..0.005 rows=0 loops=2)
                              Index Cond: (product_id = products_displayproduct.id)
                              Filter: ((location_intspace = ANY ('{758,759}'::integer[])) AND (location_id = ANY ('{1,2,3,4,5,1511,940,931,660,661}'::integer[])))
    Planning Time: 1.793 ms
    Execution Time: 43.991 ms

Best Answer

Use a GIN index for full text search, that usually performs better. You are killed by the many false positive results of the index scan.