PostgreSQL Performance – Why Response Time Increases When Searching Big Tables

optimization, performance, postgresql, postgresql-10

I am new to PostgreSQL, especially to querying big tables. I have a table with almost 35 million rows. I have created a GiST index and a tsvector column for this table. All goes well until I run complex searches against the big table. For example, if I search for apple iphone, the response comes back in milliseconds.

This is my query:

explain analyze select "SNR_SKU","SNR_Title","SNR_ModelNo","SNR_Brand",  
"SNR_UPC","SNR_Price","SNR_CustomerReviews","SNR_Available",  
"SNR_ProductURL","SNR_ImageURL","SNR_Description","SNR_isShow",  
"SNR_Date","SNR_Category","SNR_PriceBefore","SNR_Condition", 
"SNR_SubCategory" from products_allproducts 
Where "SNR_Title" ~* 'apple iphone ' limit 10;

…and here is the response:

          QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..42.15 rows=10 width=453) (actual time=1.338..10.404 rows=10 loops=1)
   ->  Index Scan using snr_title_trigram_gist on products_allproducts  (cost=0.42..7186.55 rows=1722 width=453) (actual time=1.338..10.402 rows=10 loops=1)
         Index Cond: (("SNR_Title")::text ~* 'apple iphone '::text)
         Rows Removed by Index Recheck: 5
 Planning time: 1.365 ms
 Execution time: 10.455 ms
(6 rows)

If I run a tsvector query like this:

 explain analyze Select  
"SNR_SKU","SNR_Title","SNR_ModelNo","SNR_Brand","SNR_UPC","SNR_Available",
"SNR_ProductURL","SNR_ImageURL","SNR_Description","SNR_isShow","SNR_Date",
"SNR_Category","SNR_Condition","SNR_SubCategory","SNR_PriceBefore",
"SNR_CustomerReviews","SNR_Price"  from products_allproducts 
Where tsv_title @@ to_tsquery('apple & iphone')  LIMIT 36 OFFSET 0;

…then the response will look like this:

                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000.00..169207.11 rows=36 width=453) (actual time=55.726..427.097 rows=36 loops=1)
   ->  Gather  (cost=1000.00..4009936.05 rows=858 width=453) (actual time=55.725..427.090 rows=36 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Seq Scan on products_allproducts  (cost=0.00..4008850.25 rows=358 width=453) (actual time=66.512..453.720 rows=13 loops=3)
               Filter: (tsv_title @@ to_tsquery('apple & iphone'::text))
               Rows Removed by Filter: 196177
 Planning time: 0.111 ms
 Execution time: 479.535 ms
(9 rows)

The execution time decreases as I reduce the number of keywords in the search query, but as soon as the number of query words increases, the response becomes too slow.

Here is another example, using apple iphone x 256gb as the search value:

explain analyze Select "SNR_SKU","SNR_Title","SNR_ModelNo","SNR_Brand",
"SNR_UPC","SNR_Available","SNR_ProductURL","SNR_ImageURL","SNR_Description", 
"SNR_isShow","SNR_Date","SNR_Category","SNR_Condition","SNR_SubCategory",
"SNR_PriceBefore","SNR_CustomerReviews","SNR_Price" from  products_allproducts 
Where tsv_title @@ to_tsquery('apple & iphone & x & 256')  LIMIT 36 OFFSET 0;

The response will look like this:

                                                                    QUERY PLAN                                                                     
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000.00..4009850.35 rows=1 width=453) (actual time=24985.309..24985.309 rows=0 loops=1)
   ->  Gather  (cost=1000.00..4009850.35 rows=1 width=453) (actual time=24985.308..24985.308 rows=0 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Seq Scan on products_allproducts  (cost=0.00..4008850.25 rows=1 width=453) (actual time=24981.731..24981.731 rows=0 loops=3)
               Filter: (tsv_title @@ to_tsquery('apple & iphone & x & 256'::text))
               Rows Removed by Filter: 5759185
 Planning time: 0.178 ms
 Execution time: 25003.873 ms
(9 rows)

  1. Can anybody suggest which indexing technique would work well in my scenario, where users search with complex values?

Here is one more question.

  2. When a user searches for an item that does not exist, PostgreSQL keeps on searching and takes a long time to respond. Is there a faster way to return a result when no data exists for the query?

Any help will be highly appreciated.

Best Answer

It looks like you have created a GiST index using pg_trgm on "SNR_Title". That is the wrong operator class on the wrong column: it will support LIKE and regular-expression matches (and some other queries) on the "SNR_Title" column, but it cannot support the full-text queries against tsv_title that you are trying to run.
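The plans in the question show that index as snr_title_trigram_gist; it was presumably created roughly like this (a guess, reconstructed from the index name):

```sql
-- requires the pg_trgm extension:
-- create extension if not exists pg_trgm;
create index snr_title_trigram_gist on products_allproducts
    using GIST ("SNR_Title" gist_trgm_ops);
```

An index built with gist_trgm_ops accelerates LIKE, ILIKE, ~ and ~* on "SNR_Title" (which is why your first query is fast), but it knows nothing about the separate tsv_title column, so the @@ queries fall back to parallel sequential scans.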

You need to build the index like this:

create index on products_allproducts using GIST (tsv_title);

You could use GIN rather than GiST; the tsvector @@ tsquery operator is supported by both index methods.
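For example (a sketch; the index names here are made up, and it assumes tsv_title is a tsvector column as shown in the question):

```sql
-- GiST index on the tsvector column, as suggested above
create index snr_tsv_title_gist on products_allproducts using GIST (tsv_title);

-- or the GIN equivalent, which is typically faster to search
-- at the cost of slower writes and more disk space:
create index snr_tsv_title_gin on products_allproducts using GIN (tsv_title);
```

With either index in place, the tsv_title @@ to_tsquery(...) queries should switch from the parallel sequential scans in your plans to an index (or bitmap index) scan. That also addresses your second question: when nothing matches, the index lookup fails immediately instead of filtering all ~5.7 million rows per worker, so a no-match search returns quickly.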