PostgreSQL – How to Optimize Query with Multicolumn Index

indexperformancepostgresqlpostgresql-performance

Currently, I have a view which is defined like this:

                       View "public.customer_list"
  Column   |          Type           | Modifiers | Storage  | Description 
-----------+-------------------------+-----------+----------+-------------
 id        | bigint                  |           | plain    | 
 name      | character varying(100)  |           | extended | 
 street    | character varying(100)  |           | extended | 
 zip       | character varying(10)   |           | extended | 
 city      | character varying(100)  |           | extended | 
 country   | character varying(3)    |           | extended | 
 phone     | character varying(100)  |           | extended | 
 mail      | character varying(100)  |           | extended | 
 rating    | integer                 |           | plain    | 
 salesnote | character varying(1800) |           | extended | 
View definition:
 SELECT c.id,
    c.name,
    a.street,
    a.zip,
    a.city,
    a.country,
    c.phone,
    c.mail,
    c.rating,
    c.salesnote
   FROM crm_customer c
     JOIN crm_address a ON a.id = c.address_id;

To better search through the List I created an index with the following:

CREATE INDEX crm_customer_big_index 
    ON crm_customer (name ASC, 
                     name text_pattern_ops, 
                     (id::text) text_pattern_ops, 
                     phone text_pattern_ops, 
                     mail text_pattern_ops, 
                     rating);

I search through this table with a varying where query mostly it looks like that:

SELECT * 
  FROM customer_list
 WHERE lower(name::text) LIKE 'env%' 
       AND rating = 3 
 ORDER BY name ASC 
 LIMIT 20 
 OFFSET 0;

But still my analyzer won't ever use an index. Is there a way to use one?

             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=164.04..164.08 rows=15 width=129) (actual time=3.243..3.244 rows=1 loops=1)
   ->  Sort  (cost=164.04..164.08 rows=15 width=129) (actual time=3.243..3.243 rows=1 loops=1)
         Sort Key: c.name
         Sort Method: quicksort  Memory: 26kB
         ->  Nested Loop  (cost=0.28..163.75 rows=15 width=129) (actual time=0.020..3.225 rows=1 loops=1)
               ->  Seq Scan on crm_customer c  (cost=0.00..111.13 rows=15 width=98) (actual time=0.012..3.216 rows=1 loops=1)
                     Filter: ((rating = 3) AND (lower((name)::text) ~~ 'env%'::text))
                     Rows Removed by Filter: 2978
               ->  Index Only Scan using crm_address_search_index on crm_address a  (cost=0.28..3.50 rows=1 width=47) (actual time=0.006..0.006 rows=1 loops=1)
                     Index Cond: (id = c.address_id)
                     Heap Fetches: 0
 Planning time: 0.580 ms
 Execution time: 3.286 ms

Currently, the query is still "somewhat" fast. However, I could have way more data inside my table. Also, crm_address has an index over street, zip, city, country, which works fine.

Best Answer

The expression lower(name::text) is not covered by your index, which also does not seem to be useful at all, at least for what we see in your question:


CREATE INDEX crm_customer_big_index 
    ON crm_customer (name ASC, 
                     name text_pattern_ops, 
                     (id::text) text_pattern_ops, 
                     phone text_pattern_ops, 
                     mail text_pattern_ops, 
                     rating);

This index would cover your query:

CREATE INDEX crm_customer_foo_index ON crm_customer (
  rating
, lower(name::text) text_pattern_ops
, address_id);  -- for the join

To optimize, I suggest to adapt your query additionally:

SELECT * 
FROM   customer_list
WHERE  lower(name::text) LIKE 'env%' 
AND    rating = 3 
ORDER  BY lower(name::text), name
LIMIT  20;

Also, if you are only interested in the first few characters of name character varying(100), make the index shorter and faster by just indexing those first few characters:

CREATE INDEX crm_customer_foo_index ON crm_customer (
  rating
, lower(left(name::text, 7)) text_pattern_ops  -- example with local optimum
, address_id);  -- for the join

The query must match the index expression:

WHERE  lower(left(name::text, 7)) LIKE 'env%' 
...
ORDER  BY lower(left(name::text, 7)), name

Not sure if the view is in the way somehow. But it should be transparent.

I wouldn't use varchar(n) at all. Just text:

About pattern matching and text_pattern_ops:

Why did I put rating first in the index?

There are query techniques to better capitalize on non-leading index columns. Consider the Postgres Wiki on "Loose indexscan".