Postgresql – Create index on multicolumn type

postgresql

Using Postgres 9.6

I am trying to create an index on a type adt_flds with a field also named adt_flds

 create type adt_flds as
(
      created_at timestamp
    , updated_at timestamp
    , created_by text
    , updated_by text
    , created_by_client_addr inet
    , updated_by_client_addr inet
    , created_by_client_cmptr text
    , updated_by_client_cmptr text
);

My attempt is as follows for

schema = document

table = training_complete_history

field = adt_flds

field column = created_at

SQL

create index training_complete_history_adt_flds_index
on document.training_complete_history ((adt_flds).created_at);

Unfortunately, my attempt yields the following error.

[42601] ERROR: syntax error at or near ")" Position: 114.

How would I properly index created_at inside adt_flds?

Best Answer

You need an extra pair of parentheses around the indexed term if it is not a column or a function call.

In the words of the documentation:

The expression usually must be written with surrounding parentheses, as shown in the syntax. However, the parentheses can be omitted if the expression has the form of a function call.

CREATE INDEX training_complete_history_adt_flds_index
ON document.training_complete_history (((adt_flds).created_at));

I think it is a bad idea to use a composite type if you want to use parts of it in a WHERE or join condition. It violates the first normal form.

Answer

Since you refer to the website use-the-index-luke.com, consider the chapter:

Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN

It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:

Rule of thumb: index for equality first — then for ranges.

Also good for just one column?

What to do for queries on just one column seems to be clear. More details and benchmarks concerning that under these related question:

Less selective column first?

Apart from that, what if you have only equality conditions for both columns?

It doesn't matter. Put the column first that is more likely to receive conditions of its own, which actually matters.

Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows. One with very few, the other one with lots of distinct values:

CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
     , (random() * 4)::int     AS few
FROM generate_series (1, 100000);

DELETE FROM t WHERE random() > 0.9;  -- create some dead tuples, more "real-life"

ANALYZE t;

SELECT count(distinct lots)   -- 9999
     , count(distinct few)    --    5
FROM   t;

Query:

SELECT *
FROM   t
WHERE  lots = 2345
AND    few = 2;

EXPLAIN ANALYZE output (Best of 10 to exclude caching effects):

Seq Scan on t  (cost=0.00..5840.84 rows=2 width=8)
               (actual time=5.646..15.535 rows=2 loops=1)
  Filter: ((lots = 2345) AND (few = 2))
  Buffers: local hit=443
Total runtime: 15.557 ms

Add index, retest:

CREATE INDEX t_lf_idx ON t(lots, few);

Index Scan using t_lf_idx on t  (cost=0.00..3.76 rows=2 width=8)
                                (actual time=0.008..0.011 rows=2 loops=1)
  Index Cond: ((lots = 2345) AND (few = 2))
  Buffers: local hit=4
Total runtime: 0.027 ms

Add other index, retest:

DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx  ON t(few, lots);

Index Scan using t_fl_idx on t  (cost=0.00..3.74 rows=2 width=8)
                                (actual time=0.007..0.011 rows=2 loops=1)
  Index Cond: ((few = 2) AND (lots = 2345))
  Buffers: local hit=4
Total runtime: 0.027 ms

PostgreSQL – How to Optimize Query with Multicolumn Index

The expression lower(name::text) is not covered by your index, which also does not seem to be useful at all, at least for what we see in your question:


CREATE INDEX crm_customer_big_index 
    ON crm_customer (name ASC, 
                     name text_pattern_ops, 
                     (id::text) text_pattern_ops, 
                     phone text_pattern_ops, 
                     mail text_pattern_ops, 
                     rating);

This index would cover your query:

CREATE INDEX crm_customer_foo_index ON crm_customer (
  rating
, lower(name::text) text_pattern_ops
, address_id);  -- for the join

To optimize, I suggest to adapt your query additionally:

SELECT * 
FROM   customer_list
WHERE  lower(name::text) LIKE 'env%' 
AND    rating = 3 
ORDER  BY lower(name::text), name
LIMIT  20;

Also, if you are only interested in the first few characters of name character varying(100), make the index shorter and faster by just indexing those first few characters:

CREATE INDEX crm_customer_foo_index ON crm_customer (
  rating
, lower(left(name::text, 7)) text_pattern_ops  -- example with local optimum
, address_id);  -- for the join

The query must match the index expression:

WHERE  lower(left(name::text, 7)) LIKE 'env%' 
...
ORDER  BY lower(left(name::text, 7)), name

Not sure if the view is in the way somehow. But it should be transparent.

I wouldn't use varchar(n) at all. Just text:

Should I add an arbitrary length limit to VARCHAR columns?

About pattern matching and text_pattern_ops:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL

Why did I put rating first in the index?

Multicolumn index and performance

There are query techniques to better capitalize on non-leading index columns. Consider the Postgres Wiki on "Loose indexscan".

Best Answer

Related Solutions

PostgreSQL Index Tuning – Multicolumn Index and Performance

Answer

Also good for just one column?

Less selective column first?

PostgreSQL – How to Optimize Query with Multicolumn Index

Related Question