I have a table with a multicolumn index, and I have doubts about the proper sorting of the indexes to get the maximum performance on the queries.
The scenario:
-
PostgreSQL 8.4, table with about one million rows
-
Values in column c1 can have about 100 different values. We can assume the values are evenly distributed, so we have about 10000 rows for every possible value.
-
Column c2 can have 1000 different values. We have 1000 rows for every possible value.
When searching data, the condition always includes values for these two columns, so the table has a multicolumn index combining c1 and c2. I have read about the importance of properly ordering the columns in a multicolumn index if you have queries using just one column for filtering. This is not the case in our scenario.
My question is this one:
Given the fact that one of the filters selects a much smaller set of data, could I improve performance if the first index is the most selective one (the one which allows a smaller set)? I had never considered this question until I saw the graphics from the referenced article:
Image taken from the referenced article about multicolumn indexes.
The queries use values from the two columns for filtering. I have no queries using just one column for filtering. All of them are: WHERE c1=@ParameterA AND c2=@ParameterB
. There are also conditions like this: WHERE c1 = "abc" AND c2 LIKE "ab%"
Best Answer
Answer
Since you refer to the website
use-the-index-luke.com
, consider the chapter:Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN
It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:
Also good for just one column?
What to do for queries on just one column seems to be clear. More details and benchmarks concerning that under these related question:
Less selective column first?
Apart from that, what if you have only equality conditions for both columns?
It doesn't matter. Put the column first that is more likely to receive conditions of its own, which actually matters.
Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows. One with very few, the other one with lots of distinct values:
Query:
EXPLAIN ANALYZE
output (Best of 10 to exclude caching effects):Add index, retest:
Add other index, retest: