I like providing direct answers to questions; however, this topic can go deeper and longer, therefore I am adding a few articles at the bottom that expand on these details for all to learn from.
In summary ...
A few things that can affect using an index (mind you there are a lot of reasons too as you'll see in the articles posted below). The main priority of the engine is to predict how to get the data off the disk as fast as possible in the most effective/efficient way (use of statistics). What becomes the primary choice of the engine is to perform a table scan or index seek/scan. (There can be more depth to this conversation, but I will keep it to the context of your question).
- Use less fields in the SELECT that are more cohesive with the INDEX KEY and INDEX INCLUDES.
- More rows in the table will associate with the use of indexes
- The more the rows are selective (unique) the more likely the use of indexes, thus many NULL values will cause the engine to avoid the index
In detail ...
What are some causes a table scan or an index scan/seek is used. These are also general helpers to determine index creation and use for most general situations. Follow these few steps and you will achieve big benefits right from the get-go.
One reason, as it is mentioned in the comments above, is the use of SELECT * FROM WHERE . SELECT * is a sure-fire way for the engine to decide to avoid using indexes. It's faster to get all fields from the table (Clustered Index/Heap) by scanning/retrieving from the table itself, bypassing any indexes. First human choice is to minimize your select fields to those in the INDEX KEY and in the INDEX INCLUDES.
The second reason is to how many records are in the table. The fewer the records, the easier it is for the engine to simply scan the table. The statistics can have a part in this. Because the engine will use the statistics to predict where the data is on the pages/disk, it can be said that there is not enough rows/distribution to use the index.
And thirdly, having values in your table that are less selective (less unique), like Male and Female, the less likely the index will be used. A table that will use indexes more in queries will be for fields that are highly selective (more unique), like zip codes in an address table (so long as your list of address are NOT all of your neighbors with the same zip code).
There are many techniques and strategies. But one piece of advice, become knowledgeable in reading and experience with indexes/statistics and how/where to put them on DISKS/LUNS and you'll go far as a DBA and offering huge performance gains.
Answer
Since you refer to the website use-the-index-luke.com
, consider the chapter:
Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN
It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:
Rule of thumb: index for equality first — then for ranges.
Also good for just one column?
What to do for queries on just one column seems to be clear. More details and benchmarks concerning that under these related question:
Less selective column first?
Apart from that, what if you have only equality conditions for both columns?
It doesn't matter. Put the column first that is more likely to receive conditions of its own, which actually matters.
Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows. One with very few, the other one with lots of distinct values:
CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
, (random() * 4)::int AS few
FROM generate_series (1, 100000);
DELETE FROM t WHERE random() > 0.9; -- create some dead tuples, more "real-life"
ANALYZE t;
SELECT count(distinct lots) -- 9999
, count(distinct few) -- 5
FROM t;
Query:
SELECT *
FROM t
WHERE lots = 2345
AND few = 2;
EXPLAIN ANALYZE
output (Best of 10 to exclude caching effects):
Seq Scan on t (cost=0.00..5840.84 rows=2 width=8)
(actual time=5.646..15.535 rows=2 loops=1)
Filter: ((lots = 2345) AND (few = 2))
Buffers: local hit=443
Total runtime: 15.557 ms
Add index, retest:
CREATE INDEX t_lf_idx ON t(lots, few);
Index Scan using t_lf_idx on t (cost=0.00..3.76 rows=2 width=8)
(actual time=0.008..0.011 rows=2 loops=1)
Index Cond: ((lots = 2345) AND (few = 2))
Buffers: local hit=4
Total runtime: 0.027 ms
Add other index, retest:
DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx ON t(few, lots);
Index Scan using t_fl_idx on t (cost=0.00..3.74 rows=2 width=8)
(actual time=0.007..0.011 rows=2 loops=1)
Index Cond: ((few = 2) AND (lots = 2345))
Buffers: local hit=4
Total runtime: 0.027 ms
Best Answer
Without having much detail, I can't recommend much.
One thing that does jump out at me is that it's very likely you can improve performance on the table by normalizing it! The presence of so many duplicated (so few unique) values in the columns you listed suggests that perhaps many others in the table are not normalized, as well. I'm suggesting making the
Name
column anint
(or evensmallint
) with a foreign key to aNames
table, and theCurrent_state
columnbit
(or alternately atinyint
) with a foreign key to aWhateverStates
table. You would have to, of course, change your data access code to deal with this indirection, but that is nothing more than the basic job relational database developers have always had to do.Normalizing will reduce the number of bytes per row, increasing the number of rows per page, reducing the number of pages that have to be read to satisfy any particular query, helping performance across the board! Right now the columns given require likely close to 34 bytes each. After the change I suggest, those columns will only require 11 bytes each. Of course, I haven't seen your whole table--your rows may be so big that it doesn't matter.
What columns and datatypes are in the clustered index (if there is one)? This can radically affect the size of the nonclustered indexes, again affecting performance in exactly the way I described (rows per page).
When you do query based on non-selective columns such as
Current_state
, what other columns are always or almost always included? It may be okay for you to have a nonselective column in an index if the index also contains a more selective column (or that in conjunction with the less selective column is more highly selective). If on the other hand you generally query often for rows based on the single columnCurrent_state = 'Pending'
, then you can add a filtered index:This technique could help you even when you also include other columns: you would want to put those in the index instead of the
ClusteredColumnsInOrder
columns I suggested (which was just a tricky way to not put any additional columns into the nonclustered index, since--remember, now--nonclustered indexes always have all the columns of the clustered index implicitly included). Or, if you only pull a very few other columns, you can make your nonclustered index cover the query by addingINCLUDE (AdditionalColumn1, AdditionalColumn2)
so that the query engine doesn't have to go back to the clustered index to satisfy the query.You haven't provided very much information such as full table schema, sample data, and sample queries, and without those it's going to be pretty hard to give you very specific advice about what to do.
One thing I can say, though, is that indiscriminately throwing indexes at the table may not improve things much and could in fact hurt performance of your system overall.
If the hints I have given you here don't seem to help much, then I recommend that you do come back with some of the additional info I mentioned so that we can do a better job of assisting you.