Sql-server – SQL Server 2008R2 – Why is the index not used

hintsindexnullsql-server-2008-r2

I have a table defined in the following way:

CREATE TABLE [dbo].[MyTable]
(
   [MyTable_ID] [int] IDENTITY(1,1) NOT NULL,
   [COLUMN_WITH_DATA] [varchar](128) NOT NULL,
   [COLUMN_A] [varchar](128) NULL,
   [COLUMN_B] [varchar](128) NULL,
   [COLUMN_C] [bit] NOT NULL
)

And an index created like that:

CREATE INDEX [MyTable_Index_ABC] ON [dbo].[MyTable]
(
    [COLUMN_A],
    [COLUMN_B], 
    [COLUMN_C]
) 
ON [PRIMARY]

And I run the following query:

SELECT TOP 1 [MyTable].*
    FROM [MyTable](UPDLOCK)
    WHERE
        [MyTable].[COLUMN_A] IS NULL
    AND [MyTable].[COLUMN_B] IS NULL
    AND [MyTable].[COLUMN_C] = 0

The goal behind this is to get COLUMN_WITH_DATA vaule of first not yet used record, and then update its' COLUMN_A and COLUMN_B with not null values, to mark it as consumed.

I have a couple of millions of records in MyTable, and about half of them has both:
COULMN_A and COLIMN_B that are not NULL (indicating the data was already consumed).

In this situation the query runs really slow, and the execution plan shows that the index:

MyTable_Index_ABC

is not used unless I use a query hint, in which case the query runs much faster.
On the other hand if I update all not yet consumed rows, that is having:

COLUMN_A IS NULL AND COLUMN_B IS NULL

so that COLUMN_A and COLUMN_B contain empty string: '' instead of NULL, the index is used and the query runs much faster again.

Two questions would be:

  1. why null values make my index discarded
  2. is it possible to instruct the database to always use the index without having to use query hints on possibly multiple different queries
    ??

Thanks,

Best Answer

I like providing direct answers to questions; however, this topic can go deeper and longer, therefore I am adding a few articles at the bottom that expand on these details for all to learn from.

In summary ...

A few things that can affect using an index (mind you there are a lot of reasons too as you'll see in the articles posted below). The main priority of the engine is to predict how to get the data off the disk as fast as possible in the most effective/efficient way (use of statistics). What becomes the primary choice of the engine is to perform a table scan or index seek/scan. (There can be more depth to this conversation, but I will keep it to the context of your question).

  1. Use less fields in the SELECT that are more cohesive with the INDEX KEY and INDEX INCLUDES.
  2. More rows in the table will associate with the use of indexes
  3. The more the rows are selective (unique) the more likely the use of indexes, thus many NULL values will cause the engine to avoid the index

In detail ...

What are some causes a table scan or an index scan/seek is used. These are also general helpers to determine index creation and use for most general situations. Follow these few steps and you will achieve big benefits right from the get-go.

One reason, as it is mentioned in the comments above, is the use of SELECT * FROM WHERE . SELECT * is a sure-fire way for the engine to decide to avoid using indexes. It's faster to get all fields from the table (Clustered Index/Heap) by scanning/retrieving from the table itself, bypassing any indexes. First human choice is to minimize your select fields to those in the INDEX KEY and in the INDEX INCLUDES.

The second reason is to how many records are in the table. The fewer the records, the easier it is for the engine to simply scan the table. The statistics can have a part in this. Because the engine will use the statistics to predict where the data is on the pages/disk, it can be said that there is not enough rows/distribution to use the index.

And thirdly, having values in your table that are less selective (less unique), like Male and Female, the less likely the index will be used. A table that will use indexes more in queries will be for fields that are highly selective (more unique), like zip codes in an address table (so long as your list of address are NOT all of your neighbors with the same zip code).

There are many techniques and strategies. But one piece of advice, become knowledgeable in reading and experience with indexes/statistics and how/where to put them on DISKS/LUNS and you'll go far as a DBA and offering huge performance gains.