SQL Server – Query optimizer recommends adding an index instead of using an existing index

index, nonclustered-index, performance, query-performance, sql-server

I am trying to determine why the query optimizer in SQL Server is recommending creating a new index instead of using an existing index that appears to be sufficient for the query.

First the table. Column names changed to protect the innocent 🙂

CREATE TABLE [myTable] (
  [id] [int] IDENTITY(1,1) NOT NULL,
  [serialNumber] [varchar](12) NOT NULL,
  [sName] [varchar](64) NOT NULL,
  [meanValue] [int] NOT NULL,
  [range] [int] NOT NULL,
  [modifiedDate] [datetime] NOT NULL,
  CONSTRAINT [PK_myTable] PRIMARY KEY CLUSTERED ( [id] ASC )
)

Create the index in question:

CREATE NONCLUSTERED INDEX [IDX_myIndex]
ON [myTable] ([serialNumber], [sName], [meanValue], [range])
INCLUDE ([modifiedDate])

Add data for testing using your generator of choice 😉 (a rough sketch of one option is below).
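For example, something like this throwaway loader will do; the specific values are hypothetical and any generator is fine:

INSERT INTO [myTable] ([serialNumber], [sName], [meanValue], [range], [modifiedDate])
SELECT TOP (3000000)
  CAST(100000 + ABS(CHECKSUM(NEWID())) % 900000 AS varchar(12)),  -- random six-digit serial
  'Name ' + CAST(ABS(CHECKSUM(NEWID())) % 1000 AS varchar(10)),   -- one of ~1000 names
  ABS(CHECKSUM(NEWID())) % 1000,                                  -- meanValue
  ABS(CHECKSUM(NEWID())) % 100,                                   -- range
  DATEADD(DAY, -(ABS(CHECKSUM(NEWID())) % 3650), GETDATE())       -- modifiedDate within the last ~10 years
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b
CROSS JOIN sys.all_objects AS c;

With a few million rows in place, run the following query: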

SELECT TOP 1000
  [serialNumber],
  [sName],
  [meanValue],
  [range],
  [modifiedDate]
FROM [myTable]
WHERE [serialNumber] = 137802
AND [sName] = 'A Name'

The query optimizer recommends a new index in which one of the WHERE columns is covered by the INCLUDE list instead of being part of the key:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[myTable] ([sName])
INCLUDE ([serialNumber],[meanValue],[range],[modifiedDate])

I was under the impression that a broader index encompassing more columns would still be used, as long as the WHERE predicates line up with the leading columns of the index.

If I also filter on modifiedDate, the index gets used and the query optimizer doesn't complain:

SELECT TOP 1000
  [serialNumber],
  [sName],
  [meanValue],
  [range],
  [modifiedDate]
FROM [myTable]
WHERE [serialNumber] = 137802
AND [sName] = 'A Name'
AND ([modifiedDate] >= '2000-04-25' AND [modifiedDate] < '2019-04-30') 

The DBA answer SQL Server 2008R2 – Why is my index not used suggests that a closer correlation between the index key/INCLUDE columns and the SELECT list helps determine index use (but in my example they are basically the same). I have lots of rows, which should satisfy any row-selectivity test, and there are no NULLs, so NULL handling isn't a factor either.

I thought, maybe incorrectly, that an index on A, B, C, D would cover a query filtering on A, B, C, or on A, B, or on A alone. Is this assumption wrong? I realize there could be edge cases that throw this basic concept off-kilter, but at a fundamental level, isn't this roughly how it's supposed to work?
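To illustrate the assumption with a hypothetical table and index (names made up, just to show the pattern I mean):

-- Hypothetical table and index, only to illustrate the leftmost-prefix assumption:
CREATE TABLE [t] ([A] int NOT NULL, [B] int NOT NULL, [C] int NOT NULL, [D] int NOT NULL);
CREATE NONCLUSTERED INDEX [IDX_t] ON [t] ([A], [B], [C], [D]);

-- My expectation: each of these filters a leading prefix of the key,
-- so each should be able to seek on IDX_t:
SELECT [A], [B], [C], [D] FROM [t] WHERE [A] = 1 AND [B] = 2 AND [C] = 3;
SELECT [A], [B], [C], [D] FROM [t] WHERE [A] = 1 AND [B] = 2;
SELECT [A], [B], [C], [D] FROM [t] WHERE [A] = 1;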

Thanks in advance for any help, pointing out the stupidity in my ways, recognizing I need to go (back) to DB school etc… 🙂

Best Answer

Your index is fine and covers the query, so it should be used. The real problem is the query itself, specifically this condition, which hides an implicit conversion:

WHERE [serialNumber] = 137802

According to SQL Server's data type precedence rules, when two values of different data types are compared, the value with the lower-precedence type is converted to the higher-precedence type. Unfortunately, int is higher in the list than varchar, so the column (serialNumber) values are converted to int rather than the literal being converted to varchar. Because that column is the leading key of the index, the conversion prevents a seek, so the optimizer passes over the index and looks for an alternative (hence the suggestion).
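In effect, the predicate is evaluated as if the conversion were written around the column, which is what defeats a seek on the leading key column. Roughly (an illustrative rewrite, not a suggested query):

-- Roughly what the implicit conversion turns the predicate into
-- (in the actual plan it appears as something like
--  CONVERT_IMPLICIT(int, [serialNumber], 0) = 137802):
SELECT TOP 1000
  [serialNumber], [sName], [meanValue], [range], [modifiedDate]
FROM [myTable]
WHERE CONVERT(int, [serialNumber]) = 137802  -- conversion wraps the column, so no seek on IDX_myIndex
  AND [sName] = 'A Name'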

The solution is to avoid any implicit or explicit conversions on columns in the WHERE clause. Simply use:

WHERE [serialNumber] = '137802'
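With the value passed as a string, the original query becomes the following, and the existing IDX_myIndex can be seeked on both leading key columns:

SELECT TOP 1000
  [serialNumber],
  [sName],
  [meanValue],
  [range],
  [modifiedDate]
FROM [myTable]
WHERE [serialNumber] = '137802'
AND [sName] = 'A Name'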