Sql-server – SQL Server Index Scan when using ‘OR’ operator

nonclustered-index optimization sql-server

We implemented a Google-style search where SQL queries are run after a debounce is triggered from the front-end. (We know SQL is probably the wrong technology for this, but I'm knee-deep in startup-chaos here.) The query:

SELECT 
    TOP(50) [Name], [Surname]
FROM 
    [dbo].[Clients]
WHERE 
    [Name] LIKE @SearchTerm + '%' OR
    [Surname] LIKE @SearchTerm + '%'

It is a sizable table, so I added two non-clustered indexes, one on each column, to help speed things up:

CREATE NONCLUSTERED INDEX [IX_Patients_Name] ON [dbo].[Clients]
(
    [Name] ASC
)
INCLUDE([Surname]);

CREATE NONCLUSTERED INDEX [IX_Patients_Surname] ON [dbo].[Clients]
(
    [Surname] ASC
)
INCLUDE([Name]);

My thinking was that SQL Server would do an index seek on each column, but the query optimizer seems to choose an index scan instead.

This might not be a real issue for this simple use-case, but we have more complex versions of this with multiple joins, etc.

Is there any way to optimize this query to use seeks?

Best Answer

As you mentioned in your question, this kind of Google-style query isn't really what SQL Server is "good at." Erik Darling talked about this exact query anti-pattern in his post The Only Thing Worse Than Optional Parameters….

All that aside, it's possible to get a seek naturally with that type of query, but it's much more common to get the scan, as you noticed. Here's an example from the StackOverflow2010 sample database.

First I'll create these two helpful indexes:

CREATE NONCLUSTERED INDEX IX_DisplayName ON dbo.Users (DisplayName) INCLUDE ([Location]);
CREATE NONCLUSTERED INDEX IX_Location ON dbo.Users ([Location]) INCLUDE (DisplayName);
GO

Then I'll create a procedure similar to the one you have:

CREATE OR ALTER PROCEDURE dbo.sp_Test
    @SearchTerm nvarchar(100)
AS
BEGIN;
    SELECT TOP (50)
        DisplayName, 
        [Location]
    FROM 
        dbo.Users
    WHERE 
        DisplayName LIKE @SearchTerm + '%' OR
        [Location] LIKE @SearchTerm + '%'
END;
GO

If I run that procedure with a fairly selective parameter, I'll end up with an index union plan. If the parameter is less selective, a scan of one of the covering indexes is used instead.

DBCC FREEPROCCACHE;
GO
EXEC dbo.sp_Test @SearchTerm = N'Josh';
GO
DBCC FREEPROCCACHE;
GO
EXEC dbo.sp_Test @SearchTerm = N'S';
GO

Execution plans are here.

Note that this is true even if you write this as separate UNION queries directly.
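
For reference, a minimal sketch of what the UNION form might look like. It assumes Id is the key of dbo.Users (included so that UNION only removes rows matched by both branches, not distinct users who happen to share a name and location), and it uses a local variable instead of the procedure parameter, so the plan you get may differ from the procedure's:

```sql
-- Sketch only: the OR predicate rewritten as two branches combined with UNION.
-- Assumption: Id uniquely identifies a row in dbo.Users.
DECLARE @SearchTerm nvarchar(100) = N'Josh';

SELECT TOP (50)
    u.DisplayName,
    u.[Location]
FROM
(
    SELECT Id, DisplayName, [Location]
    FROM dbo.Users
    WHERE DisplayName LIKE @SearchTerm + N'%'

    UNION

    SELECT Id, DisplayName, [Location]
    FROM dbo.Users
    WHERE [Location] LIKE @SearchTerm + N'%'
) AS u;
```

As noted above, the optimizer can still choose a scan for either branch when the search term is not selective.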

One way to reliably get the index union plan, as mentioned in the linked post, is to add a FORCESEEK hint to the table you'd like to union on.

If I change the proc to this, I don't get scans on either plan:

CREATE OR ALTER PROCEDURE dbo.sp_Test
    @SearchTerm nvarchar(100)
AS
BEGIN;
    SELECT TOP (50)
        DisplayName, 
        [Location]
    FROM 
        dbo.Users WITH (FORCESEEK)
    WHERE 
        DisplayName LIKE @SearchTerm + '%' OR
        [Location] LIKE @SearchTerm + '%'
END;
GO

The bigger issue with the query, as simplified in the question anyway, is that you are using TOP without an ORDER BY, which is likely to produce drastically different search results depending on which index is used. Make sure your real query has an ORDER BY, or that this problem is accounted for in some way.
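
As a rough illustration, the query could be made deterministic like this. The ordering columns and the Id tie-breaker are assumptions for the sake of the example; pick whatever ordering makes sense for your search results, and check the plan again, since adding ORDER BY can change it:

```sql
-- Sketch only: same query as the procedure body, with an explicit ordering.
-- Assumption: Id is the key of dbo.Users and serves as a tie-breaker.
DECLARE @SearchTerm nvarchar(100) = N'Josh';

SELECT TOP (50)
    DisplayName,
    [Location]
FROM
    dbo.Users WITH (FORCESEEK)
WHERE
    DisplayName LIKE @SearchTerm + N'%' OR
    [Location] LIKE @SearchTerm + N'%'
ORDER BY
    DisplayName,
    [Location],
    Id;  -- tie-breaker so the same 50 rows come back every time
```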

Related Solutions

Sql-server – Greater than operator ignoring a nonclustered index

Ensure your statistics are updated for your table and indexes.

Current statistics are crucial for determining the correct plan.

In your plan, if the "Actual Number of Rows" and the "Estimated Number of Rows" are fairly close, then your statistics are OK.

If they need updating, try using FULLSCAN. By default SQL Server chooses its own sample size, which can sometimes be too small.
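
A minimal sketch of checking and refreshing statistics. The table name dbo.YourTable is a placeholder, not something from the question:

```sql
-- Sketch only: dbo.YourTable is a hypothetical table name.

-- See when each statistic was last updated and how many rows were sampled.
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.YourTable');

-- Rebuild all statistics on the table by reading every row.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;
```

A FULLSCAN reads the whole table, so on large tables it is usually worth running outside peak hours.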

Sql-server – Why clustered index scan

This table is very small!

It has 20 rows of which 2 match the search condition. The table definition contains three columns and two indexes (which both support uniqueness constraints).

CREATE TABLE Person.ContactType(
    ContactTypeID int IDENTITY(1,1) NOT NULL,
    Name dbo.Name NOT NULL,
    ModifiedDate datetime NOT NULL,
    CONSTRAINT PK_ContactType_ContactTypeID PRIMARY KEY CLUSTERED(ContactTypeID),
    CONSTRAINT AK_ContactType_Name UNIQUE NONCLUSTERED(Name)
)

Running

SELECT index_type_desc,
       index_depth,
       page_count,
       avg_page_space_used_in_percent,
       avg_record_size_in_bytes
FROM   sys.dm_db_index_physical_stats(db_id(), 
                                      object_id('Person.ContactType'), 
                                      NULL, 
                                      NULL, 
                                      'DETAILED')

shows that both indexes consist of only a single leaf page, with no upper-level pages.

+--------------------+-------------+------------+--------------------------------+--------------------------+
|  index_type_desc   | index_depth | page_count | avg_page_space_used_in_percent | avg_record_size_in_bytes |
+--------------------+-------------+------------+--------------------------------+--------------------------+
| CLUSTERED INDEX    |           1 |          1 | 15.9130219915987               | 62.5                     |
| NONCLUSTERED INDEX |           1 |          1 | 13.1949592290586               | 51.5                     |
+--------------------+-------------+------------+--------------------------------+--------------------------+

Rows on each index page aren't necessarily in index key order but each page has a slot array with the offset of each row on the page. This is maintained in index order.

The nonclustered index covers two out of the three columns (Name as a key column and ContactTypeID as a row locator back to the base table) but is missing ModifiedDate.

You can use index hints to force the NCI seek as below

SELECT ct.*
FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
WHERE  ct.Name LIKE 'Own%';

But you can see that under SQL Server's cost model this plan is given a higher estimated cost than the competing CI scan (roughly double).


The single page clustered index scan would just need to read all the 20 rows on the page, evaluate the predicate against them and return them.

The single-page nonclustered index range seek could potentially perform a binary search on the slot array to reduce the number of rows evaluated. However, the index does not cover the query, so for each row returned by the NCI seek it would also need a potential IO to retrieve the CI page, and it would then still need to locate the row holding the missing column values on that page.
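
One way to observe the difference in page accesses yourself is to run both forms with SET STATISTICS IO ON and compare the logical reads reported in the Messages tab. This is only a sketch and is not how the cost figures above were obtained:

```sql
-- Sketch only: compare logical reads for the unhinted and hinted queries.
SET STATISTICS IO ON;

-- Plan chosen by the optimizer (a clustered index scan in the test above).
SELECT ct.*
FROM   Person.ContactType AS ct
WHERE  ct.Name LIKE 'Own%';

-- Plan forced onto the nonclustered index, which then needs lookups
-- into the clustered index to fetch ModifiedDate.
SELECT ct.*
FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
WHERE  ct.Name LIKE 'Own%';

SET STATISTICS IO OFF;
```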

On my machine, running 1 million iterations of the nonclustered index plan took 15.245 seconds, compared to 11.113 seconds for the clustered index plan. Whilst this is far from double, the plan without the hint was measurably faster.

Even if the table were orders of magnitude larger, however, you may well still not get your expected plan with lookups.

SQL Server's costing model prefers sequential scans to random IO lookups, and the "tipping point" between choosing a scan of a covering index and choosing a seek with lookups on a non-covering one is often surprisingly low, as discussed in Kimberly Tripp's blog post here.

It is certainly not out of the question that it would choose such a plan for a 10% selective predicate but the clustered index would likely need to be quite a lot wider than the NCI for it to do so.
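
If the lookups are what you want to avoid, one option that is not part of the original answer is a nonclustered index that also covers ModifiedDate, so a seek on Name needs no trip back to the clustered index. The index name below is made up for illustration, and on a 20-row table this is unlikely to be worth the extra index:

```sql
-- Sketch only: IX_ContactType_Name_Covering is a hypothetical name.
-- Name is the key; ContactTypeID comes along as the clustering key;
-- ModifiedDate is added at the leaf level via INCLUDE.
CREATE UNIQUE NONCLUSTERED INDEX IX_ContactType_Name_Covering
    ON Person.ContactType (Name)
    INCLUDE (ModifiedDate);

-- With all three columns available in the index, no hint should be needed
-- for the optimizer to consider a seek here.
SELECT ct.*
FROM   Person.ContactType AS ct
WHERE  ct.Name LIKE 'Own%';
```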
