Sql-server – Query Performance Issue

performancequery-performancesql-server-2008-r2t-sql

I have this below query which is behaving bit weird . well Weird in the sense that I couldn't find complete explanation for this.

Version : Sql Server 2008 R2 Enterprise

No fragmentation . Statistics updated with fullscan and for all indexes.

DECLARE @t TABLE (
  subid INT )

INSERT INTO @t
VALUES     (7)--,(3)


SELECT TOP 1 t.QueueItemID
FROM   QueueTable t
WHERE  t.IsProcessed = 0
       AND t.QCode = 'USA'
       AND SubID IN (SELECT SubID
                     FROM   @t)
ORDER  BY t.QueueItemID 

Little bit about schema :

Table variable (@t ) is just one column subid .

QueueTable schema is :

CREATE TABLE [dbo].[QueueTable](
    [QueueItemID] [int] IDENTITY(1,1) NOT NULL,
    [SubID] [int] NOT NULL,
    [IsProcessed] [bit] NOT NULL,
    [Qload] [varchar](max) NOT NULL,
    [QCode] [varchar](5) NOT NULL,
    [QDesc] [varchar](max) NULL,
 CONSTRAINT [PK_QueueTable] PRIMARY KEY CLUSTERED 
(
    [QueueItemID] ASC
)

This table is big as obvious from varchar(max) type columns in above schema..

There are 2 NC index:

  1. NC index NCx_1 on (isprocessed,qcode) include(queueitemid,subid)

  2. NC index NCx_2 on (subid,isprocessed,qcode) include(queueitemid)

Total rowcount is around 9 million rows. Group by subid is below :

SubID   RowCount
------  --------
1   68
2   8255571
3   378584
7   5350
11  5318

Rows satisfying condition (t.IsProcessed = 0 and t.QCode = 'USA' ) are around 350k .

When I run above query it takes 1.5 sec to complete with seek on NC NCx_1 and then scan on table variable. Here is plan.

Query Plan

Above plan is for subid = 11 or 7 in @t table variable
Not sure why it isn't using index NCx_2 (subid,isprocessed,qcode) include(queueitemid) which matches the criteria . It is using index NCX_1 instead .

It appears as it is seeking around 350k rows to satisfy (t.IsProcessed = 0 and t.QCode = 'USA' ) and then filtering out data based on subid column.

I would expect it to first filter out data based on subid column (which would be very less) and then apply other filters which is exactly what is NCX_2 is for.

I tried couple of optimization here that improved performance but want to understand this strange behavior at least to me.

  1. When I add merge join hint in the query then query runs very fast ( 100 ms)
  2. When I add index hint (NCX_2) in the query then also query runs very fast (60 ms)
  3. When I modify query to do MIN(t.QueueItemID) and remove order by query again runs very fast( 60 ms)

Not sure why optimizer not choosing it by default.

Best Answer

You say

Above plan is for subid = 11 or 7 in @t table variable

I think you may be under a misapprehension here. SQL Server does not look at the contents of the table variable and choose a plan based upon the values it contains.

The statement is compiled before the table variable contains any rows at all and you will get the same plan (that assumes a single row) regardless of whether it eventually contains 2 (and would match 95.5% of the rows) or 1 (and would match only 0.0008%).

The table variable may of course also contain multiple rows but SQL Server will not take account of that except if you use the OPTION (RECOMPILE) hint and even then there are no statistics on table variables so it cannot take any account of actual values.

Some alternate plans are below

plan 1

plan 2

These require finding all matching rows and sorting them.

Because NCx_1 is not declared as a unique index the include(QueueItemID) is ignored (as explained in More About Nonclustered Index Keys) and QueueItemID gets added as an index key column instead. This means that SQL Server can seek on IsProcessed, QCode and the matching rows will be ordered by QueueItemID.

The plan in your question therefore avoids a sort operation but performance is entirely reliant upon how many rows in practice need to be evaluated before the first one matching the SubID IN (SELECT SubID FROM @t) predicate is found and the range seek can stop.

Of course this can vary wildly depending on how common the SubID values contained in @t are and whether there is any skew in the distribution of these values with respect to QueueItemID (You say that both around 350k rows match the seek predicate and that around 350k end up being seeked so for SubID = 7 it sounds like these are all at the end or perhaps no rows match at all - which would be the worse case for this plan).

It would be interesting to know what the estimated number of rows coming out of the seek is. Presumably this is much less than 350,000 and thus SQL Server chooses the plan you see based on this estimated cost.

If the table variable will always just have few rows you might find this rewrite works better for you.

SELECT TOP 1 QueueItemID
FROM   @t
       CROSS APPLY (SELECT TOP 1 t.QueueItemID
                    FROM   QueueTable t
                    WHERE  t.IsProcessed = 0
                           AND t.QCode = 'USA'
                           AND SubID = [@t].SubID
                    ORDER  BY t.QueueItemID) CA 
ORDER BY QueueItemID   

For me it gives the plan below where it seeks into the index on subid,isprocessed,qcode,queueitemid as many times as you have rows in the table variable. It is similar to the first plan shown but may be slightly more efficient as each seek stops after the first row is returned.

plan