I have the below query, which is behaving a bit weirdly. Weird in the sense that I couldn't find a complete explanation for it.
Version: SQL Server 2008 R2 Enterprise
No fragmentation. Statistics updated with FULLSCAN for all indexes.
DECLARE @t TABLE (subid INT)

INSERT INTO @t
VALUES (7) --,(3)

SELECT TOP 1 t.QueueItemID
FROM QueueTable t
WHERE t.IsProcessed = 0
  AND t.QCode = 'USA'
  AND SubID IN (SELECT SubID FROM @t)
ORDER BY t.QueueItemID
A little bit about the schema:
The table variable (@t) has just one column, subid.
The QueueTable schema is:
CREATE TABLE [dbo].[QueueTable](
[QueueItemID] [int] IDENTITY(1,1) NOT NULL,
[SubID] [int] NOT NULL,
[IsProcessed] [bit] NOT NULL,
[Qload] [varchar](max) NOT NULL,
[QCode] [varchar](5) NOT NULL,
[QDesc] [varchar](max) NULL,
CONSTRAINT [PK_QueueTable] PRIMARY KEY CLUSTERED
(
[QueueItemID] ASC
)
)
This table is big, as is obvious from the varchar(max) columns in the schema above.
There are 2 NC indexes:
- NCx_1 on (IsProcessed, QCode) INCLUDE (QueueItemID, SubID)
- NCx_2 on (SubID, IsProcessed, QCode) INCLUDE (QueueItemID)
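Spelled out as DDL, those two index definitions correspond to statements like the following (assuming the table lives in the dbo schema):

```sql
-- NCx_1: keys (IsProcessed, QCode), covering QueueItemID and SubID
CREATE NONCLUSTERED INDEX NCx_1
ON dbo.QueueTable (IsProcessed, QCode)
INCLUDE (QueueItemID, SubID);

-- NCx_2: keys (SubID, IsProcessed, QCode), covering QueueItemID
CREATE NONCLUSTERED INDEX NCx_2
ON dbo.QueueTable (SubID, IsProcessed, QCode)
INCLUDE (QueueItemID);
```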
Total row count is around 9 million rows. The row counts grouped by SubID are below:
SubID RowCount
------ --------
1 68
2 8255571
3 378584
7 5350
11 5318
Around 350k rows satisfy the condition (t.IsProcessed = 0 AND t.QCode = 'USA').
When I run the above query it takes 1.5 seconds to complete, with a seek on NC index NCx_1 and then a scan on the table variable. Here is the plan.
The above plan is for subid = 11 or 7 in the @t table variable.
I am not sure why it isn't using index NCx_2 (subid, isprocessed, qcode) include (queueitemid), which matches the criteria; it is using index NCx_1 instead.
It appears that it seeks around 350k rows to satisfy (t.IsProcessed = 0 AND t.QCode = 'USA') and then filters the data on the SubID column.
I would expect it to first filter on the SubID column (which would return far fewer rows) and then apply the other filters, which is exactly what NCx_2 is for.
I tried a couple of optimizations that improved performance, but I want to understand this behavior, which is strange at least to me:
- When I add a MERGE JOIN hint to the query, it runs very fast (100 ms)
- When I add an index hint (NCx_2) to the query, it also runs very fast (60 ms)
- When I modify the query to take MIN(t.QueueItemID) and remove the ORDER BY, it again runs very fast (60 ms)
I am not sure why the optimizer is not choosing it by default.
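For reference, the three variants above look like this (the hint syntax is standard T-SQL; the query is otherwise unchanged):

```sql
-- 1. Merge join hint (~100 ms)
SELECT TOP 1 t.QueueItemID
FROM QueueTable t
WHERE t.IsProcessed = 0
  AND t.QCode = 'USA'
  AND SubID IN (SELECT SubID FROM @t)
ORDER BY t.QueueItemID
OPTION (MERGE JOIN)

-- 2. Index hint forcing NCx_2 (~60 ms)
SELECT TOP 1 t.QueueItemID
FROM QueueTable t WITH (INDEX (NCx_2))
WHERE t.IsProcessed = 0
  AND t.QCode = 'USA'
  AND SubID IN (SELECT SubID FROM @t)
ORDER BY t.QueueItemID

-- 3. MIN instead of TOP 1 ... ORDER BY (~60 ms)
SELECT MIN(t.QueueItemID)
FROM QueueTable t
WHERE t.IsProcessed = 0
  AND t.QCode = 'USA'
  AND SubID IN (SELECT SubID FROM @t)
```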
Best Answer
You say you would expect it to first filter on the SubID column and then apply the other filters.
I think you may be under a misapprehension here. SQL Server does not look at the contents of the table variable and choose a plan based upon the values it contains.
The statement is compiled before the table variable contains any rows at all, and you will get the same plan (which assumes a single row) regardless of whether it eventually contains 2 (which would match 95.5% of the rows) or 1 (which would match only 0.0008%).
The table variable may of course also contain multiple rows, but SQL Server will not take account of that unless you use the OPTION (RECOMPILE) hint, and even then there are no statistics on table variables, so it cannot take any account of actual values.
Some alternate plans are below.
These require finding all matching rows and sorting them.
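As an aside, the OPTION (RECOMPILE) hint mentioned above is simply appended to the statement. It lets the optimizer see the table variable's actual row count at execution time, but, as noted, still not the values it contains:

```sql
SELECT TOP 1 t.QueueItemID
FROM QueueTable t
WHERE t.IsProcessed = 0
  AND t.QCode = 'USA'
  AND SubID IN (SELECT SubID FROM @t)
ORDER BY t.QueueItemID
OPTION (RECOMPILE)
```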
Because NCx_1 is not declared as a unique index, the INCLUDE (QueueItemID) is ignored (as explained in More About Nonclustered Index Keys) and QueueItemID gets added as an index key column instead. This means that SQL Server can seek on IsProcessed, QCode and the matching rows will be ordered by QueueItemID.
The plan in your question therefore avoids a sort operation, but performance is entirely reliant upon how many rows in practice need to be evaluated before the first one matching the SubID IN (SELECT SubID FROM @t) predicate is found and the range seek can stop.
Of course this can vary wildly depending on how common the SubID values contained in @t are, and whether there is any skew in the distribution of these values with respect to QueueItemID. (You say that around 350k rows match the seek predicate and that around 350k end up being seeked, so for SubID = 7 it sounds like these are all at the end, or perhaps no rows match at all, which would be the worst case for this plan.)
It would be interesting to know what the estimated number of rows coming out of the seek is. Presumably this is much less than 350,000, and thus SQL Server chooses the plan you see based on this estimated cost.
If the table variable will always have just a few rows, you might find this rewrite works better for you.
For me it gives the plan below, where it seeks into the index on (subid, isprocessed, qcode) include (queueitemid) as many times as there are rows in the table variable. It is similar to the first plan shown, but may be slightly more efficient, as each seek stops after the first row is returned.
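The exact rewrite is not reproduced above, but a query with that shape (one seek per table-variable row, each stopping at its first match, then taking the overall minimum) would look something like this sketch:

```sql
-- One NCx_2 seek per row in @t; each TOP 1 stops at the first
-- matching QueueItemID, and MIN picks the smallest across seeks.
SELECT MIN(ca.QueueItemID) AS QueueItemID
FROM @t t
CROSS APPLY (SELECT TOP 1 q.QueueItemID
             FROM QueueTable q
             WHERE q.IsProcessed = 0
               AND q.QCode = 'USA'
               AND q.SubID = t.subid
             ORDER BY q.QueueItemID) ca
```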