Sql-server – Identical query slower when searching for `T` rather than `F`

I have a legacy database that uses char(1) with 'T' and 'F' for boolean values. The only values in the field are T and F, of the following queries, the first is instantaneous, the second takes 30 seconds.

select top 50 * from vw_Movement 
where MOVTypeId = 20 and MOVRequiresResolution = 'T' and MOVIsResolved = 'F' 

select top 50 * from vw_Movement 
where MOVTypeId = 20 and MOVRequiresResolution = 'T' and MOVIsResolved = 'T'

The execution plans are very similar but the slower one uses a different index.

I fix the issue by changing the sql to:

select top 50 * from vw_Movement 
where MOVTypeId = 20 and MOVRequiresResolution = 'T' and MOVIsResolved <> 'F'

This gets the db to use the same index as the first query and executes instantly, but I'd prefer to understand what's actually happening. I'm pretty green at reading execution plans, but some pointers as to what's might be going on would be appreciated.

From reading around, it seems that the expected number of records meeting the criteria would impact performance, the table in question has many more F than T:

movisresolved 
------------- -----------
T             70054
F             2734444

What's happening here?

EDIT:

The index that's fast for T:

CREATE NONCLUSTERED INDEX [ix_comms_composite3] ON [dbo].[tb_MOVMovement]
(
    [MOVTypeId] ASC,
    [MOVIsResolved] ASC,
    [MOVRequiresResolution] ASC,
    [MOVSupplierId] ASC,
    [MOVArrangementId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
       SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
       ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
      ) ON [PRIMARY]

After a bit of prompting from the comments, this only occurs when run against the view, not on the table directly ie:

select top 50 * from tb_MOVMovement 
where MOVTypeId = 20 and MOVRequiresResolution = 'T' and MOVIsResolved = 'F'

select top 50 * from tb_MOVMovement 
where MOVTypeId = 20 and MOVRequiresResolution = 'T' and MOVIsResolved = 'T'

Are both equally quick, view definition here: http://pastebin.com/raw.php?i=FER0bqwh

Table script: http://pastebin.com/nSpSah7b
Poor Index: http://pastebin.com/V7qEpqsY
Execution Plans: https://drive.google.com/file/d/0B5O3tLIEkg56bWhXcU5KcldPa2s/edit?pli=1

Best Answer

As already stated in the responses below your request it is most certainly due to the unbalanced nature of F and T causing SQL Server to prefer scan over seek. Though this is quite some simplification as you are querying a view containing quite a lot of tables and WHERE as well as JOIN conditions.

In case you query "[...] and MOVIsResolved = 'F'" the SQL Optimizer most likely prefers a SCAN over a SEEK, as the given result is that large. with "[...] ='T'" it prefers a SEEK as the result set is small enough. And finally with "[...] <>'F'" the optimizer has no other choice but to SCAN and check each entry, whether the column contains not 'F'. In the end for you it is quite clear "<> F" equals "= T" as you know there can only be these 2. SQL Server does not know this. Therefore the resulting execution plans for "<> 'F'" and "= 'F'" are the same.

As an alternative you might wanna look into OPTION( OPTIMIZE FOR UNKNOWN ), thought I do not know, whether this actually works with hard coded queries as yours.

Best Answer

Related Solutions

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

SQL Server – Should INSERT Cause Exclusive Lock on Foreign Key?

Related Question