SQL Server – Efficient Sorting with Redundant Fields in ORDER BY

Tags: execution-plan, optimization, order-by, sorting, sql-server

Suppose I have a table SortTest with fields Key1, Key2, Sort1, Sort2, Data1, and Data2, and I need to perform the following query:

SELECT TOP 1
    Data1,
    Data2
FROM SortTest
WHERE Key1 = @key1 AND
      Key2 = @key2
ORDER BY Sort1, Sort2

In order to optimise it, I created an index on the following column sequence:

Key1, Key2, Sort1, Sort2

But the execution plan still shows an index scan instead of a seek, because it cannot sort efficiently on a column sequence that does not start the index. To optimise the query, therefore, I had to add the keys to the ORDER BY clause, where they are, of course, redundant:

SELECT TOP 1
    Data1,
    Data2
FROM SortTest
WHERE Key1 = @key1 AND
      Key2 = @key2
ORDER BY Key1, Key2, Sort1, Sort2

The query now works as expected, but I should like to know whether it can be optimised in a more elegant way.

Correction:

I realized that I made a grave mistake when I simplified the queries above. One of the conditions used IN, not =, so the real queries (the ones that produce the different plans) have this instead:

WHERE Key1 = @key1 AND
      Key2 IN (@key2a, @key2b, ...)

which explains the plan difference: the Key2 values in the result are not fixed, so the two ORDER BY clauses can produce different output.
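A minimal illustration of that difference, using made-up rows in a table variable (the column types and values are assumptions, not the real data):

```sql
-- Hypothetical sample data; column types are assumptions.
DECLARE @SortTest TABLE (
    Key1 int, Key2 int, Sort1 int, Sort2 int,
    Data1 varchar(10), Data2 varchar(10)
);

INSERT INTO @SortTest (Key1, Key2, Sort1, Sort2, Data1, Data2)
VALUES (1, 10, 5, 0, 'A1', 'A2'),
       (1, 20, 1, 0, 'B1', 'B2');

-- Sorts on Sort1 first: picks the Key2 = 20 row (Sort1 = 1).
SELECT TOP (1) Data1, Data2
FROM @SortTest
WHERE Key1 = 1 AND Key2 IN (10, 20)
ORDER BY Sort1, Sort2;

-- Sorts on Key2 before Sort1: picks the Key2 = 10 row instead.
SELECT TOP (1) Data1, Data2
FROM @SortTest
WHERE Key1 = 1 AND Key2 IN (10, 20)
ORDER BY Key1, Key2, Sort1, Sort2;
```

With a single Key2 value both queries would return the same row, which is why the keys looked redundant in the simplified version.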

Let me thank everybody for their help, and apologies for the confusion.

Best Answer

This is actually pretty straightforward. When executing a query, SQL Server first identifies the rows to be returned. If there are WHERE clause elements, those are evaluated before the system even considers using an index for the ORDER BY.

This makes perfect sense if you consider the possibilities.

No index on anything in the WHERE clause -- must perform table or clustered index scan
Index on one element in the WHERE clause -- Scan all rows as filtered by the one element for other matches
Index on all elements in the WHERE clause -- Select all rows based on the index. Must look up data in record.
Index on all elements in the WHERE clause, and INCLUDE of the data elements -- Select all rows based on the index. Use the data elements in the INCLUDE clause.
Index on all elements in the WHERE clause plus all data elements -- Select all rows based on the index, and use the data elements embedded in the index.
Index on the data elements only -- If there is a clustered index, looking up data on that index will be most efficient. It is unknown whether the query optimizer would scan the data element index to try to make the data retrieval more efficient, but it is doubtful.
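As a sketch of the INCLUDE variant from the list above, assuming the SortTest table and columns from the question (the index name is made up for illustration):

```sql
-- Assumption: table and column names are those from the question;
-- the index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_SortTest_Keys_Sorts
ON SortTest (Key1, Key2, Sort1, Sort2)
INCLUDE (Data1, Data2);  -- stored at the leaf level only, so no lookup is needed for the SELECT list
```

The included columns are not part of the index key, so they do not affect the sort order of the index.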

Basically, your best index would be:

CREATE INDEX MyIndex ON MyTable (Key1, Key2, Sort1, Sort2)

The rows would be located using the key columns (which is what an index is designed for), and the additional columns, which are already sorted within the index, would then supply the output order.
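For the IN form given in the question's correction, one possible rewrite is to take the first row per Key2 value and then pick the overall winner. This is only a sketch: the variable types and values are assumptions, it is written for two Key2 values, and the resulting plan should be checked rather than assumed to use a seek per value.

```sql
-- Hypothetical values and types for the parameters.
DECLARE @key1 int = 1, @key2a int = 10, @key2b int = 20;

SELECT TOP (1) c.Data1, c.Data2
FROM (VALUES (@key2a), (@key2b)) AS k (Key2)   -- one row per requested Key2 value
CROSS APPLY (
    -- Equality on both keys, so the (Key1, Key2, Sort1, Sort2) index
    -- already delivers these rows in Sort1, Sort2 order.
    SELECT TOP (1) s.Data1, s.Data2, s.Sort1, s.Sort2
    FROM SortTest AS s
    WHERE s.Key1 = @key1
      AND s.Key2 = k.Key2
    ORDER BY s.Sort1, s.Sort2
) AS c
ORDER BY c.Sort1, c.Sort2;                     -- final sort over at most one row per Key2 value
```

This keeps ORDER BY Sort1, Sort2 semantics, so it returns the same row as the original IN query (up to ties on Sort1, Sort2).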


Related Solutions

SQL Server Sorting – Sort Order in Primary Key Yet Sorting Executed on SELECT

For a non partitioned table I get the following plan

Plan 1

There is a single seek predicate on Seek Keys[1]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010.

This means that SQL Server can perform an equality seek on the first two columns and then begin a range seek starting at 1339225010, reading the index FORWARD (as the index is defined with [Date] DESC).

The TOP operator will stop requesting more rows from the seek after the first row is emitted.

When I create the partition scheme and function

CREATE PARTITION FUNCTION PF (int)
AS RANGE LEFT FOR VALUES (1000, 1339225009 ,1339225010 , 1339225011);
GO
CREATE PARTITION SCHEME [MyPartitioningScheme]
AS PARTITION PF
ALL TO ([PRIMARY] );

And populate the table with the following data

INSERT INTO [dbo].[SensorValues]    
/*500 rows matching date and SensorId, DeviceId predicate*/
SELECT TOP (500) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL
/*700 rows matching date but not SensorId, DeviceId predicate*/
SELECT TOP (700) 3819,52,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL 
/*1100 rows matching SensorId, DeviceId predicate but not date */
SELECT TOP (1100) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0)) + 1339225011      
FROM master..spt_values
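To see where those rows land, a query along the following lines can be used. It assumes the table is partitioned on the [Date] column using the PF function defined above.

```sql
-- Assumption: dbo.SensorValues is partitioned on [Date] by partition function PF.
SELECT $PARTITION.PF([Date]) AS PartitionNumber,
       COUNT(*)              AS RowsInPartition,
       MIN([Date])           AS MinDate,
       MAX([Date])           AS MaxDate
FROM dbo.SensorValues
GROUP BY $PARTITION.PF([Date])
ORDER BY PartitionNumber;
```

With RANGE LEFT boundaries, the low dates from the first two INSERT branches should fall into partition 1 and the dates above 1339225011 into the last partition.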

The plan on SQL Server 2008 looks as follows.

Plan 2

The actual number of rows emitted from the seek is 500. The plan shows the following seek predicates:

Seek Keys[1]: Start: PtnId1000 <= 2, End: PtnId1000 >= 1, 
Seek Keys[2]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010

This indicates that it is using the skip scan approach, which is described as follows:

the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation.

This plan is a serial plan. For the specific query you have, it seems that if SQL Server ensured that it processed the partitions in descending order of date, the original plan with the TOP would still work: it could stop processing after the first matching row was found, rather than continuing on and outputting the remaining 499 matches.

In fact the plan on 2005 looks like it does take that approach

Plan on 2005

I'm not sure whether it is straightforward to get the same plan on 2008, or whether it would need an OUTER APPLY on sys.partition_range_values to simulate it.
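One way such a simulation might be written is sketched below. Note that it swaps in sys.partitions and $PARTITION instead of sys.partition_range_values, assumes the table is partitioned on [Date] by PF, and assumes the partitioned index is the clustered index. Whether the optimizer then visits partitions in descending order and stops early would need to be confirmed in the actual plan.

```sql
-- Sketch only: probe one partition at a time, highest partition number first.
SELECT TOP (1) ca.[Date]
FROM sys.partitions AS p
CROSS APPLY (
    SELECT TOP (1) sv.[Date]
    FROM dbo.SensorValues AS sv
    WHERE $PARTITION.PF(sv.[Date]) = p.partition_number  -- restrict to this partition
      AND sv.DeviceId = 3819
      AND sv.SensorId = 53
      AND sv.[Date] < 1339225010
    ORDER BY sv.[Date] DESC
) AS ca
WHERE p.object_id = OBJECT_ID(N'dbo.SensorValues')
  AND p.index_id = 1          -- assumption: clustered index
ORDER BY p.partition_number DESC;
```

Because partitions hold ascending ranges of [Date], the first partition (in descending order) that yields a row also holds the latest matching date.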

SQL Server – Optimizer Not Choosing Index Union Plan

The optimizer does not always consider index-union plans (like the one shown in your second graphic) to resolve disjunctions (OR predicates) unless a FORCESEEK or INDEX hint is specified. This is a heuristic* based on some practical considerations:

Index union is not often enough a good plan selection for general queries.
The number of ways indexes can be combined grows exponentially.

Using a hint changes the way the optimizer searches the space of possible plans. It disables some of the general heuristics and pursues a more goal-orientated strategy.

The optimizer's usual primary goal is to find a good plan quickly. It does not exhaustively search for the 'best' plan (even relatively simple queries could take years to compile if it did).

Joins with multiple conditions separated with OR have long been problematic. Over the years, the optimizer has added new tricks like converting them to equivalent UNION forms, but the transformations available are limited, so it is quite easy to come unstuck.
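A manual version of that UNION rewrite might look like the sketch below. The original query and schema are not shown here, so every column name is hypothetical: it assumes Dispatch has a key (DispatchID, ContractID) and DispatchLink has a key LinkID plus two repeated column pairs.

```sql
-- Hypothetical columns throughout; only the table names come from the answer.
-- Equivalent to a single join with
--   ON (pair 1 matches) OR (pair 2 matches)
-- provided the selected columns identify each joined row uniquely.
SELECT dl.LinkID, d.DispatchID, d.ContractID
FROM dbo.DispatchLink AS dl
JOIN dbo.Dispatch AS d
  ON d.DispatchID = dl.DispatchID1
 AND d.ContractID = dl.ContractID1

UNION   -- not UNION ALL: removes rows that satisfy both branches

SELECT dl.LinkID, d.DispatchID, d.ContractID
FROM dbo.DispatchLink AS dl
JOIN dbo.Dispatch AS d
  ON d.DispatchID = dl.DispatchID2
 AND d.ContractID = dl.ContractID2;
```

Each branch is a plain equality join, which gives the optimizer an ordinary seek opportunity per branch.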

As far as the query plan is concerned:

1. The first row from DispatchLink causes a full scan of the Dispatch table
2. The result of the scan is stored in an internal tempdb worktable (the Table Spool)
3. The join checks every row from the worktable against the full OR predicate
4. The next row is fetched from DispatchLink and the process repeats from step 3

If there are 25,000 rows in the DispatchLink table, the spool will be fully scanned 25,000 times. This is a disaster of course (and without index intersection, the best the optimizer can do is run the whole thing on multiple threads).

Percentage costs in query plans are only the optimizer's estimates. They never reflect actual execution costs; they are subject to the optimizer's model and will usually bear little resemblance to the 'true' cost of executing the plan on your specific hardware.

Costing numbers are there to be informative, but they should not be taken literally. The particular model the optimizer uses happens to produce pretty good plans for most queries on most systems across the world - that does not mean the model approximates anyone's reality, just that it happens to work reasonably well in practice.

Changing the design so that (Dispatch, Contract) pairs are stored in rows rather than repeated across columns will make the whole index-intersection problem go away. Relational designs with useful constraints and indexes almost always get the best out of the optimizer.
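A sketch of what such a design could look like follows. The table, column, constraint, and index names are all hypothetical, as is the assumption that Dispatch is keyed on (DispatchID, ContractID).

```sql
-- Hypothetical normalized layout: one row per (link, dispatch, contract) pair.
CREATE TABLE dbo.DispatchLinkPair (
    LinkID     int NOT NULL,
    DispatchID int NOT NULL,
    ContractID int NOT NULL,
    CONSTRAINT PK_DispatchLinkPair
        PRIMARY KEY (LinkID, DispatchID, ContractID)
);

-- Supports seeking from the Dispatch side of the join.
CREATE INDEX IX_DispatchLinkPair_Dispatch_Contract
ON dbo.DispatchLinkPair (DispatchID, ContractID);

-- The OR disappears: the join becomes a single equality predicate.
SELECT p.LinkID, d.DispatchID, d.ContractID
FROM dbo.DispatchLinkPair AS p
JOIN dbo.Dispatch AS d
  ON d.DispatchID = p.DispatchID
 AND d.ContractID = p.ContractID;
```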

* This can be overridden with undocumented trace flag 8726.
