Sql-server – Execution plan contains a ‘sort’ even though the data is sorted in the index

azure-sql-database, sql-server

I have a table which has a clustered index as below:

CREATE CLUSTERED INDEX [IX_MachineryId_DateRecorded]
ON MachineryReading (MachineryId, DateRecorded)

I'm selecting based on the fields in the clustered index, plus one more:

SELECT DateRecorded, Latitude, Longitude
FROM MachineryReading
WHERE MachineryId = 2127        -- First key in the index
AND DateRecorded > '2017-01-10' -- Second key in the index
AND DateRecorded < '2017-10-16' -- Second key in the index
AND FixStatus >= 2              -- Not a key, resulting in a scan
ORDER BY DateRecorded

I would have expected this to result in a simple clustered index scan. However, looking at the live query statistics, and the actual execution plan, the majority of the query's execution time comes from sorting the results after the index scan. Why is the ordered data being sorted again?

https://www.brentozar.com/pastetheplan/?id=S10DvjZpb

Best Answer

Your query accesses 10 partitions, and you are searching a 10-month range, so my guess is that the table is partitioned by month on DateRecorded.
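One way to check that guess is to ask the catalog views how the table is partitioned. This is only a minimal sketch: it assumes the table lives in the dbo schema, which the question does not state.

```sql
-- Assumption: the table is dbo.MachineryReading; adjust the schema if needed.
SELECT i.name    AS index_name,
       ps.name   AS partition_scheme,
       pf.name   AS partition_function,
       c.name    AS partitioning_column,
       pf.fanout AS partition_count
FROM   sys.indexes AS i
       JOIN sys.partition_schemes AS ps
         ON ps.data_space_id = i.data_space_id
       JOIN sys.partition_functions AS pf
         ON pf.function_id = ps.function_id
       JOIN sys.index_columns AS ic
         ON ic.object_id = i.object_id
            AND ic.index_id = i.index_id
            AND ic.partition_ordinal = 1   -- the partitioning column
       JOIN sys.columns AS c
         ON c.object_id = ic.object_id
            AND c.column_id = ic.column_id
WHERE  i.object_id = OBJECT_ID(N'dbo.MachineryReading');
```

An index that is not partitioned has no matching row in sys.partition_schemes, so it simply drops out of the result.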

I can reproduce your plan, including the sort, with the setup below.

CREATE PARTITION FUNCTION pf1 (DATE) AS RANGE RIGHT FOR VALUES ( 
'2017-01-01', 
'2017-02-01', 
'2017-03-01', 
'2017-04-01', 
'2017-05-01', 
'2017-06-01', 
'2017-07-01', 
'2017-08-01', 
'2017-09-01', 
'2017-10-01', 
'2017-11-01' );

CREATE PARTITION SCHEME ps1 AS PARTITION pf1 ALL TO ([Primary]);

CREATE TABLE MachineryReading
  (
     MachineryId  INT,
     DateRecorded DATE,
     Latitude     FLOAT,
     Longitude    FLOAT,
     FixStatus    INT
  )
ON ps1(DateRecorded)

CREATE CLUSTERED INDEX [IX_MachineryId_DateRecorded]
  ON MachineryReading (MachineryId, DateRecorded)
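With this partition function, the two ends of the date range in the question fall into partitions 2 and 11, which would account for the 10 partitions seen in the plan. A quick check, assuming pf1 has been created exactly as above:

```sql
-- RANGE RIGHT: each boundary value belongs to the partition on its right,
-- so partition 1 holds everything before 2017-01-01.
SELECT $PARTITION.pf1(CAST('2017-01-10' AS DATE)) AS first_partition, -- 2
       $PARTITION.pf1(CAST('2017-10-16' AS DATE)) AS last_partition;  -- 11
```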

Technically, though, a sort could be avoided if the plan processed the partitions in order and simply concatenated one ordered result to the next.

If you are happy to assume that partition numbers are assigned in order of boundary value (I don't know whether this is actually guaranteed, but it seems to hold even after partition splits), then adding the partition number as a leading column in the ORDER BY achieves this:

SELECT DateRecorded,
       Latitude,
       Longitude
FROM   MachineryReading
WHERE  MachineryId = 2127 
       AND DateRecorded > '2017-01-10' 
       AND DateRecorded < '2017-10-16' 
       AND FixStatus >= 2 
ORDER  BY $PARTITION.pf1(DateRecorded),
          MachineryId, --Not really needed as this is a constant 2127
          DateRecorded
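The ordering assumption can be spot-checked against the data itself. The sketch below assumes the partition function is named pf1, as in the repro above; the function name on the asker's system is not known.

```sql
-- One row per partition touched by this MachineryId.
-- If every first_date is later than the previous row's last_date,
-- partition-number order matches DateRecorded order for this data.
SELECT $PARTITION.pf1(DateRecorded) AS partition_number,
       MIN(DateRecorded)            AS first_date,
       MAX(DateRecorded)            AS last_date
FROM   MachineryReading
WHERE  MachineryId = 2127
GROUP  BY $PARTITION.pf1(DateRecorded)
ORDER  BY partition_number;
```

This only confirms the behaviour for the current data and boundaries; it is not a guarantee about future partition splits or merges.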

Related Solutions

Sql-server – Why is the index not being used in a SELECT TOP

"If I let the server decide which index to use, it picks IX_MachineryId, and it takes up to a minute."

That index is not partitioned, so the optimizer recognizes that it can provide the ordering specified in the query without sorting. As a non-unique nonclustered index, it also carries the keys of the clustered index as subkeys, so it can be used to seek on MachineryId and the DateRecorded range.

The index does not include OperationalSeconds, so the plan has to look that value up per row in the (partitioned) clustered index in order to test OperationalSeconds > 0.

The optimizer estimates that one row will need to be read from the nonclustered index and looked up to satisfy the TOP (1). This calculation is based on the row goal (find one row quickly), and assumes a uniform distribution of values.

From the actual plan, we can see the estimate of 1 row is inaccurate. In fact, 19,039 rows have to be processed to discover that no rows satisfy the query conditions. This is the worst case for a row goal optimization (1 row estimated, all rows actually needed).

You can disable row goals with trace flag 4138. This would most likely result in SQL Server choosing a different plan, possibly the one you forced. In any case, the index IX_MachineryId could be improved by including OperationalSeconds.
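As a rough illustration, the trace flag can be applied to a single statement with a query hint. The query below is a hypothetical reconstruction: the original query is not shown here, so the table name, parameter names, and values are assumptions inferred from the discussion.

```sql
-- Hypothetical reconstruction of the query; the original is not shown here.
DECLARE @MachineryId INT      = 2127,        -- illustrative value
        @From        DATETIME = '20170101',  -- illustrative value
        @To          DATETIME = '20171001';  -- illustrative value

SELECT TOP (1) DateRecorded
FROM   dbo.MachineryReading
WHERE  MachineryId = @MachineryId
       AND DateRecorded >= @From
       AND DateRecorded < @To
       AND OperationalSeconds > 0
ORDER  BY DateRecorded
OPTION (QUERYTRACEON 4138);   -- disable row goals for this statement only
```

QUERYTRACEON normally requires sysadmin permissions. On SQL Server 2016 SP1 and later, OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL')) is the documented hint with the same effect.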

It is quite unusual to have non-aligned nonclustered indexes (indexes partitioned in a different way from the base table, including not at all).

"That really suggests to me that I have made the index right, and the server is just making a bad decision. Why?"

As usual, the optimizer is selecting the cheapest plan it considers.

The estimated cost of the IX_MachineryId plan is 0.01 cost units, based on the (incorrect) row goal assumption that one row will be tested and returned.

The estimated cost of the IX_MachineryId_DateRecorded plan is much higher, at 0.27 units, mostly because it expects to read 5,515 rows from the index, sort them, and return the one that sorts lowest (by DateRecorded).

This index is partitioned, and cannot return rows in DateRecorded order directly (see later). It can seek on MachineryId and the DateRecorded range within each partition, but a Sort is required.

If this index were not partitioned, a sort would not be required, and it would be very similar to the other (unpartitioned) index with the extra included column. An unpartitioned filtered index would be slightly more efficient still.
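The two index options mentioned so far might look like the sketch below. The table name, the key of IX_MachineryId, and the filtered index name are assumptions made for illustration; none of the real definitions appear here.

```sql
-- Assumptions: the table is dbo.MachineryReading and IX_MachineryId is keyed
-- on MachineryId alone. ON [PRIMARY] keeps each index unpartitioned.

-- Option 1: rebuild the existing index with the extra included column.
CREATE NONCLUSTERED INDEX IX_MachineryId
  ON dbo.MachineryReading (MachineryId)
  INCLUDE (OperationalSeconds)
  WITH (DROP_EXISTING = ON)
  ON [PRIMARY];

-- Option 2: a filtered index holding only the rows the query can return.
-- The index name is hypothetical.
CREATE NONCLUSTERED INDEX IX_MachineryId_DateRecorded_Filtered
  ON dbo.MachineryReading (MachineryId, DateRecorded)
  WHERE OperationalSeconds > 0
  ON [PRIMARY];
```

Either option removes the per-row lookup into the clustered index for the OperationalSeconds test. As always, test the candidates against the real workload before keeping one.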

You should update the source query so that the data types of the @From and @To parameters match the DateRecorded column (datetime). At the moment, because of the type mismatch, SQL Server computes a dynamic range at runtime (using the Merge Interval operator and its subtree):

<ScalarOperator ScalarString="GetRangeWithMismatchedTypes([@From],NULL,(22))">
<ScalarOperator ScalarString="GetRangeWithMismatchedTypes([@To],NULL,(22))">

This conversion prevents the optimizer from reasoning correctly about the relationship between ascending partition IDs (covering a range of DateRecorded values in ascending order) and the inequality predicates on DateRecorded.

The partition ID is an implicit leading key for a partitioned index. Normally, the optimizer can see that ordering by partition ID (where ascending IDs map to ascending, disjoint values of DateRecorded) and then by DateRecorded is the same as ordering by DateRecorded alone (given that MachineryId is constant). This chain of reasoning is broken by the type conversion.

Demo

A simple partitioned table and index:

CREATE PARTITION FUNCTION PF (datetime)
AS RANGE LEFT FOR VALUES ('20160101', '20160201', '20160301');

CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);

CREATE TABLE dbo.T (c1 integer NOT NULL, c2 datetime NOT NULL) ON PS (c2);

CREATE INDEX i ON dbo.T (c1, c2) ON PS (c2);

INSERT dbo.T (c1, c2) 
VALUES (1, '20160101'), (1, '20160201'), (1, '20160301');

Query with matched types

-- Types match (datetime)
DECLARE 
    @From datetime = '20010101',
    @To datetime = '20090101';

-- Seek with no sort
SELECT T2.c2 
FROM dbo.T AS T2 
WHERE T2.c1 = 1 
AND T2.c2 >= @From
AND T2.c2 < @To
ORDER BY 
    T2.c2;

Query with mismatched types

-- Mismatched types (datetime2 vs datetime)
DECLARE 
    @From datetime2 = '20010101',
    @To datetime2 = '20090101';

-- Merge Interval and Sort
SELECT T2.c2 
FROM dbo.T AS T2 
WHERE T2.c1 = 1 
AND T2.c2 >= @From
AND T2.c2 < @To
ORDER BY 
    T2.c2;
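If the parameter types cannot be changed at the source, one possible workaround is to convert them once into variables of the column's type before running the query. A sketch against the demo table above; @FromDt and @ToDt are names introduced here for illustration.

```sql
DECLARE
    @From datetime2 = '20010101',
    @To datetime2 = '20090101';

-- Convert once, up front, to the column's type (datetime)
DECLARE
    @FromDt datetime = CONVERT(datetime, @From),
    @ToDt datetime = CONVERT(datetime, @To);

-- The predicates now compare datetime with datetime
SELECT T2.c2
FROM dbo.T AS T2
WHERE T2.c1 = 1
AND T2.c2 >= @FromDt
AND T2.c2 < @ToDt
ORDER BY
    T2.c2;
```

Since the comparison types now match, this should compile the same way as the matched-types query. Note that converting datetime2 to datetime rounds to datetime precision and fails for values before 1753-01-01.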

Sql-server – rely on the estimated execution plan to recommend indexes

There's a limitation: SQL Server will not report missing indexes when the query plan is trivial. Perhaps that's what you are encountering.

Right-click the SELECT operator and select Properties, then check whether the Optimization Level is TRIVIAL.
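The same property is stored in the plan XML as the StatementOptmLevel attribute, so cached plans can also be searched for it. A minimal sketch, assuming VIEW SERVER STATE permission; the TOP (20) limit is arbitrary.

```sql
WITH XMLNAMESPACES
  (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       st.text AS query_text,
       qp.query_plan
FROM   sys.dm_exec_cached_plans AS cp
       CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
       CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
       -- keep only plans containing a statement optimized at the TRIVIAL level
WHERE  qp.query_plan.exist('//StmtSimple[@StatementOptmLevel = "TRIVIAL"]') = 1;
```

Shredding plan XML across the whole cache can be expensive on a busy server, so this is better suited to ad hoc investigation than to regular monitoring.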

I use the estimated execution plan a great deal when performance troubleshooting. It's rare that I'll have the actual plan. Regarding missing indexes, I don't just add what the plan thinks is missing. I test the index, variations of it, etc.