SQL Server – Query Shows Sort Cost Even When Required Index is Available

optimizationperformancequery-performancesql server

This question is from SO,iflagged it move to DBA.SE..but i may not see that happening,due to upvotes..so posting here

below is test data :

--Main Table

 CREATE TABLE [dbo].[LogTable]
    (
      [LogID] [int] NOT NULL
                    IDENTITY(1, 1) ,
      [DateSent] [datetime] NULL,
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable] ADD CONSTRAINT [PK_LogTable] PRIMARY KEY CLUSTERED  ([LogID]) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent_LogID] ON [dbo].[LogTable] ([DateSent] DESC) INCLUDE ([LogID]) ON [PRIMARY]
GO


--Cross table

 CREATE TABLE [dbo].[LogTable_Cross]
    (
      [LogID] [int] NOT NULL ,
      [UserID] [int] NOT NULL
    )
 ON [PRIMARY]
GO
 ALTER TABLE [dbo].[LogTable_Cross] WITH NOCHECK ADD CONSTRAINT [FK_LogTable_Cross_LogTable] FOREIGN KEY ([LogID]) REFERENCES [dbo].[LogTable] ([LogID])
GO
 CREATE NONCLUSTERED INDEX [IX_LogTable_Cross_UserID_LogID]
 ON [dbo].[LogTable_Cross] ([UserID])
 INCLUDE ([LogID])
GO


-- Script to populate them
 INSERT INTO [LogTable]
        SELECT TOP 100000
                DATEADD(day, ( ABS(CHECKSUM(NEWID())) % 65530 ), 0)
        FROM    sys.sysobjects
                CROSS JOIN sys.all_columns


 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                1
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                2
        FROM    [LogTable]
        ORDER BY NEWID()

 INSERT INTO [LogTable_Cross]
        SELECT  [LogID] ,
                3
        FROM    [LogTable]
        ORDER BY NEWID()


GO

When we use below simple query ,

SELECT  DI.LogID              
FROM    LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC

query shows sort

enter image description here

As per my understanding,sort cost should be avoided,since we have the required index

 CREATE NONCLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC) ON [PRIMARY]

and also digging into plan shows the same index is used..

My questions are :
1.Why sort cost is still present
2.What exactly does ordered property is true means .

Few observations/work i have done :
1.Ordered property is set to false

so i have rewrote query like below

   SELECT  DI.LogID              
    FROM    LogTable DI              
            INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
            WHERE  DP.UserID = 1  and di.datesent is not null
    ORDER BY DateSent DESC

rewriting like above,makes ordered property to true,but still sort is present

closest,i can find is this article by Paul White:A Tale of Two Index Hints ,in this article,below Point clarifies why this sort occurs

For very large tables, the optimizer may calculate that an IAM-driven scan might save more time than would be consumed by the extra sort, and a plan featuring the unordered scan + sort would be chosen. This is a heuristic optimization: the optimizer knows nothing about the actual fragmentation level of an inde

But this query requires sort and it shouldn't satisfy the conditions for an IAM scan

Let me know ,if you need any more details

Best Answer

In query plans, ordered property set to true means that an IAM-driven scan was not done. The data was read in the logical order according to the index definition.

Hash join does not preserve order. Your first query has a hash join so the explicit SORT is needed at the end. One way to avoid the sort in the query is to do a nested loop join with LogTable as the outer table. On my machine I was able to accomplish this with a variety of hints:

SELECT  DI.LogID              
FROM LogTable DI              
        INNER JOIN LogTable_Cross DP ON DP.LogID = DI.LogID  
        WHERE  DP.UserID = 1  
ORDER BY DateSent DESC
OPTION (MAXDOP 1, LOOP JOIN, FORCE ORDER, NO_PERFORMANCE_SPOOL);

bad query plan

This is a very inefficient query plan because the right index is not defined on the LogTable_Cross table. A seek is done against that table, but only one the UserId column:

bad seek

Filtering on LogID is done in the join itself. I can create an index that better supports the query plan that I'm looking for:

CREATE INDEX IX_LogTable_Cross ON LogTable_Cross (LogID, UserID);

The creation of that index does not guarantee a query plan without a sort. SQL Server may estimate that a plan with a sort has a lower cost. However, if I eliminate all hints except for the LOOP JOIN then I get a reasonably efficient query plan without a sort:

okay query plan