SQL Server – Storage order vs result order

clustered-index, execution-plan, sorting, sql server

This is a spin-off question from Sort order specified in primary key, yet sorting is executed on SELECT.

@Catcall says this on the subject of storage order (clustered index) and output order:

A lot of people believe that a clustered index guarantees a sort order on output. But that's not what it does; it guarantees a storage order on disk.
See, for example, this blog post.

I've read the blog post by Hugo Kornelis and understand that an index doesn't guarantee that SQL Server reads the records in a specific order. Yet I have a hard time accepting that I can't assume this for my scenario.

CREATE TABLE [dbo].[SensorValues](
  [DeviceId] [int] NOT NULL,
  [SensorId] [int] NOT NULL,
  [SensorValue] [int] NOT NULL,
  [Date] [int] NOT NULL,
CONSTRAINT [PK_SensorValues] PRIMARY KEY CLUSTERED 
(
  [DeviceId] ASC,
  [SensorId] ASC,
  [Date] DESC
) WITH (
    FILLFACTOR=75,
    DATA_COMPRESSION = PAGE,
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    SORT_IN_TEMPDB = OFF,
    IGNORE_DUP_KEY = OFF,
    ONLINE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON)
  ON [MyPartitioningScheme]([Date])
)

My original query was this:

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010
  ORDER BY Date DESC

But I suggest that I could as well use this one (read below for my explanation):

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010

As you can see, my table rows are small (16 bytes) and I've got only one index, a clustered one. In my scenario, the table consists of 100,000,000 records at the moment (and this will most likely increase tenfold).

When the database server queries this table it has two ways of finding my rows: either it seeks on the primary key, thereby reading and returning my values in descending order of Date, or it has to do a full table scan. My conclusion is that a full table scan on all those records will be way too slow, so the database server will always seek the table via its primary key and thereby return the values sorted by Date DESC.

Best Answer

Let me try to explain why you should not do that, and why you should never assume that an SQL product will return a result set in a specific order unless you specify one, whatever indexes the DBMS is using: clustered or non-clustered, B-trees, R-trees, k-d trees, fractal trees or any other exotic kind.

Your original query tells the DBMS to search the SensorValues table, find rows that match the 3 conditions, order those rows by Date descending, keep only the first of those rows and - finally - select and return only the SensorValue column.

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010
  ORDER BY Date DESC ;

These are very specific orders you have given to the DBMS, and the result will most probably be the same every time you run the query. (There is a chance it might not be, if more than one row matches the conditions and has the same max Date but a different SensorValue, but let's assume for the rest of the conversation that no such rows exist in your table.)

Does the DBMS have to do this, to run this query, the exact way I describe it above? No, of course not and you know that. It may not read the table but read from an index. Or it may use two indexes if it thinks it's better (faster). Or three. Or it may use a cached result (not SQL Server but other DBMS cache query results). Or it may use parallel execution one time and not the next time it runs. Or ... (add any other feature that affects execution and execution plans).

What is guaranteed though is that it will return the exact same result, every time you run it - as long as no rows are inserted, deleted or updated.

Now let's see what your suggestion says:

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010 ;

This query tells the DBMS to search the SensorValues table, find rows that match the 3 conditions, ~~order those rows by Date descending~~ not care about the order, keep only one row and - finally - select and return only the SensorValue column.

So it basically says the same as the first one, except that you want only one result that matches the conditions and you don't care which one.

Now, can we assume that it will give always the same result because of the clustered index?
- If it does use this clustered index every time, yes.

But will it use it?
- Not necessarily.

Why not?
- Because it can. The query optimizer is free to choose a path of execution every time it runs a statement: whatever path it sees fit at that time for that statement.

But isn't using the clustered index the best/fastest way to get results?
- No, not always. It might be the first time you run the query. The second time, it may use a cached result (if the DBMS has such a feature; SQL Server does not*). The 1000th time, the result may have been removed from the cache and another result may exist there. Say you had executed this query just before:

SELECT TOP 1 SensorValue
  FROM SensorValues
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND Date < 1339225010
  ORDER BY Date ASC ;         --- Notice the `ASC` here

and the cached result (from the above query) is another, different one that still matches your conditions but is not the first in your (wanted) ordering. And you have told the DBMS not to care about the order.

OK, so only cache can affect this?
- No, many other things can, too. For example:
  - the DBMS considered other indexes to be better for this query at that time.
  - a developer changed or completely removed the clustered index you had.
  - you or some other developer added another index that the optimizer decided was more efficient to use than the CI.
  - you upgraded to a new version and the new optimizer has a minor bug or a change in how it ranks and chooses execution plans.
  - statistics were updated.
  - parallel execution was chosen instead.

* SQL Server does not cache query results, but the Enterprise Edition does have an Advanced Scanning feature, which is similar in that you may get different results because of concurrent queries. I'm not sure exactly when this kicks in, though. (Thanks to @Martin Smith for the tip.)

I hope you are convinced that you should never rely on an SQL query returning results in a specific order unless you specify one. And never use TOP (n) without ORDER BY, unless of course you just want n rows in the result and you don't care which ones are returned.
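
To make the "another index was added" case concrete, here is a minimal sketch. It assumes a throwaway temp table (#SensorValuesDemo) and an index name (IX_Demo_DateAsc) that are made up for illustration and are not part of the original question. It only shows which results are legal, not which plan the optimizer will actually pick.

```sql
-- Hypothetical demo table: same columns and clustered key order as the
-- question's table, but unpartitioned and with only three rows.
CREATE TABLE #SensorValuesDemo (
    DeviceId    int NOT NULL,
    SensorId    int NOT NULL,
    SensorValue int NOT NULL,
    [Date]      int NOT NULL,
    PRIMARY KEY CLUSTERED (DeviceId ASC, SensorId ASC, [Date] DESC)
);

INSERT INTO #SensorValuesDemo (DeviceId, SensorId, SensorValue, [Date])
VALUES (3819, 53, 10, 100),
       (3819, 53, 20, 200),
       (3819, 53, 30, 300);

-- Assumption: someone later adds a second index, ordered by Date ASC.
CREATE NONCLUSTERED INDEX IX_Demo_DateAsc
    ON #SensorValuesDemo (DeviceId, SensorId, [Date] ASC)
    INCLUDE (SensorValue);

-- No ORDER BY: 10, 20 and 30 are all correct answers, and which one comes
-- back depends on the access path the optimizer happens to choose.
SELECT TOP (1) SensorValue
  FROM #SensorValuesDemo
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND [Date] < 1339225010;

-- With ORDER BY: the row with the highest Date (SensorValue 30) is the only
-- correct answer, whichever index is used.
SELECT TOP (1) SensorValue
  FROM #SensorValuesDemo
  WHERE SensorId = 53
    AND DeviceId = 3819
    AND [Date] < 1339225010
  ORDER BY [Date] DESC;

DROP TABLE #SensorValuesDemo;
```

Both indexes can satisfy the first query with a seek, so neither choice would be a bug. Only the second query states what "first" means.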

Related Solutions

SQL Server Sorting – Sort Order in Primary Key Yet Sorting Executed on SELECT

For a non-partitioned table I get the following plan:

Plan 1

There is a single seek predicate on Seek Keys[1]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010.

This means that SQL Server can perform an equality seek on the first two columns and then begin a range seek starting at 1339225010 and ordered FORWARD (because the index is defined with [Date] DESC, a forward read returns the dates in descending order).

The TOP operator will stop requesting more rows from the seek after the first row is emitted.

When I create the partition scheme and function:

CREATE PARTITION FUNCTION PF (int)
AS RANGE LEFT FOR VALUES (1000, 1339225009 ,1339225010 , 1339225011);
GO
CREATE PARTITION SCHEME [MyPartitioningScheme]
AS PARTITION PF
ALL TO ([PRIMARY] );

And populate the table with the following data:

INSERT INTO [dbo].[SensorValues]    
/*500 rows matching date and SensorId, DeviceId predicate*/
SELECT TOP (500) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL
/*700 rows matching date but not SensorId, DeviceId predicate*/
SELECT TOP (700) 3819,52,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0))           
FROM master..spt_values
UNION ALL 
/*1100 rows matching SensorId, DeviceId predicate but not date */
SELECT TOP (1100) 3819,53,1, ROW_NUMBER() OVER (ORDER BY (SELECT 0)) + 1339225011      
FROM master..spt_values

The plan on SQL Server 2008 looks as follows.

Plan 2

The actual number of rows emitted from the seek is 500. The plan shows these seek predicates:

Seek Keys[1]: Start: PtnId1000 <= 2, End: PtnId1000 >= 1, 
Seek Keys[2]: Prefix: DeviceId, SensorId = (3819, 53), Start: Date < 1339225010

This indicates that it is using the skip scan approach described here:

the query optimizer is extended so that a seek or scan operation with one condition can be done on PartitionID (as the logical leading column) and possibly other index key columns, and then a second-level seek, with a different condition, can be done on one or more additional columns, for each distinct value that meets the qualification for the first-level seek operation.
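
The partition range in the first seek key can be cross-checked with the built-in $PARTITION function. The following is only a sketch: it assumes the PF partition function and the test data above exist in the current database, and the column aliases are made up for illustration.

```sql
-- Map a few boundary values to partition numbers for the RANGE LEFT function PF.
-- With RANGE LEFT, each boundary value belongs to the partition on its left.
SELECT $PARTITION.PF(1000)       AS PartitionOf1000,
       $PARTITION.PF(1339225009) AS PartitionOf1339225009,
       $PARTITION.PF(1339225010) AS PartitionOf1339225010;

-- Count the rows that match the query's predicate, per partition.
SELECT $PARTITION.PF([Date]) AS PartitionNumber,
       COUNT(*)              AS MatchingRows,
       MIN([Date])           AS MinDate,
       MAX([Date])           AS MaxDate
FROM dbo.SensorValues
WHERE SensorId = 53
  AND DeviceId = 3819
  AND [Date] < 1339225010
GROUP BY $PARTITION.PF([Date])
ORDER BY PartitionNumber;
```

Values below 1339225010 can only fall in partitions 1 and 2, which is consistent with the PtnId1000 range shown in Seek Keys[1].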

This is a serial plan. So, for your specific query, it seems that if SQL Server ensured it processed the partitions in descending order of date, the original plan with the TOP would still work: it could stop processing after the first matching row was found, rather than continuing and outputting the remaining 499 matches.

In fact, the plan on 2005 looks like it does take that approach:

Plan on 2005

I'm not sure whether it is straightforward to get the same plan on 2008, or whether it would need an OUTER APPLY on sys.partition_range_values to simulate it.

SQL Server Hierarchical Order – Parent-Child Tree Hierarchical ORDER BY

OK, enough brain cells are dead.

SQL Fiddle

WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];
