SQL Server – Rank Function Creating Duplicate Partitions

Tags: rank, sql-server, sql-server-2012, window-functions

The query below returns two rows. Because I partition by ref_number and also filter on ref_number in the WHERE clause, I expected to get at most one row back.

WITH rank_cte AS (
    SELECT
        ref_number,
        RANK() OVER (
            PARTITION BY ref_number
            ORDER BY Logged_Date DESC
        ) AS Rank1
    FROM my_table
)
SELECT *
FROM rank_cte
WHERE Rank1 = 1
  AND ref_number = 'abcd';

What is likely to have caused this? SQL Server must think they are similar enough for the where clause to return both rows, but different enough to partition them.

Best Answer

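Found the problem: there were ties in the ordering. RANK() assigns the same rank to every row that ties on the ORDER BY columns, so when several rows for one ref_number share the latest Logged_Date, they all get rank 1 and all pass the filter.

The solution is to order by a key that is unique within each partition, for example by adding a tie-breaker column to the ORDER BY.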
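To make the tie behaviour concrete, here is a minimal sketch with made-up data. The sample rows and the id column are assumptions for illustration only (the original table definition is not shown); id stands in for whatever unique column is available as a tie-breaker. It compares RANK() ordered by Logged_Date alone with ROW_NUMBER() ordered by Logged_Date plus the tie-breaker.

```sql
-- Hypothetical sample data: two rows for 'abcd' share the latest Logged_Date.
WITH my_table AS (
    SELECT *
    FROM (VALUES
        ('abcd', CAST('2020-01-02' AS date), 1),
        ('abcd', CAST('2020-01-02' AS date), 2),
        ('abcd', CAST('2020-01-01' AS date), 3)
    ) AS v (ref_number, Logged_Date, id)  -- id is an assumed unique column
),
ranked AS (
    SELECT
        ref_number,
        Logged_Date,
        id,
        -- Ties on Logged_Date receive the same rank.
        RANK() OVER (
            PARTITION BY ref_number
            ORDER BY Logged_Date DESC
        ) AS Rank1,
        -- The extra key makes the ordering unique within the partition.
        ROW_NUMBER() OVER (
            PARTITION BY ref_number
            ORDER BY Logged_Date DESC, id DESC
        ) AS rn
    FROM my_table
)
SELECT ref_number, Logged_Date, id, Rank1, rn
FROM ranked
WHERE ref_number = 'abcd'
ORDER BY rn;
```

Both rows dated 2020-01-02 get Rank1 = 1, which is why filtering on Rank1 = 1 returns two rows, while rn is 1 for only one of them. Filtering on rn = 1 would return a single row per ref_number, provided the tie-breaker really is unique within the partition.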

Implied Index Keys

Your index is on id DESC, timeSampled DESC. In SQL Server 2008 and later, partitioning introduces an extra implied leading key on $partition ASC (it is always ascending, it is not configurable) making the full index key $partition ASC, id DESC, timeSampled DESC.

Since id and timeSampled increase together (though there is nothing in the schema to guarantee this), you could rewrite the query as TOP (1) ... ORDER BY $partition DESC, id DESC. Unfortunately, the combination of DESC keys on your index and the ASC implied leading key $partition means the index cannot be scanned in order to return just that one row.

If your index keys were instead id ASC, timeSampled ASC the whole index key would be $partition ASC, id ASC, timeSampled ASC. This all-ASC index could be scanned backward, returning just the first row in key order. This row would be guaranteed to have the highest id value in the highest-numbered partition. Given the (unenforced) relationship between id and partition id, this would produce the correct result with an optimal execution plan that reads just a single row.

This 'solution' lacks integrity because the id-timeSampled relationship is not enforced, and you probably do not want to rebuild the nonclustered primary key anyway. Nevertheless, I mention it because it may enhance your understanding of how partitioning interacts with indexes.
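As a rough sketch of the all-ASC variant described above: every object name here (pfMonth, psMonth, dbo.Samples) is hypothetical, and partitioning by month on timeSampled is an assumption, since the original schema is not shown. The $PARTITION.function_name(column) expression is the documented way to refer to a row's partition number in a query.

```sql
-- All names below are hypothetical; adjust to the real schema.
CREATE PARTITION FUNCTION pfMonth (datetime2(0))
    AS RANGE RIGHT FOR VALUES ('2020-02-01', '2020-03-01');

CREATE PARTITION SCHEME psMonth
    AS PARTITION pfMonth ALL TO ([PRIMARY]);

-- All-ASC key: with the implied leading key, the full index key is
-- ($partition ASC, id ASC, timeSampled ASC).
CREATE TABLE dbo.Samples (
    id          bigint       NOT NULL,
    timeSampled datetime2(0) NOT NULL,
    CONSTRAINT PK_Samples
        PRIMARY KEY NONCLUSTERED (id ASC, timeSampled ASC)
        ON psMonth (timeSampled)
) ON psMonth (timeSampled);

-- Ask for the highest id in the highest-numbered partition.
-- This is only correct if id and timeSampled really do increase together.
SELECT TOP (1) s.id, s.timeSampled
FROM dbo.Samples AS s
ORDER BY $PARTITION.pfMonth(s.timeSampled) DESC, s.id DESC;
```

Whether the optimizer actually satisfies this with a single-row backward scan should be confirmed by inspecting the execution plan on the real table; the sketch only shows the shape of the index and the query.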

SQL Server Execution Plan – SHOWPLAN vs ‘Include Execution Plan’ Warnings for the Same Query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

Is equivalent to clicking Display Estimated Execution Plan on the toolbar (or pressing Ctrl + L). You'll notice that the query returns no rows, unlike when you use Include Actual Execution Plan (Ctrl + M), because with SHOWPLAN_XML on the statement is compiled but not executed.

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.
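For comparison, a minimal sketch of the T-SQL counterpart to Include Actual Execution Plan, using the same sample query. SET STATISTICS XML ON executes the statement and returns the actual plan alongside the results, so runtime-only information such as spill warnings can appear there if a spill occurs.

```sql
SET STATISTICS XML ON;
GO
-- The query runs and returns rows, followed by the actual plan XML.
SELECT * FROM sys.objects;
GO
SET STATISTICS XML OFF;
GO
```

This particular query is unlikely to spill; it is used here only to show the difference in behaviour between the two settings.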

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

* Disclaimer: I work for SQL Sentry.

Related Solutions

Sql-server – Why does selecting top 1 from composite index DESC also used to partition by month not select the top value