Sql-server – Are RANK() and DENSE_RANK() deterministic or non-deterministic

sql server

According to official Microsoft BOL DENSE_RANK is nondeterministic (RANK()). But according to Ranking Functions by Itzik Ben-Gan "… the RANK() and DENSE_RANK() functions are always deterministic". Who is right?

What I have found so far:
Microsoft's Definition "Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database."

So in Set theory tables
Employees

Employee            Salary
Sue Right            1.00
Robin Page           1.00
Phil Factor          1.00

and
Employees2

Employee            Salary
Phil Factor          1.00
Sue Right            1.00
Robin Page           1.00

are the same. But Ranking functions return different values:

    CREATE TABLE [dbo].[Employees](
    --[ID] [int] IDENTITY(1,1) NOT NULL,
    [Employee] [varchar](150) NOT NULL,
    [Salary] [smallmoney] NULL,
) ON [PRIMARY]

GO
CREATE TABLE [dbo].[Employees2](
    --[ID] [int] IDENTITY(1,1) NOT NULL,
    [Employee] [varchar](150) NOT NULL,
    [Salary] [smallmoney] NULL,
) ON [PRIMARY]

INSERT INTO [dbo].[Employees]
([Employee] ,[Salary])
VALUES
('Sue Right', 1)
, ('Robin Page', 1)
,('Phil Factor', 1 )
GO
INSERT INTO [dbo].[Employees2]
([Employee] ,[Salary])
VALUES
('Phil Factor', 1 )
,('Sue Right', 1)
,('Robin Page', 1)
GO
SELECT RANK() OVER ( ORDER BY Salary) AS [Rank]
, DENSE_RANK() OVER (ORDER BY Salary ) AS [Dense_rank]
, [Employee]
FROM
dbo.Employees

SELECT RANK() OVER ( ORDER BY Salary) AS [Rank]
, DENSE_RANK() OVER (ORDER BY Salary ) AS [Dense_rank]
, [Employee]
FROM
dbo.Employees2

SELECT NTILE(3) OVER ( ORDER BY SALARY )
, [Employee]
FROM
dbo.Employees

SELECT NTILE(3) OVER ( ORDER BY SALARY )
, [Employee]
FROM
dbo.Employees2

Best Answer

According to official Microsoft BOL DENSE_RANK is nondeterministic (RANK()). But according to Ranking Functions by Itzik Ben-Gan "... the RANK() and DENSE_RANK() functions are always deterministic". Who is right?

They are both right, because they are using different senses of the word "deterministic".

From the SQL Server optimizer's point of view, "deterministic" has a very precise meaning; a meaning that existed before window and ranking functions were added to the product. To the optimizer, the "deterministic" property defines whether a function can be freely duplicated within its internal tree structures during optimization. This is not legal for a non-deterministic function.

Deterministic here means: the exact instance of the function always returns the same output for the same input, no matter how many times it is called. This is never true for windowing functions, by definition, because as a (single-row) scalar function, they do not return the same result within a row or across rows. To state it simply, using ROW_NUMBER as an example:

The ROW_NUMBER function returns different values for different rows (by definition!), so for optimization purposes it is nondeterministic

This is the sense BOL is using.

Itzik is making a different point about the determinism of the result as a whole. Over an ordered input set (with suitable tie-breaking) the output is a "deterministic" sequence. That is a valid observation, but it is not the "deterministic" quality that is important during query optimization.