SQL Server OVER Clause – Explanation and Usage

sql servert-sqlwindow functions

I saw a concise TSQL statement that effectively splits a string into its constituent characters, one per line, for the purpose of evaluating the ascii value on each character.

If I am reading the query correctly, effectively, 3 CTEs are being used to prepare a table of 1 column containing 10,000 rows, each with the value '0'.

A fourth CTE is defined as follows:

cteTally(n) AS(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n 
    FROM E4
)

Subsequently, this CTE is joined to a table containing a column with the strings of interest, with the following select:

SELECT n, SUBSTRING(LastName, n, 1), ASCII( SUBSTRING(LastName, n, 1))

That is, row number n, then the nth character in LastName, then the ascii value of that character.

My questions relate to the over clause in the CTE above.

Essentially, what exactly is it doing?

If we are querying row_number from 10,000 identical rows, why do we need an order by clause at all? Why is the order by put into an over clause rather than as an order by clause for the select statement – especially as the over clause isn't even specifying any partition? (I presume this means the window over which row_number operates is the full 10,000 rows?) And what does it mean to order by select null?

Best Answer

ROW_NUMBER() is a ranking window function, and ranking window functions require a mandatory ORDER BY clause. If you try to write it without the ORDER BY you will get a syntax error.

SELECT ROW_NUMBER() OVER()
FROM (VALUES ('A'), ('B'), ('C')) AS X(Y);
-- Msg 4112, Level 15, State 1, Line 1 
-- The function 'ROW_NUMBER' must have an OVER clause with ORDER BY.

The trick with the subquery was discovered by someone who blogged about it, as a performance optimization. SQL Server always performs a sort operation since constants are not allowed for the ORDER BY clause:

SELECT ROW_NUMBER() OVER(ORDER BY NULL)
FROM (VALUES ('A'), ('B'), ('C')) AS X(Y);
-- Msg 5309, Level 16, State 1, Line 1
-- Windowed functions, aggregates and NEXT VALUE FOR functions 
-- do not support constants as ORDER BY clause expressions.

And neither are integers which are treated as indices:

SELECT ROW_NUMBER() OVER(ORDER BY 1)
FROM (VALUES ('A'), ('B'), ('C')) AS X(Y);
-- Msg 5308, Level 16, State 1, Line 1
-- Windowed functions, aggregates and NEXT VALUE FOR functions 
-- do not support integer indices as ORDER BY clause expressions.

Turns out that due to some glitch in the code, you can circumvent this limitation by using a subquery, which for some reason is allowed, and eliminates the sort operator.

SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM (VALUES ('A'), ('B'), ('C')) AS X(Y);

No Sort Operator

You can use any constant in the sub query, the NULL is probably a reminiscent of the habit of using SELECT NULL within EXISTS predicates, which in the early history of SQL Server had a performance impact as opposed to * or any other column expression, as the optimizer wasn't smart enough to ignore it.

HTH

UPDATE @Erik-Darling commented that you can also circumvent it by using a computed expressions:

You can do SELECT ROW_NUMBER() OVER (ORDER BY 1/0);