SQL Server CTE Column Overflow – Order By Only

ctesql server

;WITH
cte_Date ( DateCode_FK ) AS (
    SELECT  DATEADD( DAY, 
                1 - ROW_NUMBER() OVER ( 
                        ORDER BY so1.object_id ),
                GETDATE() )
    FROM    sys.objects so1
    CROSS APPLY sys.objects so2 )
SELECT  TOP 10 d.DateCode_FK
FROM    cte_Date d
ORDER BY d.DateCode_FK DESC;

Nothing an overly interesting query, but i'm receiving an error message if I run it with the ORDER BY clause:

Msg 517, Level 16, State 1, Line 4

Adding a value to a 'datetime' column caused an overflow.

However, without the ORDER BY clause, it runs just fine. In addition, if I run the query on other catalogs contained in the same instance on the same server, the query runs fine with or without the ORDER BY clause.

I've taken a look at the configuration options and compatibility levels between the affected catalog and a catalog on which the query runs as expected, but have not found anything that might warrant the difference. Has anybody else run into a similar issue? I can work around it for now, but would ideally need to be able to fix the problem, whatever it is.

Potential hint – if you have a relatively large number of objects in a catalog ( > 5000 ), you -may- be able to reproduce the error… This is occurring on my largest catalog and it appears that if I include a TOP in the CTE, the ORDER BY issue goes away.

Best Answer

SQL Server does not guarantee the timing or number of evaluations for scalar expressions. This means that a query that might throw an error depending on the order of operations in an execution plan might (or might not) do so at runtime.

The script uses CROSS APPLY where CROSS JOIN was probably intended, but the point is that the potential number of rows over which the ROW_NUMBER is calculated depends on the size of the sys.objects cross product.

For a database with sufficient objects, an overflow for the DATEADD result is an expected risk. For example, this is reproducible using the AdventureWorks sample database, which has 637 entries in sys.objects. The size of the cross product is 637 * 637 = 405,769; the following throws an overflow error:

SELECT DATEADD(DAY, 1 - 405769, GETDATE());

One might argue that there is no need to materialize the result of an expression for rows that are not returned (and therefore not throw an error), but that is not the ways things work today.

Consider:

  • The highest ROW_NUMBER will give the lowest value for DateCode_FK in the DATEADD(DAY, 1 - ROW_NUMBER()... expression
  • The presentation ORDER BY is DateCode_FK DESC
  • Only the first 10 rows in presentation order are required

If the optimizer contained logic to reason that lower row numbers lead to higher DateCode_FK values, an explicit Sort would not be needed. Only ten rows would need to flow through the execution plan. The ten lowest row numbers are guaranteed to produce the ten highest DateCode_FK values.

Regardless, even where a Sort is present, the argument is that SQL Server should not throw an error because none of the ten rows actually returned are associated with an overflow error. As I said above, "that is not the ways things work today".


An alternative formulation that avoids the error (though it is still not guaranteed to do so - see my opening remark), makes the row numbering deterministic, and uses CROSS JOIN:

WITH cte_Date (DateCode_FK) AS
(
    SELECT TOP (10)
        DATEADD
        (
            DAY, 
            1 - ROW_NUMBER() OVER ( 
                ORDER BY 
                    so1.[object_id],
                    so2.[object_id]),
            GETDATE()
        )
    FROM sys.objects AS so1
    CROSS JOIN sys.objects AS so2
    ORDER BY
        ROW_NUMBER() OVER (
            ORDER BY 
                so1.[object_id],
                so2.[object_id]) ASC
)
SELECT TOP (10) -- Redundant TOP
    d.DateCode_FK
FROM cte_Date AS d
ORDER BY 
    d.DateCode_FK DESC;

Paste the Plan

Plan

Results