SQL Server CTE Column Overflow – Order By Only

ctesql server

;WITH
cte_Date ( DateCode_FK ) AS (
    SELECT  DATEADD( DAY, 
                1 - ROW_NUMBER() OVER ( 
                        ORDER BY so1.object_id ),
                GETDATE() )
    FROM    sys.objects so1
    CROSS APPLY sys.objects so2 )
SELECT  TOP 10 d.DateCode_FK
FROM    cte_Date d
ORDER BY d.DateCode_FK DESC;

Nothing an overly interesting query, but i'm receiving an error message if I run it with the ORDER BY clause:

Msg 517, Level 16, State 1, Line 4

Adding a value to a 'datetime' column caused an overflow.

However, without the ORDER BY clause, it runs just fine. In addition, if I run the query on other catalogs contained in the same instance on the same server, the query runs fine with or without the ORDER BY clause.

I've taken a look at the configuration options and compatibility levels between the affected catalog and a catalog on which the query runs as expected, but have not found anything that might warrant the difference. Has anybody else run into a similar issue? I can work around it for now, but would ideally need to be able to fix the problem, whatever it is.

Potential hint – if you have a relatively large number of objects in a catalog ( > 5000 ), you -may- be able to reproduce the error… This is occurring on my largest catalog and it appears that if I include a TOP in the CTE, the ORDER BY issue goes away.

Best Answer

SQL Server does not guarantee the timing or number of evaluations for scalar expressions. This means that a query that might throw an error depending on the order of operations in an execution plan might (or might not) do so at runtime.

The script uses CROSS APPLY where CROSS JOIN was probably intended, but the point is that the potential number of rows over which the ROW_NUMBER is calculated depends on the size of the sys.objects cross product.

For a database with sufficient objects, an overflow for the DATEADD result is an expected risk. For example, this is reproducible using the AdventureWorks sample database, which has 637 entries in sys.objects. The size of the cross product is 637 * 637 = 405,769; the following throws an overflow error:

SELECT DATEADD(DAY, 1 - 405769, GETDATE());

One might argue that there is no need to materialize the result of an expression for rows that are not returned (and therefore not throw an error), but that is not the ways things work today.

Consider:

The highest ROW_NUMBER will give the lowest value for DateCode_FK in the DATEADD(DAY, 1 - ROW_NUMBER()... expression
The presentation ORDER BY is DateCode_FK DESC
Only the first 10 rows in presentation order are required

If the optimizer contained logic to reason that lower row numbers lead to higher DateCode_FK values, an explicit Sort would not be needed. Only ten rows would need to flow through the execution plan. The ten lowest row numbers are guaranteed to produce the ten highest DateCode_FK values.

Regardless, even where a Sort is present, the argument is that SQL Server should not throw an error because none of the ten rows actually returned are associated with an overflow error. As I said above, "that is not the ways things work today".

An alternative formulation that avoids the error (though it is still not guaranteed to do so - see my opening remark), makes the row numbering deterministic, and uses CROSS JOIN:

WITH cte_Date (DateCode_FK) AS
(
    SELECT TOP (10)
        DATEADD
        (
            DAY, 
            1 - ROW_NUMBER() OVER ( 
                ORDER BY 
                    so1.[object_id],
                    so2.[object_id]),
            GETDATE()
        )
    FROM sys.objects AS so1
    CROSS JOIN sys.objects AS so2
    ORDER BY
        ROW_NUMBER() OVER (
            ORDER BY 
                so1.[object_id],
                so2.[object_id]) ASC
)
SELECT TOP (10) -- Redundant TOP
    d.DateCode_FK
FROM cte_Date AS d
ORDER BY 
    d.DateCode_FK DESC;

Paste the Plan

Related Solutions

Sql-server – SQL Select taking too much time to execute

I think in a million records query, you have to avoid things like OUTER JOINS. I suggest you use UNION ALL Instead of LEFT JOIN. As long as I think CROSS APPLY is more efficient than sub-query in the select clause I will modify the query written by Conard Frix, which I think is correct.

now: when I started to modify your query I noticed that you have a WHERE clause saying: JoinedTable.WhereColumn IN (1, 3). in this case, if the field is null the condition will become false. then why are you using LEFT JOIN while you are filtering null valued rows? just replace LEFT JOIN With INNER JOIN, I guarantee that it will become faster.

about INDEX:

please note that when you have an index on a table, say

table1(a int, b nvarchar)

and your index is :

nonclustered index ix1 on table1(a)

and you want to do something like this:

select a,b from table1
where a < 10

in your index you have not included the column b so what happens?

if sql-server uses your index, it will have to search in the index, called "Index Seek" and then refer to main table to get column b, called "Look Up". This procedure might take much longer than scanning the table itself: "Table Scan".

but based on the statistics that sql-server has, in such situations, it might not use your index at all.

so first of all check the Execution Plan to see if the index is used at all.

if yes or no both, alter your index to include all columns that you are selecting. say like:

nonclustered index ix1 on table1(a) include(b)

in this case Look Up will not be needed, and your query will execute so much faster.

Sql-server – Optimizing a CTE hierarchy

Edit: this is second attempt

Based on @Hannah Vernon's answer, here is a way to bypass the use of CTE inside an inline subquery, which is like self-joining the CTE and I presume is the reason for the poor efficiency. It uses analytic functions available only in the 2012 version of SQL-Server. Tested at SQL-Fiddle

This part can be skipped from reading, it's a copy-paste from Hannah's answer:

;With AccountHierarchy AS
(                                                                           
    SELECT
        Root.AcctId                                         AccountId
        , Root.Name                                         AccountName
        , Root.ParentId                                     ParentId
        , 1                                                 HierarchyLevel  
        , cast(Root.Acctid as varchar(4000))                IdHierarchyMatch        
        , cast(Root.Acctid as varchar(4000))                IdHierarchy
        , cast(replace(Root.Name,'.','') as varchar(4000))  NameHierarchy   
        , cast(Root.Acctid as varchar(4000))                HierarchySort
        , cast(Root.Name as varchar(4000))                  HierarchyLabel          ,
        Root.CustomerCount                                  CustomerCount   

    FROM 
        account Root

    WHERE
        Root.ParentID is null

    UNION ALL

    SELECT
        Recurse.Acctid                                      AccountId
        , Recurse.Name                                      AccountName
        , Recurse.ParentId                                  ParentId
        , Root.HierarchyLevel + 1                           HierarchyLevel
        , CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.' 
            + cast(recurse.Acctid as varchar(40))   as varchar(4000))   IdHierarchyMatch
        , cast(cast(recurse.Acctid as varchar(40)) + '.' 
            + Root.IdHierarchy  as varchar(4000))           IdHierarchy
        , cast(replace(recurse.Name,'.','') + '.' 
            + Root.NameHierarchy as varchar(4000))          NameHierarchy
        , cast(Root.AccountName + '.' 
            + Recurse.Name as varchar(4000))                HierarchySort   
        , cast(space(root.HierarchyLevel * 4) 
            + Recurse.Name as varchar(4000))                HierarchyLabel
        , Recurse.CustomerCount                             CustomerCount
    FROM
        account Recurse INNER JOIN
        AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)

Here we order the rows of the CTE using the IdHierarchyMatch and we calculate row numbers and a running total (from the next row up to the end.)

, cte1 AS 
(
SELECT
    h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel
    , h.IdHierarchy
    , h.NameHierarchy
    , h.HierarchyLabel
    , parsename(h.IdHierarchy,1) Acct1Id
    , parsename(h.NameHierarchy,1) Acct1Name
    , parsename(h.IdHierarchy,2) Acct2Id
    , parsename(h.NameHierarchy,2) Acct2Name
    , parsename(h.IdHierarchy,3) Acct3Id
    , parsename(h.NameHierarchy,3) Acct3Name
    , parsename(h.IdHierarchy,4) Acct4Id
    , parsename(h.NameHierarchy,4) Acct4Name
    , h.CustomerCount
    , h.HierarchySort
    , h.IdHierarchyMatch
        , Rn = ROW_NUMBER() OVER 
                  (ORDER BY h.IdHierarchyMatch)
        , RunningCustomerCount = COALESCE(
            SUM(h.CustomerCount)
            OVER
              (ORDER BY h.IdHierarchyMatch
               ROWS BETWEEN 1 FOLLOWING
                        AND UNBOUNDED FOLLOWING)
          , 0) 
FROM
    AccountHierarchy AS h  
)

Then we have one more intermediate CTE where we use the previous running totals and row numbers - basically to find where the end points for the branches of the tree structure:

, cte2 AS
(
SELECT
    cte1.*
    , rn3  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                ORDER BY Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
    , rn2  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id, Acct2Id 
                ORDER BY Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rn1  = LAST_VALUE(Rn) OVER 
               (PARTITION BY Acct1Id 
                ORDER BY Acct2Id, Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rcc3 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                ORDER BY Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
    , rcc2 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id, Acct2Id 
                ORDER BY Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
    , rcc1 = LAST_VALUE(RunningCustomerCount) OVER 
               (PARTITION BY Acct1Id 
                ORDER BY Acct2Id, Acct3Id, Acct4Id
                ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
FROM
    cte1 
)

and finally we build the last part:

SELECT
    hier.AccountId
    , hier.AccountName
    ---                        -- columns skipped 
    , hier.CustomerCount

    , TotalCustomerCount = hier.CustomerCount
        + hier.RunningCustomerCount 
        - ca.LastRunningCustomerCount

    , hier.HierarchySort
    , hier.IdHierarchyMatch
FROM
    cte2 hier
  OUTER APPLY
    ( SELECT  LastRunningCustomerCount, Rn
      FROM
      ( SELECT LastRunningCustomerCount
              = RunningCustomerCount, Rn
        FROM (SELECT NULL a) x  WHERE 4 <= HierarchyLevel 
      UNION ALL
        SELECT rcc3, Rn3
        FROM (SELECT NULL a) x  WHERE 3 <= HierarchyLevel 
      UNION ALL
        SELECT rcc2, Rn2 
        FROM (SELECT NULL a) x  WHERE 2 <= HierarchyLevel 
      UNION ALL
        SELECT rcc1, Rn1
        FROM (SELECT NULL a) x  WHERE 1 <= HierarchyLevel 
      ) x
      ORDER BY Rn 
      OFFSET 0 ROWS
      FETCH NEXT 1 ROWS ONLY
      ) ca
ORDER BY
    hier.HierarchySort ;

And a simplification, using the same cte1 as the code above. Test at SQL-Fiddle-2. Please note that both solutions work under the assumption that you have a maximum of four levels in your tree:

SELECT
    hier.AccountId
    ---                      -- skipping rows
    , hier.CustomerCount

    , TotalCustomerCount = CustomerCount
        + RunningCustomerCount 
        - CASE HierarchyLevel
            WHEN 4 THEN RunningCustomerCount
            WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id, Acct2Id, Acct3Id 
                    ORDER BY Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)       
            WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id, Acct2Id 
                    ORDER BY Acct3Id, Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
            WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER 
                   (PARTITION BY Acct1Id 
                    ORDER BY Acct2Id, Acct3Id, Acct4Id
                    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) 
          END

    , hier.HierarchySort
    , hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
    hier.HierarchySort ;

A third approach, with only one CTE, for the recursive part and then only window aggregate functions (SUM() OVER (...)), so it should work in any version from 2005 upwards. Test at SQL-Fiddle-3 This solution assumes, like the previous ones, that there are 4 levels maximum in the hierarchy tree:

;WITH AccountHierarchy AS
(                                                                           
    SELECT
          AccountId      = Root.AcctId                                         
        , AccountName    = Root.Name                                         
        , ParentId       = Root.ParentId                                     
        , HierarchyLevel = 1                                                   
        , HierarchySort  = CAST(Root.Acctid AS VARCHAR(4000))                
        , HierarchyLabel = CAST(Root.Name AS VARCHAR(4000))                   
        , Acct1Id        = CAST(Root.Acctid AS VARCHAR(4000))                
        , Acct2Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct3Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct4Id        = CAST(NULL AS VARCHAR(4000))                       
        , Acct1Name      = CAST(Root.Name AS VARCHAR(4000))                  
        , Acct2Name      = CAST(NULL AS VARCHAR(4000))                       
        , Acct3Name      = CAST(NULL AS VARCHAR(4000))                       
        , Acct4Name      = CAST(NULL AS VARCHAR(4000))                       
        , CustomerCount  = Root.CustomerCount                                   

    FROM 
        account AS Root

    WHERE
        Root.ParentID IS NULL

    UNION ALL

    SELECT
          Recurse.Acctid 
        , Recurse.Name 
        , Recurse.ParentId 
        , Root.HierarchyLevel + 1 
        , CAST(Root.AccountName + '.' 
            + Recurse.Name AS VARCHAR(4000)) 
        , CAST(SPACE(Root.HierarchyLevel * 4) 
            + Recurse.Name AS VARCHAR(4000)) 
        , Root.Acct1Id 
        , CASE WHEN Root.HierarchyLevel = 1 
              THEN cast(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct2Id 
          END 
        , CASE WHEN Root.HierarchyLevel = 2 
              THEN CAST(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct3Id 
          END 
        , CASE WHEN Root.HierarchyLevel = 3 
              THEN CAST(Recurse.Acctid AS VARCHAR(4000)) 
              ELSE Root.Acct4Id 
          END 
 
        , cast(Root.AccountName as varchar(4000))          
        , CASE WHEN Root.HierarchyLevel = 1 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct2Name 
          END 
        , CASE WHEN Root.HierarchyLevel = 2 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct3Name 
          END 
        , CASE WHEN Root.HierarchyLevel = 3 
              THEN CAST(Recurse.Name AS VARCHAR(4000)) 
              ELSE Root.Acct4Name 
          END 
        , Recurse.CustomerCount 
    FROM 
        account AS Recurse INNER JOIN 
        AccountHierarchy AS Root ON Root.AccountId = Recurse.ParentId
)

SELECT
      h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel
    , IdHierarchy = 
          CAST(COALESCE(h.Acct4Id+'.','') 
               + COALESCE(h.Acct3Id+'.','') 
               + COALESCE(h.Acct2Id+'.','') 
               + h.Acct1Id AS VARCHAR(4000))
    , NameHierarchy = 
          CAST(COALESCE(h.Acct4Name+'.','') 
               + COALESCE(h.Acct3Name+'.','') 
               + COALESCE(h.Acct2Name+'.','') 
               + h.Acct1Name AS VARCHAR(4000))   
    , h.HierarchyLabel
    , h.Acct1Id
    , h.Acct1Name
    , h.Acct2Id
    , h.Acct2Name
    , h.Acct3Id
    , h.Acct3Name
    , h.Acct4Id
    , h.Acct4Name
    , h.CustomerCount
    , TotalCustomerCount =  
          CASE h.HierarchyLevel
            WHEN 4 THEN h.CustomerCount
            WHEN 3 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY h.Acct1Id, h.Acct2Id, h.Acct3Id)       
            WHEN 2 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY Acct1Id, h.Acct2Id) 
            WHEN 1 THEN SUM(h.CustomerCount) OVER 
                   (PARTITION BY h.Acct1Id) 
          END
    , h.HierarchySort
    , IdHierarchyMatch = 
          CAST(h.Acct1Id 
               + COALESCE('.'+h.Acct2Id,'') 
               + COALESCE('.'+h.Acct3Id,'') 
               + COALESCE('.'+h.Acct4Id,'') AS VARCHAR(4000))   
FROM
    AccountHierarchy AS h  
ORDER BY
    h.HierarchySort ;

A 4th approach, that calculates as an intermediate CTE, the closure table of the hierarchy. Test at SQL-Fiddle-4. The benefit is that for the sums calculations, there is no resctriction on the number of levels.

;WITH AccountHierarchy AS
( 
    -- skipping several line, identical to the 3rd approach above
)

, ClosureTable AS
( 
    SELECT
          AccountId      = Root.AcctId  
        , AncestorId     = Root.AcctId  
        , CustomerCount  = Root.CustomerCount 
    FROM 
        account AS Root

    UNION ALL

    SELECT
          Recurse.Acctid 
        , Root.AncestorId 
        , Recurse.CustomerCount
    FROM 
        account AS Recurse INNER JOIN 
        ClosureTable AS Root ON Root.AccountId = Recurse.ParentId
)

, ClosureGroup AS
(                                                                           
    SELECT
          AccountId           = AncestorId  
        , TotalCustomerCount  = SUM(CustomerCount)                             
    FROM 
        ClosureTable AS a
    GROUP BY
        AncestorId
)

SELECT
      h.AccountId
    , h.AccountName
    , h.ParentId
    , h.HierarchyLevel 
    , h.HierarchyLabel
    , h.CustomerCount
    , cg.TotalCustomerCount 

    , h.HierarchySort
FROM
    AccountHierarchy AS h  
  JOIN
    ClosureGroup AS cg
      ON cg.AccountId = h.AccountId
ORDER BY
    h.HierarchySort ;

Best Answer

Related Solutions

Sql-server – SQL Select taking too much time to execute

Sql-server – Optimizing a CTE hierarchy

Related Question