SQL Server – COUNT(*) OVER ()

Tags: sql-server, sql-server-2008, window-functions

I have a COUNT(*) OVER () as part of my query to fetch a large amount of data with about 15 joins to tables (some big, some small). What's the best solution for count here?

  1. Separate query to find count.
  2. COUNT(*) OVER () as part of the query that fetches the data (since it is going to calculate count for each row?!)
  3. Or a query to get count union the query to fetch data. (Of course, I will have to put null for all the other columns the data query is going to fetch.)

Best Answer

COUNT(*) OVER () is one of those operations that sounds like it ought to be cheap for the query optimizer. After all, SQL Server already knows how many rows are returned by the query. You're just asking it to project that value into the result set of the query. Unfortunately, COUNT(*) OVER () can be an expensive operation depending on the query that you're adding it to.

For a simple test I'll put a moderate amount of data into a table. Really any table will do, but here is my sample data for those following along at home:

SELECT 
  CAST(REPLICATE('A', 100) AS VARCHAR(100)) ID1
, CAST(REPLICATE('Z', 100) AS VARCHAR(100)) ID2
  INTO #JUNK_DATA
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;

This query takes 2.5 seconds when discarding result sets:

SELECT ID1, ID2 
FROM #JUNK_DATA;

This query takes 2.7 seconds when run serially but can parallelize:

SELECT COUNT(*)
FROM #JUNK_DATA
OPTION (MAXDOP 1);

The query with the COUNT(*) OVER () aggregate takes 47.2 seconds to complete on my machine:

SELECT ID1, ID2, COUNT(*) OVER () CNT
FROM #JUNK_DATA;

Perhaps my machine has issues but it's definitely doing more work. Here's the query plan:

bad plan

SQL Server is loading all of the column values into a table spool then reading that data twice and joining it together. Why is it doing all of that work when it already knows how many rows are in the result set?

SQL Server query plans process data one row at a time (more or less) in a streaming fashion. So if we calculated the number of rows as it went it might look like this:

╔═════╦═════╦══════════════╗
║ ID1 ║ ID2 ║ COUNT_SO_FAR ║
╠═════╬═════╬══════════════╣
║ A   ║ B   ║ 1            ║
║ C   ║ D   ║ 2            ║
║ E   ║ F   ║ 3            ║
║ ... ║ ... ║ ...          ║
║ ZZZ ║ ZZZ ║ 6431296      ║
╚═════╩═════╩══════════════╝

That operation doesn't require a separate copy of the data, but how can the final value of 6431296 be applied to all of the previous rows? Sometimes this operation is implemented as the double spool that we saw in the query plan. It is possible to imagine more efficient internal algorithms that we'd like SQL Server to use but we obviously don't have direct control over that.

As I see it, here are your options to solve the problem:

  1. Fix the application. SQL Server already knows how many rows the query will return. Using @@ROWCOUNT or some other method to capture that count is by far the preferable approach.

  2. Run a separate query to get the count. This can be a good choice if you're returning a large result set to the client and your data is guaranteed not to change between the queries. The separate COUNT may benefit from parallelism and from having the relevant tables already loaded into the buffer cache. Depending on the indexes on the tables, SQL Server may be able to do less IO to get the count than to produce the full result set.

  3. Add the COUNT aggregate to the query. This can be a good choice if your typical result set is small. That way you don't load as much data into the spool.
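For option 1, here is a minimal sketch (reusing the #JUNK_DATA table from the test setup above; the application is assumed to consume both result sets):

```sql
-- Sketch of option 1: let the application read the count from a second
-- result set. @@ROWCOUNT reports the number of rows affected by the
-- previous statement, so it must be read immediately after the data query.
SELECT ID1, ID2
FROM #JUNK_DATA;

SELECT @@ROWCOUNT AS TOTAL_ROWS;
```

The second SELECT is essentially free: the row count is already known to the server, so nothing is rescanned or spooled.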

The rest of this answer contains information about some tricks that can be done with window functions. Not all of them are relevant to your version of SQL Server, 2008.

A simple ROW_NUMBER() addition does not require the result set to be put into a table spool. If you think about the earlier example that makes sense. SQL Server can just calculate the value as it goes and it doesn't need to apply a value to a previous row. This query ran in 3.1 seconds on my machine:

SELECT 
  ID1
, ID2
, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM #JUNK_DATA;

We can see this in the query plan:

simple RN

If you change your application to simply take the maximum value of RN as the count of rows then this could be a good option for you.

You might be tempted to use ROW_NUMBER() along with a UNION ALL to get the value that you're looking for in the final dummy row, similar to what you proposed in your question. For example:

SELECT
  t.ID1
, t.ID2
, CASE WHEN ROW_COUNT_ROW = 'Y' THEN -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ELSE NULL END ROW_COUNT
FROM
(
    SELECT 
      ID1
    , ID2
    , 'N' ROW_COUNT_ROW
    FROM #JUNK_DATA

    UNION ALL

    SELECT 
      NULL ID1
    , NULL ID2
    , 'Y' ROW_COUNT_ROW
) t;

The above query finished in 3.4 seconds and returned the correct results for me. However, because it relies on an unspecified join order, it can also return wrong results. Looking at the plan:

unsafe union all

The part of the query in blue has to be the top part of the concatenation. The part of the query in red has to be the bottom half. The query optimizer is free to rearrange the query plan or to implement the UNION ALL with a different physical operator, so you may not always get correct results.

As of SQL Server 2012 it's possible to use the LEAD window function to accomplish something similar to the above example:

SELECT
  t.ID1
, t.ID2
, CASE WHEN LD IS NULL THEN RN ELSE NULL END ROW_COUNT
FROM
(
    SELECT ID1, ID2
    , ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    , LEAD(ID1) OVER (ORDER BY (SELECT NULL)) LD
    FROM #JUNK_DATA
) t;

This query avoids the explicit table spool and it feels safer than the other one. After all, why would SQL Server process the window functions in a different order? However, correct results are not guaranteed and it finishes in 12.5 seconds on my machine.

It is possible to add an explicit ORDER BY to the query to always get the right results:

SELECT
  t.ID1
, t.ID2
, CASE WHEN ORDER_BY_COL = 1 THEN -1 + ROW_NUMBER() OVER (ORDER BY ORDER_BY_COL) ELSE NULL END ROW_COUNT
FROM
(
    SELECT
      ID1
    , ID2
    , 0 ORDER_BY_COL
    FROM #JUNK_DATA

    UNION ALL

    SELECT 
      NULL ID1
    , NULL ID2
    , 1 ORDER_BY_COL
) t
OPTION (MAXDOP 1);

But this requires an expensive sort operator in the query plan:

ORDER BY plan

The query ran in 21.5 seconds in serial and would have finished faster had I let it run in parallel.

Finally, as of SQL Server 2016 it's possible to get batch mode for the operators that implement the window functions. I'm going to create an empty clustered columnstore (CCI) table and add it to the query:

CREATE TABLE #ADD_BATCH_MODE (
ID INT NOT NULL, 
INDEX CCI_BATCH CLUSTERED COLUMNSTORE
);

SELECT ID1, ID2, COUNT(*) OVER () ROW_COUNT
FROM #JUNK_DATA
LEFT OUTER JOIN #ADD_BATCH_MODE ON 1 = 0;

This tricks SQL Server into using batch mode for the aggregate. Here is the plan:

window aggregate

The plan certainly looks simpler. The on-disk spool is no longer present and the query finishes in 5.4 seconds. Adding batch mode to a complicated query will have other effects; in general they will be positive ones, but you can't know without testing.