Sql-server – the difference between COUNT(*) and COUNT(*) OVER()

sql servert-sqlwindow functions

Take the following code example:

SELECT MaritalStatus,
       COUNT(*) AS CountResult
       COUNT(*) OVER() AS CountOverResult
       FROM (schema).(table)
       GROUP BY Marital Status

COUNT(*) Returns all rows ignoring nulls right?

What does COUNT(*) OVER() do?

This question came in a practice exam so I didn't have the data to query. I have been using Adventure Works and this site http://www.sqlishard.com/Exercise to practice.

If I enter a query like

SELECT ID, COUNT(*) AS 'COUNT(*)' , COUNT(*) OVER() AS 'COUNT(*) OVER()'
FROM Customers
GROUP BY ID

into the practice site I get 3794 rows returned with the Count(*) column full of ones and the Count(*) Over() column full of the total number of rows. I didn't understand this pattern (sorry) so I came here.

Best Answer

COUNT(*) Returns all rows ignoring nulls right?

I'm not sure what you mean by "ignoring nulls" here. It returns the number of rows irrespective of any NULLs

SELECT COUNT(*)       
FROM (VALUES (CAST(NULL AS INT)),
             (CAST(NULL AS INT))) V(C)

Returns 2.

Altering the above query to COUNT(C) would return 0 as when using COUNT with an expression other than * only NOT NULL values of that expression are counted.

Suppose the table in your question has the following source data

+---------+---------------+
|  Name   | MaritalStatus |
+---------+---------------+
| Albert  | Single        |
| Bob     | Single        |
| Charles | Single        |
| David   | Single        |
| Edward  | Married       |
| Fred    | Married       |
| George  | NULL          |
+---------+---------------+

The query

SELECT MaritalStatus,
       COUNT(*) AS CountResult
FROM   T
GROUP  BY MaritalStatus 

Returns

+---------------+-------------+
| MaritalStatus | CountResult |
+---------------+-------------+
| Single        |           4 |
| Married       |           2 |
| NULL          |           1 |
+---------------+-------------+

Hopefully it is obvious how that result relates to the original data.

What does COUNT(*) OVER() do?

Adding that into the SELECT list for the previous query produces

+---------------+-------------+-----------------+
| MaritalStatus | CountResult | CountOverResult |
+---------------+-------------+-----------------+
| Single        |           4 |               3 |
| Married       |           2 |               3 |
| NULL          |           1 |               3 |
+---------------+-------------+-----------------+

Notice that the result set has 3 rows and CountOverResult is 3. This is not a coincidence.

The reason for this is because it logically operates on the result set after the GROUP BY.

COUNT(*) OVER () is a windowed aggregate. The absence of any PARTITION BY or ORDER BY clause means that the window it operates on is the whole result set.

In the case of the query in your question the value of CountOverResult is the same as the number of distinct MaritalStatus values that exist in the base table because there is one row for each of these in the grouped result.