Sql-server – the difference between COUNT() and COUNT() OVER()

sql server, t-sql, window functions

Take the following code example:

SELECT MaritalStatus,
       COUNT(*) AS CountResult,
       COUNT(*) OVER() AS CountOverResult
FROM (schema).(table)
GROUP BY MaritalStatus

COUNT(*) returns all rows ignoring nulls, right?

What does COUNT(*) OVER() do?

This question came up in a practice exam, so I didn't have the data to query. I have been using Adventure Works and this site http://www.sqlishard.com/Exercise to practice.

If I enter a query like

SELECT ID, COUNT(*) AS 'COUNT(*)' , COUNT(*) OVER() AS 'COUNT(*) OVER()'
FROM Customers
GROUP BY ID

into the practice site I get 3794 rows returned with the Count(*) column full of ones and the Count(*) Over() column full of the total number of rows. I didn't understand this pattern (sorry) so I came here.

Best Answer

COUNT(*) returns all rows ignoring nulls, right?

I'm not sure what you mean by "ignoring nulls" here. It returns the number of rows, irrespective of any NULLs.

SELECT COUNT(*)       
FROM (VALUES (CAST(NULL AS INT)),
             (CAST(NULL AS INT))) V(C)

Returns 2.

Altering the above query to COUNT(C) would return 0, because when COUNT is given an expression other than *, only the non-NULL values of that expression are counted.
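
As a minimal sketch of that difference, the two forms can be put side by side in one query against the same two-row derived table (the column aliases CountStar and CountC are made up for illustration):

```sql
-- Both rows hold NULL in column C.
SELECT COUNT(*) AS CountStar,  -- counts rows: 2
       COUNT(C) AS CountC      -- counts non-NULL values of C: 0
FROM (VALUES (CAST(NULL AS INT)),
             (CAST(NULL AS INT))) V(C);
```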

Suppose the table in your question has the following source data

+---------+---------------+
|  Name   | MaritalStatus |
+---------+---------------+
| Albert  | Single        |
| Bob     | Single        |
| Charles | Single        |
| David   | Single        |
| Edward  | Married       |
| Fred    | Married       |
| George  | NULL          |
+---------+---------------+

The query

SELECT MaritalStatus,
       COUNT(*) AS CountResult
FROM   T
GROUP  BY MaritalStatus

Returns

+---------------+-------------+
| MaritalStatus | CountResult |
+---------------+-------------+
| Single        |           4 |
| Married       |           2 |
| NULL          |           1 |
+---------------+-------------+

Hopefully it is obvious how that result relates to the original data.

What does COUNT(*) OVER() do?

Adding that into the SELECT list for the previous query produces

+---------------+-------------+-----------------+
| MaritalStatus | CountResult | CountOverResult |
+---------------+-------------+-----------------+
| Single        |           4 |               3 |
| Married       |           2 |               3 |
| NULL          |           1 |               3 |
+---------------+-------------+-----------------+

Notice that the result set has 3 rows and CountOverResult is 3. This is not a coincidence.

This is because the windowed aggregate logically operates on the result set after the GROUP BY.

COUNT(*) OVER () is a windowed aggregate. The absence of any PARTITION BY or ORDER BY clause means that the window it operates on is the whole result set.
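
To see the window on its own, a sketch without any GROUP BY may help. It assumes the seven sample rows above, rebuilt inline as a CTE named T; the aliases TotalRows and RowsWithSameStatus are illustrative only:

```sql
-- Sample rows from the table above, rebuilt inline so the query is self-contained.
WITH T (Name, MaritalStatus) AS (
    SELECT Name, MaritalStatus
    FROM (VALUES ('Albert',  'Single'),
                 ('Bob',     'Single'),
                 ('Charles', 'Single'),
                 ('David',   'Single'),
                 ('Edward',  'Married'),
                 ('Fred',    'Married'),
                 ('George',  NULL)) AS V (Name, MaritalStatus)
)
SELECT Name,
       MaritalStatus,
       COUNT(*) OVER ()                           AS TotalRows,          -- window = whole result set
       COUNT(*) OVER (PARTITION BY MaritalStatus) AS RowsWithSameStatus  -- window = rows sharing the status
FROM T;
```

No rows are collapsed here: all seven rows come back, TotalRows is 7 on each of them, and RowsWithSameStatus is 4, 2, or 1 depending on the row's MaritalStatus.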

In the case of the query in your question, the value of CountOverResult is the same as the number of distinct MaritalStatus values in the base table (with NULL counting as a value of its own), because there is one row for each of these in the grouped result.
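
A self-contained way to reproduce this, assuming the same seven sample rows rebuilt as an inline CTE named T (the real table behind the exam question is unknown):

```sql
-- Sample rows from the table above, rebuilt inline so the query is self-contained.
WITH T (Name, MaritalStatus) AS (
    SELECT Name, MaritalStatus
    FROM (VALUES ('Albert',  'Single'),
                 ('Bob',     'Single'),
                 ('Charles', 'Single'),
                 ('David',   'Single'),
                 ('Edward',  'Married'),
                 ('Fred',    'Married'),
                 ('George',  NULL)) AS V (Name, MaritalStatus)
)
SELECT MaritalStatus,
       COUNT(*)         AS CountResult,     -- rows in each group: 4, 2, 1
       COUNT(*) OVER () AS CountOverResult  -- rows in the grouped result: 3
FROM T
GROUP BY MaritalStatus;
```

Note that COUNT(DISTINCT MaritalStatus) over the base table would give 2 rather than 3, since it skips the NULL, whereas GROUP BY gives NULL a group of its own.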

Related Solutions

PostgreSQL – How to Find Consecutive Free Numbers Using Window Functions

This is a gaps-and-islands problem. Assuming there are no gaps or duplicates in the same id_set set:

WITH partitioned AS (
  SELECT
    *,
    number - ROW_NUMBER() OVER (PARTITION BY id_set ORDER BY number) AS grp
  FROM atable
  WHERE status = 'FREE'
),
counted AS (
  SELECT
    *,
    COUNT(*) OVER (PARTITION BY id_set, grp) AS cnt
  FROM partitioned
)
SELECT
  id_set,
  number
FROM counted
WHERE cnt >= 3
;

Here's a SQL Fiddle demo* link for this query: http://sqlfiddle.com/#!1/a2633/1.
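
To see why the subtraction isolates runs of consecutive numbers, here is a small sketch over hypothetical rows (only the column names are taken from the query; the data is invented). It numbers the rows with an explicit ORDER BY number so the row numbering is deterministic:

```sql
-- Hypothetical sample rows; only the column names come from the query above.
WITH atable (id_set, number, status) AS (
  VALUES (1, 1, 'FREE'),
         (1, 2, 'FREE'),
         (1, 3, 'FREE'),
         (1, 4, 'USED'),
         (1, 5, 'FREE'),
         (1, 6, 'FREE')
)
SELECT
  id_set,
  number,
  -- Row numbers are assigned after the WHERE filter, so they have no gaps.
  ROW_NUMBER() OVER (PARTITION BY id_set ORDER BY number) AS rn,
  -- The difference stays constant within a run of consecutive numbers.
  number - ROW_NUMBER() OVER (PARTITION BY id_set ORDER BY number) AS grp
FROM atable
WHERE status = 'FREE'
ORDER BY id_set, number;
```

With these rows, numbers 1 to 3 get grp 0 and numbers 5 and 6 get grp 1, so only the first run would pass a cnt >= 3 filter.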

UPDATE

To return only one set, you could add in one more round of ranking:

WITH partitioned AS (
  SELECT
    *,
    number - ROW_NUMBER() OVER (PARTITION BY id_set ORDER BY number) AS grp
  FROM atable
  WHERE status = 'FREE'
),
counted AS (
  SELECT
    *,
    COUNT(*) OVER (PARTITION BY id_set, grp) AS cnt
  FROM partitioned
),
ranked AS (
  SELECT
    *,
    RANK() OVER (ORDER BY id_set, grp) AS rnk
  FROM counted
  WHERE cnt >= 3
)
SELECT
  id_set,
  number
FROM ranked
WHERE rnk = 1
;

Here's a demo for this one too: http://sqlfiddle.com/#!1/a2633/2.

If you ever need to make it one set per id_set, change the RANK() call like this:

RANK() OVER (PARTITION BY id_set ORDER BY grp) AS rnk

Additionally, you could make the query return the smallest matching set (i.e. first try to return the first set of exactly three consecutive numbers if it exists, otherwise four, five etc.), like this:

RANK() OVER (ORDER BY cnt, id_set, grp) AS rnk

or like this (one per id_set):

RANK() OVER (PARTITION BY id_set ORDER BY cnt, grp) AS rnk

* The SQL Fiddle demos linked in this answer use the 9.1.8 instance, as the 9.2.1 one doesn't appear to be working at the moment.

T-SQL – Difference Between <> All and Not In

There is no difference in the result, but the semantics differ slightly.

X [comparison] ALL (set) means that the set is empty or the comparison is TRUE for every entry in the set.

X NOT IN (set) means that X does not belong to the set.

When [comparison] is "not equal" (<>), the two forms are equivalent. For other comparison operators, the results may differ.
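
A minimal T-SQL sketch of this, using hypothetical data (the table variable @Excluded and the derived table Nums are invented for illustration):

```sql
-- Hypothetical data: candidate numbers 1..4 and an excluded set {2, 3}.
DECLARE @Excluded TABLE (v INT NOT NULL);
INSERT INTO @Excluded (v) VALUES (2), (3);

-- These two queries return the same rows: 1 and 4.
SELECT n
FROM   (VALUES (1), (2), (3), (4)) AS Nums (n)
WHERE  n <> ALL (SELECT v FROM @Excluded);

SELECT n
FROM   (VALUES (1), (2), (3), (4)) AS Nums (n)
WHERE  n NOT IN (SELECT v FROM @Excluded);

-- A different comparison has no NOT IN counterpart: this returns only 4.
SELECT n
FROM   (VALUES (1), (2), (3), (4)) AS Nums (n)
WHERE  n > ALL (SELECT v FROM @Excluded);
```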
