Do you have some zero dates? Datetime values of 0000-00-00 00:00:00
are considered by MySQL to simultaneously satisfy is null
and is not null
:
steve@steve@localhost > create temporary table _tmp (a datetime not null);
Query OK, 0 rows affected (0.02 sec)
steve@steve@localhost > insert into _tmp values ('');
Query OK, 1 row affected, 1 warning (0.00 sec)
Warning (Code 1264): Out of range value for column 'a' at row 1
steve@steve@localhost > select a from _tmp where a is null;
+---------------------+
| a |
+---------------------+
| 0000-00-00 00:00:00 |
+---------------------+
1 row in set (0.00 sec)
steve@steve@localhost > select a from _tmp where a is not null;
+---------------------+
| a |
+---------------------+
| 0000-00-00 00:00:00 |
+---------------------+
1 row in set (0.00 sec)
See: http://bugs.mysql.com/bug.php?id=940
This is classified as "not a bug". They suggest a workaround: use strict mode, which will convert the insertion warning into an error.
Having said all that, this alone can't explain the wild variation in the results you're getting (the sum of the is null
and is not null
counts should exceed the unrestricted count)...
This may not directly help you, since I only know SQL Server, but in any case this is how I would approach this:
Sample Data:
CREATE TABLE #Example
(
Eid integer NULL
);
INSERT #Example
(Eid)
VALUES
(1),
(2),
(3),
(4),
(5),
(NULL),
(NULL),
(NULL),
(NULL),
(NULL);
My idea revolves around the idea that COUNT(column)
does not count NULLs
, whereas COUNT(*)
does:
SELECT
[Number] = COUNT(E.Eid),
[Null] = COUNT(*) - COUNT(E.Eid)
FROM #Example AS E;
Output 1:
+--------+------+
| Number | Null |
+--------+------+
| 5 | 5 |
+--------+------+
If the specific arrangement of rows and columns is important, I would UNPIVOT
:
WITH Previous AS
(
-- Same query as above
SELECT
[Number] = COUNT(E.Eid),
[Null] = COUNT(*) - COUNT(E.Eid)
FROM #Example AS E
)
SELECT
U.Eid,
U.Counts
FROM Previous
UNPIVOT
(
Counts FOR Eid IN ([Number], [Null])
) AS U;
Output 2:
+--------+--------+
| Eid | Counts |
+--------+--------+
| Number | 5 |
| Null | 5 |
+--------+--------+
Using CASE
SELECT
C.Name,
Counts = COUNT(*)
FROM
(
SELECT
Name =
CASE
WHEN E.Eid IS NULL
THEN 'Null'
ELSE 'Number'
END
FROM #Example AS E
) AS C
GROUP BY
C.Name;
Output 3:
+--------+--------+
| Name | Counts |
+--------+--------+
| Null | 5 |
| Number | 5 |
+--------+--------+
Edit:
This is the query form Babu ended up using:
select
count(*) as cnt,
case
when smpth is null
then 'NULL'
else 'NUMBER'
end numbertest
from employees
group by
case
when smpth is null
then 'NULL'
else 'NUMBER'
end numbertest
Best Answer
It's all about NULL handling. The expression
lag(id_type) OVER (ORDER BY date) <> id_type
evaluates to a boolean value, which can beTRUE
,FALSE
orNULL
. If either the left or the right operand is NULL, we get NULL. There are various ways NULL can be introduced on either side:OUTER JOIN
introduces NULL (no matching row found).There are various ways to fold three-valued logic to just two cases. A
CASE
statement, anOR
expression etc. Related answer with details:Here is a comparison of the involved techniques, plus (updated) benchmark. Basically, they all perform the same, it's a cheap operation. The aggregate
FILTER
clause, introduced Postgres 9.4, seems a bit faster in the updated benchmark:For absolute performance, is SUM faster or COUNT?
Return counts for multiple ranges in a single SELECT statement
There is no case
wherein only one of them works
, the techniques are interchangeable. What to count or sum exactly depends on details of setup and requirements. In this case we only countTRUE
- as long as the underlying table or query yields all non-null values (the corner case of the first row is open for debate).And this case only requires a simple comparison. Related cases need to look at the previous or next value more closely. Then it may come in handy to provide a default value as 3rd parameter to
lag()
orlead()
.