Postgresql – Comparing nullable column values using “IS NULL” versus “= NULL”

nullpostgresqlpostgresql-9.4

I'm dealing with nullable values in a column of a table, i'm wondering why this:

with cte as (
    select 'c1'::text as c1,null::text as c2
    union
    select 'c2'::text as c1,null::text as c2
)
select *
from cte
where c2 is null

returns a result set with the two "rows" having a null value in c2; comparting to this query:

with cte as (
    select 'c1'::text as c1,null::text as c2
    union
    select 'c2'::text as c1,null::text as c2
)
select *
from cte
where c2 = null

that returns an empty result set!!

so it looks like that field is null is actually different from field = null? the question is… is there "null-compatible operator" to use? so i can write the comparision like: column_name = $1 and argument $1 can be a not null or null value. IF $1 is a null value then it will semantically equals to write as IS NULL, ELSE (if $1 is not a null value) then it would be semantically equals to = $1.

P.D: This question also applies to the case is NOT NULL versus <> NULL

Best Answer

As documented in the manual it's not possible to use = to test for NULL values:

Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL.

is there "null-compatible operator" to use - Yes, right below the above quote it states:

When this behavior is not suitable, use the IS [ NOT ] DISTINCT FROM predicates

So you can use

where c2 is not distinct from $1

One drawback of the IS DISTINCT or IS NOT DISTINCT operator is however, that they can't use an index.

Related Solutions

Sql-server – Why does ALTER COLUMN to NOT NULL cause massive log file growth

When you change a column to NOT NULL, SQL Server has to touch every single page, even if there are no NULL values. Depending on your fill factor this could actually lead to a lot of page splits. Every page that is touched, of course, has to be logged, and I suspect due to the splits that two changes may have to be logged for many pages. Since it's all done in a single pass, though, the log has to account for all of the changes so that, if you hit cancel, it knows exactly what to undo.

An example. Simple table:

DROP TABLE dbo.floob;
GO

CREATE TABLE dbo.floob
(
  id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, 
  bar INT NULL
);

INSERT dbo.floob(bar) SELECT NULL UNION ALL SELECT 4 UNION ALL SELECT NULL;

ALTER TABLE dbo.floob ADD CONSTRAINT df DEFAULT(0) FOR bar

Now, let's look at the page details. First we need to find out what page and DB_ID we're dealing with. In my case I created a database called foo, and the DB_ID happened to be 5.

DBCC TRACEON(3604, -1);
DBCC IND('foo', 'dbo.floob', 1);
SELECT DB_ID();

The output indicated that I was interested in page 159 (the only row in DBCC IND output with PageType = 1).

Now, let's look some select page details as we step through the OP's scenario.

DBCC PAGE(5, 1, 159, 3);

enter image description here

UPDATE dbo.floob SET bar = 0 WHERE bar IS NULL;    
DBCC PAGE(5, 1, 159, 3);

enter image description here

ALTER TABLE dbo.floob ALTER COLUMN bar INT NOT NULL;
DBCC PAGE(5, 1, 159, 3);

enter image description here

Now, I don't have all the answers to this, as I am not a deep internals guy. But it's clear that - while both the update operation and the addition of the NOT NULL constraint undeniably write to the page - the latter does so in an entirely different way. It seems to actually change the structure of the record, rather than just fiddle with bits, by swapping out the nullable column for a non-nullable column. Why it has to do that, I'm not quite sure - a good question for the storage engine team, I guess. I do believe that SQL Server 2012 handles some of these scenarios a lot better, FWIW - but I have yet to do any exhaustive testing.

Postgresql – Selecting the last record that meets a condition

I think LAG() cannot be used here, because with LAG you need to be specific about how many rows back you want to go. (It is 1 by default, but you can specify 3, 10 or any other number. The point, however, is that it must be a specific number.) In your situation, you do not know if the last matching value was on the previous row or on the row before it or even earlier.

So, a different approach would be to find the ID of the last row with the matching value, then look that ID up to get the value for the final output – something like this:

SELECT
  s.id,
  t.val
FROM
  (
    SELECT
      id,
      MAX(CASE WHEN val >=5 THEN id END) OVER (ORDER BY id ASC) AS last_id
    FROM
      test
  ) AS s
  INNER JOIN test AS t ON s.last_id = t.id
ORDER BY
  s.id ASC
;

Or you could use a correlated subquery to get the last value that is more than 5 in the subset from the lowest ID to the current ID:

SELECT
  id,
  (
    SELECT
      sub.val
    FROM
      test AS sub
    WHERE
      sub.id <= main.id
      AND sub.val >= 5
    ORDER BY
      id DESC
    LIMIT
      1
  ) AS val
FROM
  test AS main
ORDER BY
  id ASC
;

This would be similar to LAG() but more flexible (and likely less efficient) – a LAG() with a tweak, if you like.

Best Answer

Related Solutions

Sql-server – Why does ALTER COLUMN to NOT NULL cause massive log file growth

Postgresql – Selecting the last record that meets a condition

Related Question