PostgreSQL – Deleting with NOT EXISTS vs NOT IN Differences

postgresql

I'm working with an older schema, and don't understand why my delete operation (to clear out orphan records) doesn't work as expected in some cases. In others both of the queries return identical results :-/

For example, I have 3 models: house, office, and address. house and office both have a nullable reference to an address.id, so an address record can belong to either of them but the address itself doesn't indicate which.

-- query 1
DELETE FROM address adr
WHERE
  NOT EXISTS(SELECT * FROM house  H WHERE H.address_id = adr.id) AND
  NOT EXISTS(SELECT * FROM office O WHERE O.address_id = adr.id);

-- query 2 
DELETE FROM address adr
WHERE
  NOT adr.id IN (select address_id from house) AND
  NOT adr.id IN (select address_id from office);

query 1: deletes 3000 records (correct)

query 2: deletes 0 records

What is the obvious problem with query #2? When I check the records deleted by #1 they really do not exist in either house or office.

Best Answer

Your address_id columns contain nulls, and thus you got caught by the three-valued logic of SQL.

https://www.postgresql.org/docs/current/static/functions-subquery.html#functions-subquery-in describes the behaviour of IN:

Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the IN construct will be null, not false. This is in accordance with SQL's normal rules for Boolean combinations of null values.

And of course NOT applied to null is null, not true, see https://www.postgresql.org/docs/current/static/functions-logical.html