I'm working with an older schema, and don't understand why my delete operation (to clear out orphan records) doesn't work as expected in some cases. In others both of the queries return identical results :-/
For example, I have 3 models: house, office, and address. house
and office
both have a nullable reference to an address.id
, so an address
record can belong to either of them but the address itself doesn't indicate which.
-- query 1
DELETE FROM address adr
WHERE
NOT EXISTS(SELECT * FROM house H WHERE H.address_id = adr.id) AND
NOT EXISTS(SELECT * FROM office O WHERE O.address_id = adr.id);
-- query 2
DELETE FROM address adr
WHERE
NOT adr.id IN (select address_id from house) AND
NOT adr.id IN (select address_id from office);
query 1: deletes 3000 records (correct)
query 2: deletes 0 records
What is the obvious problem with query #2? When I check the records deleted by #1 they really do not exist in either house
or office
.
Best Answer
Your
address_id
columns contain nulls, and thus you got caught by the three-valued logic of SQL.https://www.postgresql.org/docs/current/static/functions-subquery.html#functions-subquery-in describes the behaviour of
IN
:And of course
NOT
applied to null is null, not true, see https://www.postgresql.org/docs/current/static/functions-logical.html