Postgresql – Best way of finding rows referencing a given id on PostgreSQL

join;performancepostgresqlpostgresql-performanceunion

I'm trying to find if there would be anything violating the foreign key constraint when deleting a row, to check if I can delete it or not (can't use ON DELETE CASCADE)

For that, I have a piece of code that will give me a list of all (table, column) referencing that row's id. Then all I want to know is if any of those (table, column) have my row's id there, which would prevent its deletion.

I found 3 ways to do that, and all producing similar results on EXPLAIN ANALYZE on a very large database. I'll paste here some real examples:

The first one is to make a UNION of each (table, column) and check if the id is there (if it's null at the end, there's no references):

  (SELECT client_category_price.sellable_id
   FROM client_category_price
   WHERE client_category_price.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT commission_source.sellable_id
   FROM commission_source
   WHERE commission_source.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT loan_item.sellable_id
   FROM loan_item
   WHERE loan_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT product.sellable_id
   FROM product
   WHERE product.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT product_history.sellable_id
   FROM product_history
   WHERE product_history.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT purchase_item.sellable_id
   FROM purchase_item
   WHERE purchase_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT receiving_order_item.sellable_id
   FROM receiving_order_item
   WHERE receiving_order_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT returned_sale_item.sellable_id
   FROM returned_sale_item
   WHERE returned_sale_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT sale_item.sellable_id
   FROM sale_item
   WHERE sale_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT service.sellable_id
   FROM service
   WHERE service.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT stock_decrease_item.sellable_id
   FROM stock_decrease_item
   WHERE stock_decrease_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT transfer_order_item.sellable_id
   FROM transfer_order_item
   WHERE transfer_order_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT work_order.sellable_id
   FROM work_order
   WHERE work_order.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)
UNION
  (SELECT work_order_item.sellable_id
   FROM work_order_item
   WHERE work_order_item.sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1)

The second one is to make a cross join and return a "big tuple" containing those same ids, and then test if all of them are null:

SELECT client_category_price.sellable_id, commission_source.sellable_id, loan_item.sellable_id, product.sellable_id, product_history.sellable_id, purchase_item.sellable_id,
       receiving_order_item.sellable_id, returned_sale_item.sellable_id, sale_item.sellable_id, service.sellable_id, stock_decrease_item.sellable_id, transfer_order_item.sellable_id,
       work_order.sellable_id, work_order_item.sellable_id
FROM sellable
LEFT JOIN client_category_price ON client_category_price.sellable_id = sellable.id
LEFT JOIN commission_source ON commission_source.sellable_id = sellable.id
LEFT JOIN loan_item ON loan_item.sellable_id = sellable.id
LEFT JOIN product ON product.sellable_id = sellable.id
LEFT JOIN product_history ON product_history.sellable_id = sellable.id
LEFT JOIN purchase_item ON purchase_item.sellable_id = sellable.id
LEFT JOIN receiving_order_item ON receiving_order_item.sellable_id = sellable.id
LEFT JOIN returned_sale_item ON returned_sale_item.sellable_id = sellable.id
LEFT JOIN sale_item ON sale_item.sellable_id = sellable.id
LEFT JOIN service ON service.sellable_id = sellable.id
LEFT JOIN stock_decrease_item ON stock_decrease_item.sellable_id = sellable.id
LEFT JOIN transfer_order_item ON transfer_order_item.sellable_id = sellable.id
LEFT JOIN work_order ON work_order.sellable_id = sellable.id
LEFT JOIN work_order_item ON work_order_item.sellable_id = sellable.id
WHERE sellable.id = '9bc202ca-f7c1-11e2-a751-062b1fc90460' LIMIT 1

Note that the database is so big that if I don't put the LIMIT 1 at the end, the query would take a very long time to execute, but with the limit, it take as long as the UNION above.

And there's the third option: I'm actually executing this from python. So I could execute a COUNT on each of those UNION queries and if one of them returns something > 0, I already know there exists a reference for the row. On the best case I would execute one COUNT and at the worst case all of those COUNTS.

Since all those options produce very insignificant differences (both take +/- 0.300ms with about 1 million rows divided between those related tables), which one would be better to use? I mean, which would scale better?

Best Answer

I suggest your first option, with two improvements and some simplifications.

(
SELECT 1      -- irrelevant what you select here
FROM   client_category_price
WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
LIMIT  1      -- may be redundant
)
UNION ALL     -- not just UNION

  ...

UNION ALL
(
SELECT 1
FROM   work_order_item
WHERE  sellable_id = '9bc202ca-f7c1-11e2-a751-062b1fc90460'
LIMIT  1
)
LIMIT  1;      -- this one is crucial 

Given that all you want to know is

if any of those (table, column) have my row's id there, which would prevent its deletion.

  • You don't need a full list of violating rows. Stop searching at the first one. All you need to do is add another LIMIT 1 at the end of the query. This way, Postgres skips rest of the query as soon as the first row is found. You probably don't need LIMIT 1 for each SELECT, just the one at the end. Test without, it may produce different query plans.

  • Use UNION ALL instead of UNION. Faster.

  • Some other simplifications.

Related answer on SO: