Postgresql – Query takes forever with a NOT EXISTS condition

performancepostgresqlpostgresql-performance

I have two queries, one which has a condition that uc_id and ce_id are in tableUCP , and another which only requires rows which doesn't exist in the tableUCP table.

tableA : ~600k rows
tableB : ~600k rows (exactly equal to TableA)
tableCEM: ~380k rows
tableCE: ~500k rows
tableUCP: ~200k rows
tableND: ~600k rows (exactly equal to TableA)

1st query: (Runs in ~ 20 seconds) (INSERTs 0 rows)

select count(*)
FROM tableA
    JOIN tableB
        ON tableA.user_id = tableB.user_id 
        AND tableA.module = tableB.module 
    LEFT JOIN (select DISTINCT c_id,module_id from tableCEM where current = TRUE) cem
        ON tableB.module_id = cem.module_id
    LEFT JOIN (select * from tableCE where type = 'module' AND current = TRUE) ce
        ON tableA.module = ce.module 
    JOIN (select * from tableUC where current = TRUE) uc
        ON cem.c_id = uc.c_id
        AND tableB.user_id = uc.user_id 
    JOIN tableND
        ON uc.uc_id= tableND.uc_id
        AND ce.ce_id = tableND.ce_id
    JOIN (SELECT * FROM tableUCP WHERE current = TRUE) ucp
        ON uc.uc_id = ucp.uc_id
        AND ce.ce_id = ucp.ce_id;

2nd query: (Keep running for hours) (Should INSERT ~600k rows)

select count(*)
FROM tableA
    JOIN tableB
        ON tableA.user_id = tableB.user_id 
        AND tableA.module = tableB.module 
    LEFT JOIN (select DISTINCT c_id,module_id from tableCEM where current = TRUE) cem
        ON tableB.module_id = cem.module_id
    LEFT JOIN (select * from tableCE where type = 'module' AND current = TRUE) ce
        ON tableA.module = ce.module 
    JOIN (select * from tableUC where current = TRUE) uc
        ON cem.c_id = uc.c_id
        AND tableB.user_id = uc.user_id 
    JOIN tableND
        ON uc.uc_id = tableND.uc_id
        AND ce.ce_id = tableND.ce_id
    LEFT JOIN (SELECT * FROM tableUCP WHERE current = TRUE) ucp
        ON uc.uc_id = ucp.uc_id
        AND ce.ce_id = ucp.ce_id
    WHERE ucp.uc_id is NULL OR ucp.ce_id is NULL;

The only difference between the two is the last JOIN condition.
Why am I getting so different results for the two queries ?

Query Plans for both the queries

I'm sorry I wouldn't be able to share any sample data, but any help or suggestions would be tremendously helpful.

I have also tried the following:

  1. Using NOT EXISTS for both columns individually.
  2. Using NOT EXISTS for both columns together.
  3. Using IN (tableA EXCEPT tableB).

All three keep running without an end.

Best Answer

The OR condition in your WHERE clause is killing the performance. You only need to NULL-check one of the keys you are joining with, not both.

Also, no need to do a SELECT * FROM tableUCP when just SELECT uc_id, ce_id... would be enough.