Personally, I'd consider using triggers to load a proper single table and query that.
Then you have control of partitioning (if needed), defining good indexes etc
Depending on volumes, you could use service broker to decouple the app and your tables.
Basically, anything to avoid massive UNIONs and dynamic SQL...
SELECT *
FROM task t1
WHERE EXISTS (
SELECT 1
FROM task t2
WHERE t2.reference = t1.reference -- same reference
AND t2.task_id <> t1.task_id -- different task_id (exclude self-join)
AND t2.end_date BETWEEN t1.end_date - 30
AND t1.end_date + 30
)
An EXISTS
semi-join should be the fastest way to arrive at DISTINCT
rows, because you don't get duplicates to begin with. Also, in the event of multiple matching rows (many matching tasks withing 30 days) evaluation can stop after the first match is found. I would expect this query to beat anything you have so far. Test with EXPLAIN ANALYZE
.
In a WHERE
clause or JOIN
condition don't use an expression like:
abs(t1.end_date - t2.end_date) < 30
If you can avoid it. As the left hand expression is derived from values of two tables, the only way for the query planner is to form a limited CROSS JOIN
(after applying other conditions) and compute a value for every possible combination. This is a well known anti-pattern for good performance!
As long as there are only few rows per reference
, it won't matter much. But the cost grows with O(N²) with more rows per reference
. I rewrote the condition to:
t2.end_date BETWEEN t1.end_date - 30 AND t1.end_date + 30
In my tests, the first form went from 5x slower to 500x slower quickly when I narrowed down the id-space for reference
(-> more matching rows).
The second form can also more easily use indexes. If you want to select a small sample of the table, with conditions like:
reference > 450
date_end > now()::date
And an index like:
CREATE INDEX test_idx ON task (reference, date_end);
CREATE INDEX test_idx2 ON task (date_end);
The performance gap becomes even more overwhelming.
Best Answer
UNION ALL
I would go with a simple
UNION ALL
query here:This is a single query to Postgres.
Parentheses are required in this case.
NOT EXISTS
Alternative, most likely slower:
-> SQLfiddle.