The following query aligns each test row with the closest truth row at or earlier than the test row's epoch, for every possible truth_id. There are very few truth_ids and very few test ids, but each id has thousands of entries with consecutive epoch times. Truth ids are not the same as test ids.
So for test id X, each row will be paired with the closest truth row of id A, B, C…
For 4 unique test ids and 3 unique truth ids, with 1000 rows per test id and 2000 rows per truth id, the query should return (4*1000)*3 = 12000 rows. Note that the number of rows per truth id does not affect the number of rows returned.
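To make the expected cardinality concrete, here is a downsized sketch (a toy schema with 2 test ids × 3 rows and 2 truth ids × 4 rows, not the real tables) run through SQLite: the result has (rows per test id × test ids) × truth ids rows, regardless of how many rows the truth table holds.

```python
import sqlite3

# Toy data, not the real schema: 2 test ids x 3 rows, 2 truth ids x 4 rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test_data  (id TEXT, epoch INTEGER);
    CREATE TABLE truth_data (id TEXT, epoch INTEGER);
    INSERT INTO test_data  VALUES ('X',10),('X',20),('X',30),
                                  ('Y',10),('Y',20),('Y',30);
    INSERT INTO truth_data VALUES ('A',5),('A',15),('A',25),('A',35),
                                  ('B',5),('B',15),('B',25),('B',35);
""")

# Pair every test row with every distinct truth id, then look up the
# closest truth epoch at or before the test epoch for that truth id.
rows = con.execute("""
    SELECT t.id AS test_id, t.epoch AS test_epoch, tids.truth_id,
           (SELECT max(epoch) FROM truth_data
             WHERE id = tids.truth_id AND epoch <= t.epoch) AS truth_epoch
    FROM test_data t
    CROSS JOIN (SELECT DISTINCT id AS truth_id FROM truth_data) AS tids
""").fetchall()

print(len(rows))                    # 6 test rows * 2 truth ids = 12
print(('X', 20, 'A', 15) in rows)   # closest A epoch <= 20 is 15
```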
This version quickly completes in 0.27 seconds:
SELECT *
FROM
(
SELECT
epoch as test_epoch,
id as test_id,
truth_id,
sqrt(x^2 + y^2) as test_r
FROM test_data
CROSS JOIN ( SELECT DISTINCT id AS truth_id FROM truth_data) as tids
) AS test
LEFT JOIN
(
SELECT
epoch as truth_epoch,
id as truth_id,
sqrt(x^2 + y^2) as truth_r
FROM truth_data
) AS truth_low
ON (
test.truth_id = truth_low.truth_id
AND truth_low.truth_epoch = (select max(epoch) from truth_data where epoch <= test_epoch AND id = test.truth_id )
)
I already enforce truth_id equality in the first condition of the ON clause. But if I use truth_low.truth_id instead of test.truth_id in the correlated subquery, i.e.:
AND truth_low.truth_epoch = (select max(epoch) from truth_data where epoch <= test_epoch AND id = truth_low.truth_id )
the query is terribly slow and takes 27.53 seconds.
I don't understand how such a minor change could cause such an inefficient shift in the query plan. I understand that the query plans are indeed different and that one is faster, but the queries are so similar that it seems unreasonable for this to occur. Can anyone explain?
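For what it's worth, the two ON variants are logically equivalent, because the truth_id equality in the first condition makes the two correlations interchangeable. A downsized SQLite check (toy data, with the sqrt/r columns dropped for brevity) confirms they return identical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE test_data  (id TEXT, epoch INTEGER);
    CREATE TABLE truth_data (id TEXT, epoch INTEGER);
    INSERT INTO test_data  VALUES ('X',10),('X',20),('Y',25);
    INSERT INTO truth_data VALUES ('A',5),('A',15),('B',5),('B',30);
""")

# Same shape as the question's query; the subquery is correlated to
# {corr}.truth_id, which is the only thing that differs between variants.
template = """
    SELECT test.test_epoch, test.test_id, test.truth_id, truth_low.truth_epoch
    FROM (SELECT epoch AS test_epoch, id AS test_id, truth_id
          FROM test_data
          CROSS JOIN (SELECT DISTINCT id AS truth_id FROM truth_data)) AS test
    LEFT JOIN (SELECT epoch AS truth_epoch, id AS truth_id
               FROM truth_data) AS truth_low
      ON test.truth_id = truth_low.truth_id
     AND truth_low.truth_epoch = (SELECT max(epoch) FROM truth_data
                                  WHERE epoch <= test_epoch
                                    AND id = {corr}.truth_id)
"""
fast = sorted(con.execute(template.format(corr="test")).fetchall())
slow = sorted(con.execute(template.format(corr="truth_low")).fetchall())
print(fast == slow)   # True: identical results, only the plans differ
```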
The fast query has the query plan:
Hash Left Join (cost=165.00..212716.05 rows=2800 width=44) (actual time=0.723..207.941 rows=800 loops=1)
Hash Cond: ((truth_data_1.id = truth_data.id) AND ((SubPlan 1) = truth_data.epoch))
Buffers: shared hit=72121
-> Nested Loop (cost=70.00..150.04 rows=2800 width=32) (actual time=0.304..1.589 rows=800 loops=1)
Buffers: shared hit=76
-> Seq Scan on track_data (cost=0.00..45.00 rows=1400 width=28) (actual time=0.035..0.099 rows=400 loops=1)
Buffers: shared hit=31
-> Materialize (cost=70.00..70.05 rows=2 width=4) (actual time=0.001..0.001 rows=2 loops=400)
Buffers: shared hit=45
-> HashAggregate (cost=70.00..70.02 rows=2 width=4) (actual time=0.262..0.263 rows=2 loops=1)
Group Key: truth_data_1.id
Buffers: shared hit=45
-> Seq Scan on truth_data truth_data_1 (cost=0.00..65.00 rows=2000 width=4) (actual time=0.009..0.099 rows=600 loops=1)
Buffers: shared hit=45
-> Hash (cost=65.00..65.00 rows=2000 width=28) (actual time=0.237..0.237 rows=600 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 41kB
Buffers: shared hit=45
-> Seq Scan on truth_data (cost=0.00..65.00 rows=2000 width=28) (actual time=0.012..0.112 rows=600 loops=1)
Buffers: shared hit=45
SubPlan 1
-> Aggregate (cost=75.83..75.84 rows=1 width=8) (actual time=0.125..0.125 rows=1 loops=1600)
Buffers: shared hit=72000
-> Seq Scan on truth_data truth_data_2 (cost=0.00..75.00 rows=333 width=8) (actual time=0.021..0.096 rows=150 loops=1600)
Filter: ((epoch <= track_data.epoch) AND (id = truth_data_1.id))
Rows Removed by Filter: 450
Buffers: shared hit=72000
Planning time: 0.293 ms
Execution time: 208.129 ms
While the slow query has the plan:
Hash Left Join (cost=160.00..212397803.04 rows=2800 width=44) (actual time=31.818..29884.990 rows=800 loops=1)
Hash Cond: (truth_data_1.id = truth_data.id)
Join Filter: (truth_data.epoch = (SubPlan 1))
Rows Removed by Join Filter: 239200
Buffers: shared hit=10800121
-> Nested Loop (cost=70.00..150.04 rows=2800 width=32) (actual time=0.490..2.688 rows=800 loops=1)
Buffers: shared hit=76
-> Seq Scan on track_data (cost=0.00..45.00 rows=1400 width=28) (actual time=0.030..0.116 rows=400 loops=1)
Buffers: shared hit=31
-> Materialize (cost=70.00..70.05 rows=2 width=4) (actual time=0.001..0.003 rows=2 loops=400)
Buffers: shared hit=45
-> HashAggregate (cost=70.00..70.02 rows=2 width=4) (actual time=0.447..0.449 rows=2 loops=1)
Group Key: truth_data_1.id
Buffers: shared hit=45
-> Seq Scan on truth_data truth_data_1 (cost=0.00..65.00 rows=2000 width=4) (actual time=0.026..0.207 rows=600 loops=1)
Buffers: shared hit=45
-> Hash (cost=65.00..65.00 rows=2000 width=28) (actual time=0.235..0.235 rows=600 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 41kB
Buffers: shared hit=45
-> Seq Scan on truth_data (cost=0.00..65.00 rows=2000 width=28) (actual time=0.017..0.117 rows=600 loops=1)
Buffers: shared hit=45
SubPlan 1
-> Aggregate (cost=75.83..75.84 rows=1 width=8) (actual time=0.122..0.122 rows=1 loops=240000)
Buffers: shared hit=10800000
-> Seq Scan on truth_data truth_data_2 (cost=0.00..75.00 rows=333 width=8) (actual time=0.020..0.093 rows=150 loops=240000)
Filter: ((epoch <= track_data.epoch) AND (id = truth_data.id))
Rows Removed by Filter: 450
Buffers: shared hit=10800000
Planning time: 0.582 ms
Execution time: 29885.446 ms
EDIT 1:
For Erwin:
Thank you for your response. Would you please elaborate? I don't understand what you mean by the second variant's correlated subquery depending on two disjunct tables while the original does not. It seems that both versions of the correlated subquery reference a source id and source epoch from either the test or truth table, and then query the truth table.
I added more rows to my tables to get a better idea of runtime and of your suggested solution. I also added indexes on truth_data(epoch) and truth_data(id, epoch DESC NULLS LAST).
Variant 1 yields the following explain:
Hash Left Join (cost=248.00..2937.81 rows=4000 width=44) (actual time=4.621..135.524 rows=4000 loops=1)
Hash Cond: ((truth_data_1.id = truth_data.id) AND ((SubPlan 2) = truth_data.epoch))
Buffers: shared hit=28126
-> Nested Loop (cost=99.00..197.05 rows=4000 width=32) (actual time=1.409..7.245 rows=4000 loops=1)
Buffers: shared hit=77
-> Seq Scan on test_data (cost=0.00..48.00 rows=2000 width=28) (actual time=0.039..0.297 rows=2000 loops=1)
Buffers: shared hit=28
-> Materialize (cost=99.00..99.05 rows=2 width=4) (actual time=0.001..0.001 rows=2 loops=2000)
Buffers: shared hit=49
-> HashAggregate (cost=99.00..99.02 rows=2 width=4) (actual time=1.355..1.355 rows=2 loops=1)
Group Key: truth_data_1.id
Buffers: shared hit=49
-> Seq Scan on truth_data truth_data_1 (cost=0.00..89.00 rows=4000 width=4) (actual time=0.025..0.440 rows=4000 loops=1)
Buffers: shared hit=49
-> Hash (cost=89.00..89.00 rows=4000 width=28) (actual time=3.168..3.168 rows=4000 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 235kB
Buffers: shared hit=49
-> Seq Scan on truth_data (cost=0.00..89.00 rows=4000 width=28) (actual time=0.020..0.952 rows=4000 loops=1)
Buffers: shared hit=49
SubPlan 2
-> Result (cost=0.60..0.61 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=8000)
Buffers: shared hit=28000
InitPlan 1 (returns $2)
-> Limit (cost=0.28..0.60 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=8000)
Buffers: shared hit=28000
-> Index Scan Backward using truth_data_epoch_idx on truth_data truth_data_2 (cost=0.28..212.35 rows=667 width=8) (actual time=0.007..0.007 rows=1 loops=8000)
Index Cond: ((epoch IS NOT NULL) AND (epoch <= test_data.epoch))
Filter: (id = truth_data_1.id)
Rows Removed by Filter: 1
Buffers: shared hit=28000
Planning time: 0.590 ms
Execution time: 136.088 ms
Your suggested solution yields:
Nested Loop Left Join (cost=178.79..371076.05 rows=4000 width=40) (actual time=0.077..4886.702 rows=4000 loops=1)
Buffers: shared hit=47454
-> Seq Scan on test_data test (cost=0.00..48.00 rows=2000 width=28) (actual time=0.039..0.691 rows=2000 loops=1)
Buffers: shared hit=28
-> Unique (cost=178.79..185.45 rows=2 width=20) (actual time=2.169..2.438 rows=2 loops=2000)
Buffers: shared hit=47426
-> Sort (cost=178.79..182.12 rows=1333 width=20) (actual time=2.097..2.234 rows=2000 loops=2000)
Sort Key: tru1.id, tru1.epoch DESC NULLS LAST
Sort Method: quicksort Memory: 346kB
Buffers: shared hit=47426
-> Bitmap Heap Scan on truth_data tru1 (cost=30.61..109.60 rows=1333 width=20) (actual time=0.142..1.161 rows=2000 loops=2000)
Recheck Cond: (epoch <= test.epoch)
Heap Blocks: exact=33486
Buffers: shared hit=47426
-> Bitmap Index Scan on truth_data_epoch_idx (cost=0.00..30.28 rows=1333 width=0) (actual time=0.131..0.131 rows=2000 loops=2000)
Index Cond: (epoch <= test.epoch)
Buffers: shared hit=13940
Planning time: 0.310 ms
Execution time: 4887.492 ms
I noticed that it does not make use of the truth_data(id, epoch) index and instead uses just the truth_data(epoch) index. I'm guessing that's because there are so few ids that it doesn't need to index by id?
I think the big reason for this performance difference is that variant 1 yields a hash left join while the suggested solution yields a nested-loop join.
EDIT 2:
I was able to answer at least my question about the disjunct tables. Comparing just the subplans from the correlated subqueries of my variations 1 and 2:
Variation 1:
SubPlan 2
-> Result (cost=0.60..0.61 rows=1 width=8) (actual time=0.013..0.013 rows=1 loops=8000)
Buffers: shared hit=28000
InitPlan 1 (returns $2)
-> Limit (cost=0.28..0.60 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=8000)
Buffers: shared hit=28000
-> Index Scan Backward using truth_data_epoch_idx on truth_data truth_data_2 (cost=0.28..212.35 rows=667 width=8) (actual time=0.008..0.008 rows=1 loops=8000)
Index Cond: ((epoch IS NOT NULL) AND (epoch <= test_data.epoch))
Filter: (id = truth_data_1.id)
Rows Removed by Filter: 1
Buffers: shared hit=28000
Variation 2:
SubPlan 2
-> Result (cost=0.60..0.61 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=8000000)
Buffers: shared hit=28000000
InitPlan 1 (returns $2)
-> Limit (cost=0.28..0.60 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=8000000)
Buffers: shared hit=28000000
-> Index Scan Backward using truth_data_epoch_idx on truth_data truth_data_2 (cost=0.28..212.35 rows=667 width=8) (actual time=0.007..0.007 rows=1 loops=8000000)
Index Cond: ((epoch IS NOT NULL) AND (epoch <= test_data.epoch))
Filter: (id = truth_data.id)
Rows Removed by Filter: 1
Buffers: shared hit=28000000
It becomes evident that even though the ON condition references both tables in either case, the correlated subquery only references the test data in variation 1, while it references both the test and truth tables in variation 2. Thus variation 2's correlated subquery must be evaluated roughly (rows in truth_data per id) times more often than variation 1's.
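The loop counts quantify this. In variation 1 the subquery is correlated only to the outer (test-side) row, so it runs on the order of once per outer row of the cross join; in variation 2 it also references truth_low, so it becomes a join filter re-evaluated for every (outer row, candidate truth row) pair. With the EDIT 1 sizes (2,000 test rows, 2 distinct truth ids, and an assumed ~2,000 truth rows per id, read off the plans above), the arithmetic reproduces the reported loops=8000000:

```python
test_rows = 2000
distinct_truth_ids = 2
truth_rows_per_id = 2000   # assumed: 4000 truth rows split over 2 ids

# Cross join of test rows with distinct truth ids.
outer_rows = test_rows * distinct_truth_ids        # 4000

# Variation 2: the join filter re-runs the subquery for every pairing of
# an outer row with a candidate truth row of the matching id.
pairwise_evals = outer_rows * truth_rows_per_id

print(outer_rows)       # 4000: variation 1's loops=8000 is of this order
print(pairwise_evals)   # 8000000: matches "loops=8000000" in variation 2
```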
Best Answer
Why the difference in performance? The correlated subquery in the join condition of the 1st variant is already potentially expensive. The second variant makes it worse by depending on two disjunct tables, forcing a Cartesian product between the two (for nothing, as it turns out later). The query planner is no AI, nor does it try to be: exploring such rewrites could quickly add more overhead cost to planning than what the actual query costs. Postgres is not smart enough to make the logical deductions you mention. Many other possible factors, too, depending on undisclosed information.
Be that as it may, an equivalent, less complex query should be (much) faster.

One minor difference: truth_id is not output a second time. It's noise, and many clients can't deal with duplicate column names in the result.

If you don't need truth_id rows without data (as I suspect), you can simplify further.

Be sure to have a btree index on truth_data (id, epoch DESC NULLS LAST) for best performance. NULLS LAST may be unnecessary, depending on undisclosed information.

More performance optimization may be possible, depending on undisclosed table definitions, cardinalities, and value distribution / frequencies.