I have to perform a join between table_a
and table_b
, using three fields as key being one of them the date of the event, say, both tables have id1
, id2
and evnt_time
for eache record.
As it happens though, the evnt_time
can be displaced in a few seconds between these tables. Thus the join has to behave sort of table_a.id1 = table_b.id1 and table_a.id2 = table_b.id2 and table_a.evnt_time ~ table_b.evnt_time
, where:
~
means approximately, given+
or-
seconds- There must be a way to give
~
parameters of how precise should be. e.g.:table_a.evnt_time ~(2) table_b.evnt_time
will jointable_a.evnt_time
withtable_b.evnt_time - 2s
,table_b.evnt_time - 1s
,table_b.evnt_time
,table_b.evnt_time + 1s
andtable_b.evnt_time + 2s
.
To tackle the situation, there are a few possibilities I am experimenting, but my doubt is on which architecture should I use to perform a very efficient "approximated join" – these tables are partitioned, sub-partitioned and each sub part may contain billions of records… (although I have a reasonable amount of resources).
For once, I thought of storing the unique sencond of the event (i.e. second it happened since julian calendar) on both sides so the real join (give a "~(2)
") could simply look like:
select *
from
table_a,
table_b
where
table_a.id1 = table_b.id1
and table_a.id2 = table_b.id2
and (table_a.evnt_sec = table_b.evnt_sec
or table_a.evnt_sec = table_b.evnt_sec + 1
or table_a.evnt_sec = table_b.evnt_sec + 2
or table_a.evnt_sec = table_b.evnt_sec - 1
or table_a.evnt_sec = table_b.evnt_sec - 2)
But I am unsure of how efficiently would that perform with the indexes and scans..
This is just an example, I am not stuck to any sort of architecture.
Also, I am using Oracle 11gR2.
Best Answer
I think that what you need to use is a Nonequi join