Efficient way to perform approximated search

oracleoracle-11g-r2performance

I have to perform a join between table_a and table_b, using three fields as key being one of them the date of the event, say, both tables have id1, id2 and evnt_time for eache record.

As it happens though, the evnt_time can be displaced in a few seconds between these tables. Thus the join has to behave sort of table_a.id1 = table_b.id1 and table_a.id2 = table_b.id2 and table_a.evnt_time ~ table_b.evnt_time, where:

  • ~ means approximately, given + or - seconds
  • There must be a way to give ~ parameters of how precise should be. e.g.: table_a.evnt_time ~(2) table_b.evnt_time will join table_a.evnt_time with table_b.evnt_time - 2s, table_b.evnt_time - 1s, table_b.evnt_time, table_b.evnt_time + 1s and table_b.evnt_time + 2s.

To tackle the situation, there are a few possibilities I am experimenting, but my doubt is on which architecture should I use to perform a very efficient "approximated join" – these tables are partitioned, sub-partitioned and each sub part may contain billions of records… (although I have a reasonable amount of resources).

For once, I thought of storing the unique sencond of the event (i.e. second it happened since julian calendar) on both sides so the real join (give a "~(2)") could simply look like:

select *
from
    table_a,
    table_b
where
    table_a.id1 = table_b.id1
    and table_a.id2 = table_b.id2
    and (table_a.evnt_sec = table_b.evnt_sec 
        or table_a.evnt_sec = table_b.evnt_sec + 1
        or table_a.evnt_sec = table_b.evnt_sec + 2
        or table_a.evnt_sec = table_b.evnt_sec - 1
        or table_a.evnt_sec = table_b.evnt_sec - 2)

But I am unsure of how efficiently would that perform with the indexes and scans..

This is just an example, I am not stuck to any sort of architecture.
Also, I am using Oracle 11gR2.

Best Answer

I think that what you need to use is a Nonequi join

select *
from
    table_a,
    table_b
where
    table_a.id1 = table_b.id1
    and table_a.id2 = table_b.id2
    and table_a.evnt_sec BETWEEN table_b.evnt_sec -2 and table_b.evnt_sec +2