Oracle – large table with large IN

optimizationoracleselect

Table students has about 10 million records; ID is indexed.
Student grades has 20 million records – student_id is indexed.
I am querying about 20,000 students by their ids ids with a join between students and parents:

select * from students s left join grades g on s.id=g.student_id 
where (s.id IN (s1, s2... s1000)
or (s.id IN (s1001, s1002... s2000)
or (s.id IN (s.2001, s2002...s3000) 
//until s20000)

I need to split the INs into multiple batches as IN can only get 1000 or less values.

The query takes about 5 minutes to return. Is there any way I can optimize it?

Thanks!

Best Answer

Data Comes from External Source

The IN clause is limited to 1000 values. Don't use it for such searches.

Workaround:

  • Use a GTT (Global Temporary Table)
  • insert "values of interest" into that GTT
  • Modify the SELECT statement to JOIN against the GTT.

Example

Create the GTT

create global temporary table students_gtt (
  student_id int primary key
);

Insert values into the GTT

insert into students_gtt
select level
from dual
connect by level <= 50000;

and JOIN it in the SELECT statement

select
    *
from
    students       s
    left join grades         g on s.id = g.student_id
    join students_gtt   g on s.student_id = g.student_id

Notes

  1. First, I suspect a good portion of your time is spent Hard Parsing your query.
    • I'm taking a 22,974,957 row table and left outer joining it to a 93,501,293 row table in addition to joining the 50,000 row GTT. The values return in seconds.
  2. Eventually, you are going to a run into a "node" limit for the parser.
  3. The GTT method does not require you to "hack your SQL statement".

Data from another SQL Statement?

Just include that SQL within the SELECT statement.