My query is:
SELECT category, count(*)
FROM large_data_table t
WHERE
(t.val IN (SELECT val FROM SMALL_TABLE WHERE id = 1))
OR
(t.val IN (SELECT val FROM SMALL_TABLE WHERE id = 2))
GROUP BY GROUPING SETS ((category), ());
The resulting execution plan is like this:
SELECT STATEMENT
SORT (GROUP BY ROLLUP)
FILTER
large_data_table TABLE ACCESS (FULL)
SMALL_TABLE TABLE ACCESS (BY INDEX ROWID)
SMALL_TABLE INDEX (RANGE SCAN)
SMALL_TABLE TABLE ACCESS (BY INDEX ROWID)
SMALL_TABLE INDEX (RANGE SCAN)
Where FILTER is, essentially, NESTED LOOP JOIN which never finishes as NESTED LOOPS often do on large tables. (the two subqueries on the small table return, say, 100 rows each).
I am looking for a way to tell Oracle to NOT use the stupid FILTER here. I've tried sticking USE_HASH hint everywhere including some places I'd rather not talk about now….
GROUPING SETS seem to play a role in this, but getting rid of them is very difficult here. The one reasonable thing that works is to replace OR with a UNION, but it doesn't cover all cases either.
Best Answer
On 11R2 I usually start with gathering more statistics for the tables:
If that helps then table preferences for automatic jobs can be set:
If large_data_table is really big (tens GB and more) then 1% or something like that may be needed.
And do not believe that in dba_tables sample_size=num_rows. For big tables actual auto sample size is much much lower. I had the SR with Oracle about that. They found actual sample percentage only from session trace file. It was ~0.004% for 170GB table.