The query causing me the issue is too big, but the piece that seems to be the core is quite simple, I will try to go with that:
The query has an structure like:
SELECT
... a lot of stuff ...
WHERE
... lots of complex clauses ...
AND my_id in ( select my_id from small_table where .. something simple ..)
And that has a cardinality of a few billions, bytes read around 12gb and explodes my temp.
However, if I execute the select my_id from small_table
, which yields (always) 7 records, take these records and change the query to:
SELECT
... a lot of stuff ...
WHERE
... lots of complex clauses ...
AND my_id in ( 1, 2, 3, 4, 5, 6, 7 )
(I mean, values hardcoded)
Cost, cardinality, and bytes read drop dramatically and the query executes in a few minutes.
Now, I have tried to isolate the "small query" in a with
clause, tried to use an join instead a sub query and nothing… the result is always the same.
Why is it this way and how could I possibly prevent it from happening?
Maybe worth mentioning that in both cases (fast and slow) the costly part of the query is a FTS on one of the big tables used in the join.
Also, I am using Oracle 11gR2
[EDIT] These are the explain plans of the two example executions
The bad one. Notice I didn't use in ( )
, but rather a simple join adding small_table
to the from
clause.
Plan
SELECT STATEMENT ALL_ROWSCost: 5,736,441
22 HASH JOIN RIGHT SEMI Cost: 5,736,441 Bytes: 52,324,480 Cardinality: 158,080
11 VIEW VIEW VW_NSO_1 Cost: 20 Bytes: 13 Cardinality: 1
10 NESTED LOOPS
8 NESTED LOOPS Cost: 18 Bytes: 91 Cardinality: 1
6 NESTED LOOPS Cost: 4 Bytes: 70 Cardinality: 1
4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1 Bytes: 35 Cardinality: 1
3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0 Cardinality: 1
2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2 Bytes: 27 Cardinality: 1
1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1 Cardinality: 6
5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3 Bytes: 35 Cardinality: 1
7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1 Cardinality: 53
9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14 Bytes: 420 Cardinality: 20
21 HASH JOIN Cost: 5,736,193 Bytes: 15,281,925,342 Cardinality: 48,056,369
19 NESTED LOOPS
17 NESTED LOOPS Cost: 951,151 Bytes: 500,185,440 Cardinality: 1,736,755
15 NESTED LOOPS Cost: 3 Bytes: 792 Cardinality: 22
13 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2 Bytes: 27 Cardinality: 1
12 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1 Cardinality: 6
14 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1 Bytes: 477 Cardinality: 53
16 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849 Cardinality: 1,322,763
18 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,818 Bytes: 19,620,720 Cardinality: 77,860
20 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,958,129 Bytes: 12,063,594,150 Cardinality: 402,119,805
The good one. All I did here was taking the result of select my_id from small_table where .. something simple ..
(seven records), and added an my_id in ( 1, 2, 3, 4, 5, 6, 7)
to the end of the big query. Same as described:
Plan
SELECT STATEMENT ALL_ROWSCost: 4,558,125
18 HASH JOIN Cost: 4,558,125 Bytes: 36,100,625 Cardinality: 122,375
16 NESTED LOOPS
14 NESTED LOOPS Cost: 413,809 Bytes: 5,271,671 Cardinality: 122,597
12 VIEW VIEW VW_NSO_1 Cost: 20 Bytes: 13 Cardinality: 1
11 HASH UNIQUE Bytes: 91 Cardinality: 1
10 NESTED LOOPS
8 NESTED LOOPS Cost: 18 Bytes: 91 Cardinality: 1
6 NESTED LOOPS Cost: 4 Bytes: 70 Cardinality: 1
4 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 1 Bytes: 35 Cardinality: 1
3 INDEX UNIQUE SCAN INDEX (UNIQUE) MYUSER.PK_SMALL_TABLE Cost: 0 Cardinality: 1
2 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.SMALL_TABLE Cost: 2 Bytes: 27 Cardinality: 1
1 INDEX RANGE SCAN INDEX MYUSER.IDX_SMALL_TABLE_1ATUAL Cost: 1 Cardinality: 5
5 TABLE ACCESS FULL TABLE MYUSER.SMALL_TABLE Cost: 3 Bytes: 35 Cardinality: 1
7 INDEX RANGE SCAN INDEX (UNIQUE) MYUSER.PK_MEDIUM_TABLE Cost: 1 Cardinality: 53
9 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.MEDIUM_TABLE Cost: 14 Bytes: 420 Cardinality: 20
13 INDEX RANGE SCAN INDEX MYUSER.IDX_HUGE_TABLE_1 Cost: 18,849 Cardinality: 1,322,763
15 TABLE ACCESS BY INDEX ROWID TABLE MYUSER.HUGE_TABLE Cost: 413,788 Bytes: 3,677,910 Cardinality: 122,597
17 TABLE ACCESS FULL TABLE MYUSER.HUGE_TABLE Cost: 3,962,804 Bytes: 3,655,562,652 Cardinality: 14,506,201
hope it helps.
Best Answer
Most likely your tables have up-to-date statistics but sometimes the optimizer is baffled because it oversimplifies the cardinality estimation.
This seems to be a good candidate for dynamic sampling. In its default value (2 in 10g and 11g), dynamic sampling will only be used if one of the table has no statistics. In your case you would need to change its value to be able to let the optimizer collect statistics to build a better plan.
I suggest you use the
DYNAMIC_SAMPLING
hint that will let you modify the optimizer behaviour for a single query. I tested with a subquery and you need to use one of the following syntax:the full hint directly on the top of the query, this will sample all tables, which will definitely work but may take too much time.
the full hint on the subquery:
the hint with a query block name:
The second and third option should produce the same result: sampling only on one table.