It seems the CBO does not consider a skip scan at all with dynamic
sampling, is this true?
Actually this is really easy to verify, you can do this by enabling 10053 trace. You will see that the optimizer does not even consider skip scan at all. The reason for this, is the "_optimizer_skip_scan_guess" parameter. The default value for this parameter is FALSE, meaning the optimizer will not consider skip scan when all it has is "guessed" selectivity, which is the case with dynamic sampling.
If you set "_optimizer_skip_scan_guess" to TRUE, skip scan will be considered, this can be also confirmed with the 10053 trace again.
PS: your db_file_multiblock_read_count parameter seems to be lower than the default value. On my 11.2.0.4 sandbox, with the default value of 128, after collecting statistics on the table, index FFS had about one-third of the cost of index SS.
Edit: added output
SQL> create table foo( bar integer, baz integer, qux char(99), constraint pk_foo primary key (bar, baz) ) organization index compress;
Table created.
SQL> insert into foo(bar,baz) select mod(level,1000), level from dual connect by level<1000000;
999999 rows created.
SQL> commit;
Commit complete.
Original:
SQL> explain plan for
select * from foo where baz=1
union all
select /*+ full(foo) */ * from foo where baz=1
union all
select /*+ index_ss(foo) */ * from foo where baz=1; 2 3 4 5 6
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1715140356
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 66 | 8382 | 1302 (1)| 00:00:16 |
| 1 | UNION-ALL | | | | | |
|* 2 | INDEX FAST FULL SCAN| PK_FOO | 22 | 2794 | 546 (1)| 00:00:07 |
|* 3 | INDEX FAST FULL SCAN| PK_FOO | 22 | 2794 | 546 (1)| 00:00:07 |
|* 4 | INDEX SKIP SCAN | PK_FOO | 22 | 2794 | 211 (0)| 00:00:03 |
--------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("BAZ"=1)
3 - filter("BAZ"=1)
4 - access("BAZ"=1)
filter("BAZ"=1)
Note
-----
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
- dynamic sampling used for this statement (level=2)
Enable skip scan for guessed selectivity:
SQL> alter session set "_optimizer_skip_scan_guess"=true;
Session altered.
SQL> explain plan for
select * from foo where baz=1
union all
select /*+ full(foo) */ * from foo where baz=1
union all
select /*+ index_ss(foo) */ * from foo where baz=1; 2 3 4 5 6
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3033162421
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 66 | 8382 | 772 (1)| 00:00:10 |
| 1 | UNION-ALL | | | | | |
|* 2 | INDEX SKIP SCAN | PK_FOO | 22 | 2794 | 211 (0)| 00:00:03 |
|* 3 | INDEX FAST FULL SCAN| PK_FOO | 22 | 2794 | 350 (1)| 00:00:05 |
|* 4 | INDEX SKIP SCAN | PK_FOO | 22 | 2794 | 211 (0)| 00:00:03 |
--------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("BAZ"=1)
filter("BAZ"=1)
3 - filter("BAZ"=1)
4 - access("BAZ"=1)
filter("BAZ"=1)
Note
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
-----
- dynamic sampling used for this statement (level=2)
24 rows selected.
Just for completeness:
SQL> analyze table foo compute statistics;
Table analyzed.
SQL> explain plan for
select * from foo where baz=1
union all
select /*+ full(foo) */ * from foo where baz=1
union all
select /*+ index_ss(foo) */ * from foo where baz=1; 2 3 4 5 6
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1715140356
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 30 | 1717 (2)| 00:00:21 |
| 1 | UNION-ALL | | | | | |
|* 2 | INDEX FAST FULL SCAN| PK_FOO | 1 | 10 | 357 (3)| 00:00:05 |
|* 3 | INDEX FAST FULL SCAN| PK_FOO | 1 | 10 | 357 (3)| 00:00:05 |
|* 4 | INDEX SKIP SCAN | PK_FOO | 1 | 10 | 1002 (0)| 00:00:13 |
--------------------------------------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("BAZ"=1)
3 - filter("BAZ"=1)
4 - access("BAZ"=1)
filter("BAZ"=1)
19 rows selected.
I have a big Index Organized Table (IOT) with 74 columns
Why do you think that this table is a candidate for an IOT? IMHO, IOT's are meant to be narrow tables that don't change often. I have done testing with narrow tables and have found IOT's to do inserts at a slower rate than a traditional table with indexes. You may want to start by recreating this as a traditional table and then think about what indexes you want on the table. Also there is a note in this article on IOT's. Performance Tuning Guide
Best Answer
Output is from a 11.2.0.4.6 Enterprise Edition database on Oracle Linux 7.1 x86-64 platform.
Lets start with question 2 and an easy example.
DISTINCT and GROUP BY are handled differently: the optimizer is able to completely eliminate a DISTINCT under certain circumstances, but it can not do the same with GROUP BY. Here is an example:
Note that setting statistics_level to ALL significantly increases the execution time for the original queries in the question.
The 2 queries will be:
This is what we expect. Full table scan and HASH UNIQUE for DISTINCT, HASH GROUP BY for GROUP BY. Now add a NOT NULL constraint and an index.
The optimizer noticed the index on the relevant column, and because of the NOT NULL constraint it is able to use it avoid sorting the data for producing the unique values, because the data is already sorted in the index. Now add a UNIQUE constraint to this column:
Notice how Oracle completely skipped SORT/HASH UNIQUE when using DISTINCT, but nothing changed with the GROUP BY query.
Enable tracing the optimizer when running the DISTINCT version:
In the trace file we can see the following:
Note that this is a Query Transformation, but NOT a Cost-based Query Transformation. As you can see from the "Final query" part, the optimizer removed DISTINCT from the query. But there is no such optimization for GROUP BY. DISTINCT is used for retrieving distinct values, but GROUP BY is used for producing aggregates, not just distinct values. Even if the optimizer can skip sorting or hashing the data, it can not skip counting, adding, calculating the average, etc, and this is the important difference, so DISTINCT and GROUP BY are not handled in the same way (even if aggregates are not specified).
Another case of eliminating the DISTINCT part, when it is obviously unnecessary, for example:
This was set to prevent Complex View Merging Transformations like DISTINCT Placement, GROUP BY Placement, Subquery Unnesting appearing and making things complicated.
Notice how the optimizer completetely eliminated the DISTINCTs in the first and second queries, but not the GROUP BYs.
Unfortunately in these cases, since the DISTINCT elimination happens in subqueries ("views"), this information is not present in the optimizer trace, just like for the original queries in the question.
So now we know that DISTINCT and GROUP BY are handled indeed differently, lets go back to question 1.
To be continued in the next post...
(Both answers together exceed the 30000 characters limit.)