Oracle's documentation on Indexes and Index-Organized Tables says, under Full Index Scan: "In a full index scan, the database reads the entire index in order."
Yet under Fast Full Index Scan, it reads: "A fast full index scan is a full index scan in which the database accesses the data in the index itself without accessing the table, and the database reads the index blocks in no particular order." (Emphasis mine.)
Now, the question should probably be: why did the optimizer choose an INDEX FAST FULL SCAN over an INDEX FULL SCAN?
A hint to the answer of the latter question is given under 11.2.3.7 Fast Full Index Scans: "A fast full scan is faster than a normal full index scan because it can use multiblock I/O and can run in parallel just like a table scan."
If you insist that Oracle use a full index scan, you might want to try the /*+ index() */ hint:
create table tq84_foo (
   fooID number not null
);

create table tq84_bar (
   fooID number not null
);

create unique index ix_foo on tq84_foo(fooID);
create unique index ix_bar on tq84_bar(fooID);

explain plan for
select /*+ index(f ix_foo) */ fooID from tq84_foo f
MINUS
select /*+ index(b ix_bar) */ fooID from tq84_bar b;

select * from table(dbms_xplan.display);
resulting in
-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |     1 |    26 |     4  (75)| 00:00:01 |
|   1 |  MINUS               |        |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT |        |     1 |    13 |     2  (50)| 00:00:01 |
|   3 |    INDEX FULL SCAN   | IX_FOO |     1 |    13 |     1   (0)| 00:00:01 |
|   4 |   SORT UNIQUE NOSORT |        |     1 |    13 |     2  (50)| 00:00:01 |
|   5 |    INDEX FULL SCAN   | IX_BAR |     1 |    13 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------
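For contrast, the opposite nudge would use the INDEX_FFS hint. This is a sketch only: INDEX_FFS is Oracle's documented hint name for requesting a fast full scan, but as with any hint it is a request, not a guarantee, and I have not re-run this against the tables above.

```sql
-- Ask the optimizer for fast full scans instead (a request, not a guarantee)
explain plan for
select /*+ index_ffs(f ix_foo) */ fooID from tq84_foo f
MINUS
select /*+ index_ffs(b ix_bar) */ fooID from tq84_bar b;

select * from table(dbms_xplan.display);
```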
Anyone know why this task is costed at 0.172434 in the first plan but 3.01702 in the second?
Generally speaking, an inner side seek below a nested loops join is costed assuming a random I/O pattern. There is a simple replacement-based reduction for subsequent accesses, accounting for the chance that the required page has already been brought into memory by a previous iteration. This basic assessment produces the standard (higher) cost.
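As an illustration of what such a replacement-based reduction can look like (this is the textbook Cardenas-style approximation, not a claim about SQL Server's actual formula): if the inner side occupies P pages and the loop performs k uniformly random row fetches, the expected number of distinct pages touched is

```latex
E[\text{pages read}] \approx P\left(1 - \left(1 - \frac{1}{P}\right)^{k}\right)
```

which saturates at P as k grows, so repeated seeks into a compact inner side can cost far less than k independent random I/Os.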
There is another costing input, Smart Seek Costing, about which little detail is known. My guess (and that is all it is at this stage) is that SSC attempts to assess inner side seek I/O cost in more detail, perhaps by considering local ordering and/or the range of values to fetch. Who knows.
For example, the first seeking operation brings in not just the requested row, but all rows on that page (in index order). Given the overall access pattern, fetching the 1000 rows in 1000 seeks requires only 2 physical reads, even with read-ahead and prefetching disabled. From that perspective, the default I/O costing represents a significant overestimate, and the SSC-adjusted cost is closer to reality.
It seems reasonable to expect that SSC would be most effective where the loop drives an index seek more or less directly, and the join outer reference is the basis of the seeking operation. From what I can tell, SSC is always attempted for suitable physical operations, but most often produces no downward adjustment when the seek is separated from the join by other operations. Simple filters are one exception to this, perhaps because SQL Server can often push these into the data access operator. In any case, the optimizer has pretty deep support for selections.
It is unfortunate that the Compute Scalar for the subquery outer projections seems to interfere with SSC here. Compute Scalars are usually relocated above the join, but these ones have to stay where they are. Even so, most normal Compute Scalars are pretty transparent to optimization, so this is a bit surprising.
Regardless, when the physical operation PhyOp_Range is produced from a simple selection on an index (SelIdxToRng), SSC is effective. When the more complex SelToIdxStrategy (selection on a table to an index strategy) is employed, the resulting PhyOp_Range runs SSC but results in no reduction. Again, it seems that simpler, more direct operations work best with SSC.
I wish I could tell you exactly what SSC does, and show the exact calculations, but I don't know those details. If you want to explore the limited trace output available for yourself, you can employ undocumented trace flag 2398. An example output is:
Smart seek costing (7.1) :: 1.34078e+154 , 0.001
That example relates to memo group 7, alternative 1, showing a cost upper bound, and a factor of 0.001. To see cleaner factors, be sure to rebuild the tables without parallelism so the pages are as dense as possible. Without doing that, the factor is more like 0.000821 for your example Target table. There are some fairly obvious relationships there, of course.
SSC can also be disabled with undocumented trace flag 2399. With that flag active, both costs are the higher value.
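As a sketch of how those flags might be enabled at query scope (the table and column names here are hypothetical, and both flags are undocumented and unsupported; trace flag 3604 is the usual companion that routes optimizer trace output to the client):

```sql
-- Hypothetical query shape; TF 3604 sends trace output to the client,
-- TF 2398 enables the smart seek costing trace for this statement only
SELECT o.OrderID, d.Quantity
FROM   dbo.Orders AS o
JOIN   dbo.OrderDetails AS d
       ON d.OrderID = o.OrderID
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2398);

-- Or disable SSC for comparison:
-- OPTION (QUERYTRACEON 2399);
```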
The optimal index for that query is INDEX(faculty, wam), in that order. MySQL does not (in most situations) use two different indexes. However, it can put a 'composite' index to good use in some situations, such as this.

Without knowing how many rows have Arts and how many have > 75, you cannot accurately conclude which of INDEX(wam) or INDEX(faculty) would be better. Nor can the Optimizer, since its statistics gathering is not very good. SHOW INDEXES and EXPLAIN SELECT provide numbers that come from the same statistics, so you won't necessarily come to the optimal conclusion.

A "Hash" index (which InnoDB does not have) is essentially no better than a BTree index. A Hash index is useless for a range (eg, > 75). (Hence, MySQL uses BTrees, and does not provide Hash indexes.)

Another point: if more than a small percentage of the table has wam > 75, INDEX(wam) will be shunned; scanning the entire table is likely to be faster than bouncing between the index's BTree and the data BTree. Again, > 75 and 60 pages and 10 and 100 give no good clues about which path the Optimizer should take. The answer to the quoted question is "it depends" or "not enough info".

You could tackle the question by presenting two answers: one where very few rows have wam > 75, and one where most rows meet that predicate.

Has your instructor had any experience using databases, or only textbooks?
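A minimal sketch of the composite-index advice above, assuming table and column names along the lines of the question (student, faculty, and wam are assumptions here):

```sql
-- Composite index: the '=' column first, then the range column
CREATE INDEX idx_faculty_wam ON student (faculty, wam);

-- A query of this shape walks straight to faculty = 'Arts',
-- then range-scans wam within it:
SELECT wam
FROM   student
WHERE  faculty = 'Arts'
  AND  wam > 75;
```

With only faculty and wam referenced, this also happens to be a covering index, so MySQL can answer the query from the index's BTree without touching the data BTree at all.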
More discussion on devising optimal indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql