Large cost change in execution-plan why

execution-planoracleoracle-10g-r2

I had a query change execution plans for an unknown reason. The stats were gathered about 24 hours before the change and the rate of data increase has not changed over the period in question. Here is the original GOOD execution plan:
GOOD return about 428 row per run

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=9)
SORT (ORDER BY) (Cost=9 Cardinality=1 Bytes=82 Time=1)
TABLE ACCESS (BY INDEX ROWID) OF DE.MY_TABLE (TABLE) (Cost=8 Cardinality=1 Bytes=82 Time=1)
INDEX (RANGE SCAN) OF DE.MY_TABLE_MY_COL_01_DATE (INDEX) (Cost=3 Cardinality=6 Time=1)

Here is the UGLY plan, FTS on 73 million rows spinning the CPU near 100% still returns about 400 rows

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4911)
SORT (ORDER BY) (Cost=4911 Cardinality=32894 Bytes=2697308 Time=59)
TABLE ACCESS (FULL) OF DE.MY_TABLE (TABLE) (Cost=4278 Cardinality=32894 Bytes=2697308 Time=52)

After rebuilding the index, creating a covering index and that not working we finally added a hint to use the original index and that worked well. This restored previous timings. Working well the query runs in 22ms. The Ugly plan ran at about 14 sec.

After the query hint this is the query plan:

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=56571 Cardinality=13697 Bytes=1164245)
SORT (ORDER BY) (Cost=56571 Cardinality=13697 Bytes=1164245)
TABLE ACCESS (BY INDEX ROWID) OF MY_TABLE (TABLE) (Cost=56297 Cardinality=13697 Bytes=1164245)
INDEX (RANGE SCAN) OF MY_TABLE_MY_COL_01_DATE (INDEX) (Cost=521 Cardinality=62447)

Notice the Cost and Cardinality of the 1st and the last plans both using the same index. In the 1st the Cardinality is wrong (1). The query returns a pretty consistent 400 rows. In the last plan the Cardinality is 13697. That's wrong too. But look at the Cost of the last plan 56571, no wonder it took a hint to force the use of the working index.

My research is pointing to a Statistics problem. I verified that the GATHER_STATS_JOB is running. I've used the dba_tab_stats_history table to verify the stats are actually updated on the table I'm interested in.

So my questions are: Why would the Cardinality and Cost of the query change so dramatically? If it is how the statistics are gathered (I am using gather_database_stats(auto)) where should I start reading what Stats gathering options to use?

If it is unlikely the statistics where else should I look?

I'm running Oracle Database 10g Release 10.2.0.4.0 – 64bit Production

EDIT NEW PLAN 03/21/2018
The latest status is the system no longer considers the hint index expensive.
Here is the latest plan detail:

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                        |    41 |  3444 |   998   (1)| 00:00:12 |
|   1 |  SORT ORDER BY               |                        |    41 |  3444 |   998   (1)| 00:00:12 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| MY_TABLE               |    41 |  3444 |   997   (1)| 00:00:12 |
|*  3 |    INDEX RANGE SCAN          | MY_TABLE_MY_COL_01_DATE|  1128 |       |    12   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FO"."STATUSID"<6800 AND "FO"."PARENTID" IS NULL AND "FO"."DELETED"<>1)
   3 - access("FO"."MY_COL_01"=:AGENCY AND "FO"."EVENTDATE">SYSDATE@!-:TIMENUMBER AND 
          "FO"."EVENTDATE" IS NOT NULL)

Best Answer

As far as I remember, 10g had issues with bind peeking, and symptoms are very similar to what you have. I suspect that good plan picks index (MY_TABLE_MY_COL_01_DATE ) for fo.eventdate>sysdate - :2 when value of variable was small, or it was very few rows that satisfy that condition according to statistics. Then either due to statistics change, or you ran it with different value of variable, or sql got evicted from SGA, either it generated a new plan . You might try to check V$SQL_SHARED_CURSOR , it shows why a new cursor was generated. In theory, optimizer should switch between FTS and index scan, but in my memory it never happened in 10g - once it chose FTS, it would keep using it. If upgrading Oracle is not an option, I'd try using literal in this particular case instead of bind variables, i.e. fo.eventdate>sysdate - 1 . (or maybe combine literals and bind variables to help optimizer, say if it's never greater than 10 days you can do fo.eventdate>sysdate - 10 and fo.eventdate>sysdate - :2