Oracle Query execution finishes first time but hangs up from second run

oracleoracle-11gview

We have a very complex view that finishes execution only on first time, but hungs up when run from the same session second time.

complex view – view with nested views, lots of inner and outer joins, multiple union all etc… But no locking statements, only pure read.

$ sqlplus sys/system as sysdba;
SQL> SELECT * FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0  Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> alter session set current_schema = DB_2017_2018;

SQL> alter session set NLS_DATE_FORMAT = 'YYYY-MM-DD';

SQL> select 1 from complex_view where company_id = 1 and invoice_month = 1 and invoice_year = 2018;
-- runs in 33 sec

SQL> select 1 from complex_view where company_id = 1 and invoice_month = 1 and invoice_year = 2018;
-- hangs up.. Ctrl + C after 3 minutes

SQL> select * from complex_view where company_id = 1 and invoice_month = 1 and invoice_year = 2018;
-- runs in 1m 30 sec

SQL> select * from complex_view where company_id = 1 and invoice_month = 1 and invoice_year = 2018;
-- hangs up.. Ctrl + C after 5 - 10 minutes

-- ALL THESE COMMANDS / QUERIES RUN IN SAME SESSION

I figured that running the exact same query second time is when the problem occurs. If I simply change any select column or where condition, then it runs for first time and hangs up from second time (see the above commands).

Suspect 1

One thing I suspect from this is may be it's related to oracle's cache. So when I run the following commands before each query execution, I didn't get any query hang up.

SQL> alter system flush shared_pool;
SQL> alter system flush buffer_cache;

Suspect 2 (root cause might be Suspect 1)

Once any query hangs up, even if I run the same query in another session, it still hangs up. So If I kill the active sessions that's running the hang up query, then I can run the query successfully only for first time but stuck with same problem from second run.

SQL> ALTER SYSTEM KILL SESSION '12,256' IMMEDIATE;

AFAIK, only thing I can be sure is that something is badly wrong with oracle's buffer_cache and shared_pool in our version.

This problem doesn't happen for tables or simple views, only for this complex view (didn't tested other complex views). Also we were facing this kind of strange issue only in last 2 – 3 weeks. It's hosted in cloud (AWS EC2), so it may be related to spectre / meltdown patches by Amazon as that's what changed in last 2 – 3 weeks (unrelated ?!). Or might be due to some db data threshold level reached in that period.

Ready to provide any other analysis or metrics

Best Answer

This is quite typical on 11.2 because of a feature called "cardinality feedback". Check the execution plan of the slow runs, and if you see "cardinality feedback" in the Notes section of the execution plan, try setting the below parameter: alter session set "_optimizer_use_feedback"=false;, and run your tests again.

Some bugs related to this:

Bug 12557401 - Suboptimal plan on 2nd execution caused by cardinality feedback (Doc ID 12557401.8)

Bug 16837274 - Cardinality feedback produces poor subsequent plan (Doc ID 16837274.8)

Bug 8608703 - SubOptimal Execution Plan created by Cardinality Feedback (Doc ID 8608703.8)

Bug 8521689 - SubOptimal execution plan on second execution of GROUP BY query (Doc ID 8521689.8)