EXPDP snapshot too old live monitoring

expdporacle-11g-r2oracle-12csnapshot

The likely errors that raises "snapshot too old"s during datapump export are well documented all over the internet such as :

undo_retention parameter being too short
redo logs groups are too small or there are too few
undo tablespace is too small

Is there a way to monitor those failure cases while the job is running ?

The kind of solution I'm looking for would be a query that would report :

total percentage of the snapshot's undo tablespace usage until it needs to auto-extend or maxes out
total percentage of redo logs usage until it overwrites itself
time left until undo retention is reached by the snapshot

Any query that does part of these things would be welcome. Does looking up such information requires oracle analytic packages? Is it possible at all to query such a thing ?

Best Answer

You need to fully understand how undo works to troubleshoot this error. I recommend this blog post:

http://blog.oracle48.nl/wordpress/oracle-database-undo-space-explained/

You can use this query to see the current profile of your Undo tablespace(s):

SELECT DISTINCT STATUS,TABLESPACE_NAME, SUM(BYTES), COUNT(*)
    FROM DBA_UNDO_EXTENTS 
    GROUP BY STATUS, TABLESPACE_NAME;

ACTIVE extents contain uncommitted or currently-rolling-back transactions.
UNEXPIRED extents contain transactions required to be kept in the Undo tablespace to meet the undo_retention parameter
EXPIRED extents are transactions still in the Undo tablespace that are older than the undo_retention parameter.

The database will delete the oldest expired extents to make room for new active ones. If there are no expired extents, it will attempt to allocate a new extent. If that means extending the datafile, it will do that.

If the datafile cannot extend, the database will delete some Unexpired extents to make room, violating the undo_retention value. Thus, the undo_retention parameter is not a guarantee, but a guideline that the database will attempt to adhere to as best it can.

Now, there are two common failure modes here:

If a query tries to read from a deleted extent, it will fail with ORA-01555 Snapshot Too Old
If the whole undo tablespace becomes full of active undo, transactions will fail with ORA-30036 unable to extend segment in Undo tablespace

total percentage of the snapshot's undo tablespace usage until it needs to auto-extend or maxes out

If the sum of Active and Unexpired bytes in DBA_UNDO_EXTENTS is close to the size of the relevant undo tablespace, you have a chance of experiencing an ORA-01555.

If the sum of Active bytes is close to the size of the relevant undo tablespace, you are likely to experience an ORA-30036 error.

total percentage of redo logs usage until it overwrites itself

I'm not aware of anything you can change in the redo log configuration that will effect Snapshot Too Old errors. Where did you get this info?

time left until undo retention is reached by the snapshot

It is very difficult if not impossible to do this accurately, because it depends on every other transaction in the database. If you subtract the query running time from the undo_retention parameter, you will have a good idea, but that's only assuming that the undo tablespace was big enough to meet the undo_retention value. Once that line has been crossed, all bets are off.

Hints

Just a quick summary of possible ways to provide you with additional information.

Job History

You might get additional information if you query the job history:

SELECT  /*+ RULE */
    A.JOB JOB#, 
    SCHEMA_USER MVIEW_OWNER,
    substr(DECODE(SUBSTR(WHAT,INSTR(WHAT,'.',1,2)+2,INSTR(WHAT,'”',1,4)-4-INSTR(WHAT,'.',1,2)+2),NULL,SUBSTR(WHAT,1,80), 
    SUBSTR(WHAT,INSTR(WHAT,'.',1,2)+2,INSTR(WHAT,'”',1,4)-4-INSTR(WHAT,'.',1,2)+2)),0,80) JOB_DESCRIPTION,
    to_char(LAST_DATE, 'yyyy-mm-dd hh24:mi:ss') LAST_RUN_DATE, 
    to_char(NEXT_DATE, 'yyyy-mm-dd hh24:mi:ss') NEXT_SCHED_RUN_DATE, 
    DECODE(BROKEN,'Y','YES','N','NO',' ') IS_BROKEN,
    FAILURES,  
    RUNNING IS_RUNNING, 
    B.SID SID 
FROM DBA_JOBS A LEFT OUTER JOIN 
    (SELECT /*+ RULE */ JOB,'YES' RUNNING,SID FROM DBA_JOBS_RUNNING ) B 
ON A.JOB = B.JOB 
ORDER BY SCHEMA_USER, JOB_DESCRIPTION;

Reference: My refresh materialized view is not refreshing – Part 1

Depending on what results you get, you might be able to look for additional information.

Rollback vs. Undo

Is your database running with Rollback Undo Segments? You might want to consider switching to the Undo Segments instead (10g+).

To enable SMU, set the UNDO_MANAGEMENT parameter to TRUE

Why?

Earlier releases of Oracle Database used rollback segments to store undo. Oracle9i introduced automatic undo management, which simplifies undo space management by eliminating the complexities associated with rollback segment management. Oracle strongly recommends (Oracle 9i and on words) to use undo tablespace (automatic undo management) to manage undo rather than rollback segments.

Reference: What is the difference between Rollback and Undo Tablespace?, section Undo vs. Rollback

Advisor Framework

If you are running in Undo Segements mode then you can retrieve advisory information about the Undo configuration from the system with the following script (Advisor Framework):

set serveroutput on

 DECLARE
  v VARCHAR2(300); 
 BEGIN
   v := dbms_undo_adv.undo_advisor(SYSDATE-1/1440, SYSDATE, 1);
   dbms_output.put_line(v);
 END;
 /

Reference: Oracle DBMS_UNDO_ADV

If you find your database doesn't have sufficient Undo Segments, then you might have to resize your UNDO tablespace to accommodate for the large amount of modifications/moves your MV is performing.

Best Answer

Related Solutions

Expdp error UDE-22303

Snapshot too old from materialized view refresh job

Hints

Job History

Rollback vs. Undo

Advisor Framework

Related Question