EXPDP snapshot too old live monitoring

expdporacle-11g-r2oracle-12csnapshot

The likely errors that raises "snapshot too old"s during datapump export are well documented all over the internet such as :

  • undo_retention parameter being too short
  • redo logs groups are too small or there are too few
  • undo tablespace is too small

Is there a way to monitor those failure cases while the job is running ?

The kind of solution I'm looking for would be a query that would report :

  • total percentage of the snapshot's undo tablespace usage until it needs to auto-extend or maxes out
  • total percentage of redo logs usage until it overwrites itself
  • time left until undo retention is reached by the snapshot

Any query that does part of these things would be welcome. Does looking up such information requires oracle analytic packages? Is it possible at all to query such a thing ?

Best Answer

You need to fully understand how undo works to troubleshoot this error. I recommend this blog post:

http://blog.oracle48.nl/wordpress/oracle-database-undo-space-explained/

You can use this query to see the current profile of your Undo tablespace(s):

SELECT DISTINCT STATUS,TABLESPACE_NAME, SUM(BYTES), COUNT(*)
    FROM DBA_UNDO_EXTENTS 
    GROUP BY STATUS, TABLESPACE_NAME;
  • ACTIVE extents contain uncommitted or currently-rolling-back transactions.
  • UNEXPIRED extents contain transactions required to be kept in the Undo tablespace to meet the undo_retention parameter
  • EXPIRED extents are transactions still in the Undo tablespace that are older than the undo_retention parameter.

The database will delete the oldest expired extents to make room for new active ones. If there are no expired extents, it will attempt to allocate a new extent. If that means extending the datafile, it will do that.

If the datafile cannot extend, the database will delete some Unexpired extents to make room, violating the undo_retention value. Thus, the undo_retention parameter is not a guarantee, but a guideline that the database will attempt to adhere to as best it can.

Now, there are two common failure modes here:

  1. If a query tries to read from a deleted extent, it will fail with ORA-01555 Snapshot Too Old

  2. If the whole undo tablespace becomes full of active undo, transactions will fail with ORA-30036 unable to extend segment in Undo tablespace

total percentage of the snapshot's undo tablespace usage until it needs to auto-extend or maxes out

If the sum of Active and Unexpired bytes in DBA_UNDO_EXTENTS is close to the size of the relevant undo tablespace, you have a chance of experiencing an ORA-01555.

If the sum of Active bytes is close to the size of the relevant undo tablespace, you are likely to experience an ORA-30036 error.

total percentage of redo logs usage until it overwrites itself

I'm not aware of anything you can change in the redo log configuration that will effect Snapshot Too Old errors. Where did you get this info?

time left until undo retention is reached by the snapshot

It is very difficult if not impossible to do this accurately, because it depends on every other transaction in the database. If you subtract the query running time from the undo_retention parameter, you will have a good idea, but that's only assuming that the undo tablespace was big enough to meet the undo_retention value. Once that line has been crossed, all bets are off.