How to find what is contributing the most to “ORA-01555: snapshot too old” error

oracle

A customer has reported this error more then once in the last three months. What I'm trying to do is figuring out how to identify the procedures within the application that contributed the most in changing database blocks when this error is raised (or eventually other activities on the database from tools not strictly related to the application).

For the sake of clarity, before acting on UNDO size and UNDO RETENTION parameters, I'd like to investigate the application in order to identify bad practices that can raise the probability of hitting this error.

What I know for sure is:

  1. the statement that errored is not the root cause, it's just a victime: here a bit of SQL tuning may make it run faster and avoid the issue
  2. frequent commits (sort of loop like approach) made by concurrent session on tha same tables used by the errored statement, may contribute to hit this problem

How can I investigate the Oracle database in order to identify the procedures at the point #2 ? What Oracle dynamic views or diagnostic views can help me in this quest?

We're facing this porblem on Oracle 12.1 with Diagnostic and Tuning Pack

Best Answer

What I know for sure is ... the statement that errored is not the root cause, it's just a victim

Probably, but by no means certainly.

First thing to understand is that Oracle absolutely, totally, guaranteed-edly will not allow a Dirty Read. Ever. Each and every query that it runs is Point-in-Time consistent with the moment (System Change Number) that the query started.
To achieve this, Oracle uses Undo to effectively "roll-back", in-memory, any changes made to the data blocks since the query started. Even if you're table-scanning a billion row table, any records added to the end of the table while the query is running will not appear in your results.

Obviously, Oracle needs the Undo to achieve this and, if those particular Undo entries get removed from the Undo Tablespace, then Bang! ORA-01555.

Now, as you quite rightly say, lots of stuff could be making changes and "using up" all the Undo space but, almost as likely, is that the query is taking so long to do whatever it's doing that the Undo get aged out naturally.

How long does this query take to run?

  • Seconds? OK, that would be a big worry.
  • Minutes? This is the most likely scenario and, in the first instance, I would suggest you look into tuning the query, if possible.
  • Hours? What did you expect would happen? :-)