“ORA-03113: end-of-file on communication channel” on startup

oraclestartup

I have been reading posts here, on Oracle support, and anywhere else I can find for the last three days and I've given up on this problem…

An Oracle database hung. Shutdown of the database sat for a few hours and then it quit. It wouldn't restart. The server was restarted. Oracle was restarted. Going step by step: startup nomount works, alter database mount works, alter database open returns ORA-03113. This is all on localhost – not over the network. The machine has no firewall of any kind running.

Any idea how to get past this ORA-03113 error? I've been on the phone with support in India for the last 4.5 hours and I haven't found anyone helpful yet.

Best Answer

After hours of misdirection from official Oracle support, I dove into this on my own and fixed it. I am documenting it here in case someone else has this problem.

To do any of this, you must be the oracle user:

$ su - oracle

Step 1: You need to look at the alert log. It isn't in /var/log as expected. You have to run an Oracle log reading program:

$ adrci
ADRCI: Release 11.2.0.1.0 - Production on Wed Sep 11 18:27:56 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/oracle"
adrci>

Notice the ADR base. That is not the install. You need to see the homes so you can connect to the one that you use.

adrci> show homes
ADR Homes:
diag/rdbms/cci/CCI
diag/tnslsnr/cci/listener
diag/tnslsnr/cci/start
diag/tnslsnr/cci/reload

CCI is the home. Set that.

adrci> set home diag/rdbms/cci/CCI
adrci>

Now, you can look at the alert logs. It would be very nice if they were in /var/log so you could easily parse the logs. Just stop wanting and deal with this interface. At least you can tail (and I hope you have a scrollback buffer):

adrci> show alert -tail 100

Scroll back until you see errors. You want the FIRST error. Any errors after the first error are likely being caused by the first error. In my case, the first error was:

ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 100.00% used, and has 0 remaining bytes available.

This is caused by transactions. Oracle is not designed to be used. If you do push a lot of data into it, it saves transaction logs. Those go into the recovery file area. Once that is full (50GB full in this case). Then, Oracle just dies. By design, if anything is messed up, Oracle will respond by shutting down.

There are two solutions, the proper one and the quick and dirty one. The quick and dirty one is to increase db_recovery_file_dest_size. First, exit adrci.

adrci> exit

Now, go into sqlplus without opening the database, just mounting it (you may be able to do this without mounting the database, but I mount it anyway).

$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 11 18:40:25 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> connect / as sysdba
Connected.
SQL> startup mount

Now, you can increase your current db_recovery_file_dest_size, increased to 75G in my case:

SQL> alter system set db_recovery_file_dest_size = 75G scope=both

Now, you can shutdown and startup again and that previous error should be gone.

The proper fix is to get rid of the recovery files. You do that using RMAN, not SQLPLUS or ADRCI.

$ rman
Recovery Manager: Release 11.2.0.1.0 - Production on Wed Sep 11 18:45:11 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
RMAN> backup archivelog all delete input;

If you've got RMAN-06171: not connected to target database, than try to use rman target / instead of just rman

Wait a long time and your archivelog (that was using up all that space) will be gone. So, you can shutdown/startup your database and be back in business.