Oracle RAC not starting up after reboot of system

oracle

I have a 2-node RAC (11gR2) setup that was running fine until I did a hard reboot of both RAC nodes, because I had to extend the disk space of the VMs they run on.
I also deleted the archive logs related to the RAC nodes.

My database unique name is "rac" and instance names are "rac1" and "rac2".

I tried running srvctl to bring the database up. Here is the status output I got:

-bash-4.1$ srvctl status database -d RAC -v
Instance rac1 is running on node eng1. Instance status: Mounted (Closed).
Instance rac2 is running on node eng2. Instance status: Mounted (Closed).

However, the instances rac1 and rac2 are either in the mounted state or shut down, and I am not able to open them. They show as mounted when I connect through sqlplus, but they shut down after some time. Here are my errors:

for rac1:

SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 27913
Session ID: 31 Serial number: 5

(after about 1 minute, in the same session)

SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE


SQL> startup mount
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist

for rac2:

 SQL> select status from v$instance
  2  ;

STATUS
------------
MOUNTED

SQL>  alter database open;
 alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 9311
Session ID: 30 Serial number: 3

(after about 1 minute, in the same session)


SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE


SQL> startup mount;
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
SQL>

I don't know if I'm missing some environment variables. This is the first time I'm bringing down the machines after the RAC installation.

These are the alert log entries when I get ORA-03113:

Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Wed Apr 02 12:48:40 2014
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=<my ip>)(PORT=1521))))' SCOPE=MEMORY SID='rac1';
ALTER DATABASE MOUNT /* db agent *//* {0:16:150} */
Successful mount of redo thread 1, with mount id 2431794449
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {0:16:150} */
Wed Apr 02 12:53:38 2014
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
Wed Apr 02 12:53:38 2014
LGWR: STARTING ARCH PROCESSES
Wed Apr 02 12:53:38 2014
ARC0 started with pid=31, OS id=28965
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Wed Apr 02 12:53:39 2014
ARC1 started with pid=32, OS id=28967
ARCH: Encountered disk I/O error 19502
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/mnt/eng_rac_archive/1_135_840477924.dbf' (error 19502) (rac1)
ARCH: I/O error 19502 archiving log 1 to '/mnt/eng_rac_archive/1_135_840477924.dbf'
Errors in file /u01/app/oracle/diag/rdbms/rac/rac1/trace/rac1_ora_28960.trc:
ORA-16038: log 1 sequence# 135 cannot be archived
ORA-19502: write error on file "", block number  (block size=)
ORA-00312: online log 1 thread 1: '/mnt/eng_rac_control/rac/redo01.log'
USER (ospid: 28960): terminating the instance due to error 16038
Wed Apr 02 12:53:40 2014
System state dump requested by (instance=1, osid=28960), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rac/rac1/trace/rac1_diag_28806.trc
Dumping diagnostic data in directory=[cdmp_20140402125340], requested by (instance=1, osid=28960), summary=[abnormal instance termination].
Instance terminated by USER, pid = 28960

I have set the DB_UNIQUE_NAME, ORACLE_SID, ORACLE_BASE, ORACLE_HOME, GRID_SOFTWARE_LOCATION, PATH, LD_LIBRARY_PATH and CLASSPATH variables in my bash profile. Please help!

Best Answer

Issue resolved.

The problem was that my NFS archive volume was full. I assumed that running "rm -rf *" in the archive log volume would solve this, but the volume remained full even after removing all the dbf files. There was a hidden .snapshot folder holding volume-level snapshots that had grown in size, so the archive volume stayed full and the database could no longer archive to it. To solve this I logged on to my storage server (ONTAP) and removed the snapshots for the archive log volume (as I didn't want any). In the end, this was a storage management fix.
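
A minimal shell sketch of how this situation could be checked from the database host, under some assumptions: the shell glob * skips names that start with a dot, so hidden entries such as .snapshot survive "rm -rf *", and df shows whether the volume is still full afterwards. The function names usage_pct and list_hidden are hypothetical helpers, and whether .snapshot is visible at all depends on how the storage is configured.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any Oracle tooling.

# Print the used-capacity percentage (digits only) of the filesystem holding $1.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# List hidden entries directly under $1. A plain "rm -rf *" leaves these alone
# because the shell glob * does not match names starting with a dot.
# (-mindepth/-maxdepth are GNU/BSD find extensions.)
list_hidden() {
    find "$1" -mindepth 1 -maxdepth 1 -name '.*'
}

# Example (path taken from the alert log above; adjust to your destination):
#   usage_pct /mnt/eng_rac_archive
#   list_hidden /mnt/eng_rac_archive
```

If usage_pct still reports a value near 100 after the archive files are gone, something other than the visible files is holding the space, which matches the snapshot situation described above.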

Related Solutions

Unable to connect oracle as sysdba tables have been dropped

This is probably as complete a way of killing an Oracle database as you could wish for. The sys tables contain all the metadata about every object in the database -- objects, segments, extents ... so the database now contains no information on what user tables it stores, including the tables that store the data about that.

New database, I think.

And no more sys connection accidents.

Oracle – Fixing ORA-03113 End-of-File on Communication Channel on Startup

After hours of misdirection from official Oracle support, I dove into this on my own and fixed it. I am documenting it here in case someone else has this problem.

To do any of this, you must be the oracle user:

$ su - oracle

Step 1: You need to look at the alert log. It isn't in /var/log as expected. You have to run an Oracle log reading program:

$ adrci
ADRCI: Release 11.2.0.1.0 - Production on Wed Sep 11 18:27:56 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/oracle"
adrci>

Notice the ADR base. That is not the install. You need to see the homes so you can connect to the one that you use.

adrci> show homes
ADR Homes:
diag/rdbms/cci/CCI
diag/tnslsnr/cci/listener
diag/tnslsnr/cci/start
diag/tnslsnr/cci/reload

CCI is the home. Set that.

adrci> set home diag/rdbms/cci/CCI
adrci>

Now, you can look at the alert logs. It would be very nice if they were in /var/log so you could easily parse the logs. Just stop wanting and deal with this interface. At least you can tail (and I hope you have a scrollback buffer):

adrci> show alert -tail 100

Scroll back until you see errors. You want the FIRST error. Any errors after the first error are likely being caused by the first error. In my case, the first error was:
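
As a sketch of the same idea outside adrci: in 11g the alert log is also written as a plain text file, normally named alert_<SID>.log in the trace directory of the ADR home, so grep can pick out the first ORA- line. The path in the example is an assumption derived from the ADR base and home shown above, and first_ora_error is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first line (with its line number) that
# contains an ORA-nnnnn error code in the given alert log text file.
# (-m is supported by GNU and BSD grep.)
first_ora_error() {
    grep -n -m 1 -E 'ORA-[0-9]{5}' "$1"
}

# Example; the path is an assumption based on the ADR base and home above:
#   first_ora_error /u01/app/oracle/diag/rdbms/cci/CCI/trace/alert_CCI.log
```

The alert log is appended to across restarts, so the first match in the whole file may be old; comparing the timestamps around the match with the time of the failed startup helps confirm it is the relevant error.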

ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 100.00% used, and has 0 remaining bytes available.

This is caused by transaction volume. When you push a lot of data into Oracle, it generates redo, and the archived redo logs go into the recovery file area. Once that area is full (50GB in this case), Oracle cannot archive any more logs and just dies. By design, if anything is messed up, Oracle will respond by shutting down.
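
For reference, the byte count in the warning does convert to the 50GB mentioned above. A quick shell arithmetic check, as a sketch (bytes_to_gib is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole GiB (integer division).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

bytes_to_gib 53687091200   # prints 50
```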

There are two solutions, the proper one and the quick and dirty one. The quick and dirty one is to increase db_recovery_file_dest_size. First, exit adrci.

adrci> exit

Now, go into sqlplus without opening the database, just mounting it (you may be able to do this without mounting the database, but I mount it anyway).

$ sqlplus /nolog
SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 11 18:40:25 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
SQL> connect / as sysdba
Connected.
SQL> startup mount

Now you can increase db_recovery_file_dest_size. I raised it to 75G in my case:

SQL> alter system set db_recovery_file_dest_size = 75G scope=both;

Now, you can shutdown and startup again and that previous error should be gone.

The proper fix is to get rid of the recovery files. You do that using RMAN, not SQLPLUS or ADRCI.

$ rman
Recovery Manager: Release 11.2.0.1.0 - Production on Wed Sep 11 18:45:11 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
RMAN> backup archivelog all delete input;

If you get RMAN-06171: not connected to target database, then try rman target / instead of just rman.

Wait a long time and your archivelog (that was using up all that space) will be gone. So, you can shutdown/startup your database and be back in business.
