Oracle RAC not starting up after a system reboot

oracle

I have a two-node Oracle RAC (11gR2) setup that was running fine until I did a hard reboot of both RAC nodes, because I had to extend the disk space of the VMs they run on.
I also deleted the archive logs related to the RAC nodes.

My database unique name is "rac" and the instance names are "rac1" and "rac2".

I tried using srvctl to bring the database up. This is the status output I got:

-bash-4.1$ srvctl status database -d RAC -v
Instance rac1 is running on node eng1. Instance status: Mounted (Closed).
Instance rac2 is running on node eng2. Instance status: Mounted (Closed).

However, the database instances rac1 and rac2 are either mounted or shut down, and I am not able to open them. When I connect through SQL*Plus the instances are mounted, but after a while they move to the shutdown state. These are my errors:

For rac1:

SQL> select status from v$instance;

STATUS
------------
MOUNTED

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 27913
Session ID: 31 Serial number: 5

(about one minute later, in the same session)

SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE


SQL> startup mount
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist

For rac2:

 SQL> select status from v$instance
  2  ;

STATUS
------------
MOUNTED

SQL>  alter database open;
 alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 9311
Session ID: 30 Serial number: 3

(about one minute later, in the same session)


SQL> select status from v$instance;
ERROR:
ORA-03114: not connected to ORACLE


SQL> startup mount;
ORA-24324: service handle not initialized
ORA-01041: internal error. hostdef extension doesn't exist
SQL>

I don't know whether I am missing some environment variables. This is the first time I have shut the machines down since installing RAC.

These are the alert log entries when I get ORA-03113:

Reusing ORACLE_BASE from an earlier startup = /u01/app/oracle
Wed Apr 02 12:48:40 2014
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=<my ip>)(PORT=1521))))' SCOPE=MEMORY SID='rac1';
ALTER DATABASE MOUNT /* db agent *//* {0:16:150} */
Successful mount of redo thread 1, with mount id 2431794449
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT /* db agent *//* {0:16:150} */
Wed Apr 02 12:53:38 2014
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
Wed Apr 02 12:53:38 2014
LGWR: STARTING ARCH PROCESSES
Wed Apr 02 12:53:38 2014
ARC0 started with pid=31, OS id=28965
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Wed Apr 02 12:53:39 2014
ARC1 started with pid=32, OS id=28967
ARCH: Encountered disk I/O error 19502
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/mnt/eng_rac_archive/1_135_840477924.dbf' (error 19502) (rac1)
ARCH: I/O error 19502 archiving log 1 to '/mnt/eng_rac_archive/1_135_840477924.dbf'
Errors in file /u01/app/oracle/diag/rdbms/rac/rac1/trace/rac1_ora_28960.trc:
ORA-16038: log 1 sequence# 135 cannot be archived
ORA-19502: write error on file "", block number  (block size=)
ORA-00312: online log 1 thread 1: '/mnt/eng_rac_control/rac/redo01.log'
USER (ospid: 28960): terminating the instance due to error 16038
Wed Apr 02 12:53:40 2014
System state dump requested by (instance=1, osid=28960), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/rac/rac1/trace/rac1_diag_28806.trc
Dumping diagnostic data in directory=[cdmp_20140402125340], requested by (instance=1, osid=28960), summary=[abnormal instance termination].
Instance terminated by USER, pid = 28960

I have set the DB_UNIQUE_NAME, ORACLE_SID, ORACLE_BASE, ORACLE_HOME, GRID_SOFTWARE_LOCATION, PATH, LD_LIBRARY_PATH and CLASSPATH variables in my bash profile. Please help!

Best Answer

Issue resolved.

The problem was that my NFS archive log volume was full. I assumed that running "rm -rf *" in the archive log volume would fix this, but the volume stayed full even after all the .dbf files were removed. A hidden .snapshot directory held volume-level snapshots that had grown in size, so the archive volume remained full and the database could no longer archive to it. That is why the alert log shows ORA-19502 and ORA-16038 for LOG_ARCHIVE_DEST_1 just before the instance terminates. To solve this, I logged on to my storage server (ONTAP) and deleted the snapshots for the archive log volume, since I didn't want any. In the end the fix was on the storage side rather than in Oracle.
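A quick way to catch this from the database host is to check free space with df instead of trusting the directory listing, since deleting the visible .dbf files did not free the volume. Below is a minimal bash sketch, assuming the archive destination is mounted at /mnt/eng_rac_archive as in the alert log; the script and its function names (check_archive_dest, archive_fs_usage) are hypothetical, and snapshot space itself still has to be inspected and cleaned up on the storage system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report free space on the filesystem that backs an
# archive log destination. Example (path taken from the alert log):
#   ./check_archive_dest.sh /mnt/eng_rac_archive

# Print "<available KB> <capacity%>" for the filesystem holding directory $1.
# df -P forces POSIX output: one line per filesystem, 1024-byte blocks,
# where row 2, columns 4 and 5 are "Available" and "Capacity".
archive_fs_usage() {
  df -P "$1" | awk 'NR == 2 { print $4, $5 }'
}

# Print a one-line summary for $1; return 0 only if some free space remains.
check_archive_dest() {
  local dir="$1"
  local usage avail pct
  if ! usage="$(archive_fs_usage "$dir")" || [ -z "$usage" ]; then
    echo "$dir: cannot read filesystem usage" >&2
    return 2
  fi
  read -r avail pct <<< "$usage"
  echo "$dir: ${pct} used, ${avail} KB available"

  # In bash, a glob such as "rm -rf *" does not match dot-names like
  # .snapshot (dotglob is off by default), so it never touches them.
  if [ -d "$dir/.snapshot" ]; then
    echo "$dir: .snapshot directory found; check snapshot usage on the storage system"
  fi

  # Zero free space means new archived logs cannot be written here.
  [ "$avail" -gt 0 ]
}

# Only run when a directory is passed, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
  check_archive_dest "$1"
fi
```

Running it on each node before opening the database would show the archive volume at or near 100% used even though the directory looked empty, which points at the storage layer rather than at Oracle environment variables.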
