How to make dataguard switchover seamlessly

dataguardoracle

In my Dataguard configuration on a 12c environment, when I perform a switchover from primary (foo) to standby (bar) and then try to sqlplus in with user@foo, I receive the "ORA-011033: ORACLE initialization or shutdown in progress" error instead of being able to connect to the new primary(bar).

Now, I know that this is because the connection is still pointing to the foo address instead of the bar address, and according to many articles, you need to create a new "virtual" service to handle the different connections, as well as a trigger to stop/start this service after a rolechange.

I did this and with the service_names parameter set to this new "virtual" service, a switchover will work and can also switch back but when trying to sqlplus with user@foo it throws ORA-011033. The trigger is useless and the service continues to run on both foo and bar because it seems that having that parameter set overrides whatever the trigger tries to do.

Now if I don't define that parameter, my switchover works fine, and the triggers do their job, and when I sqlplus user@foo, it will correctly redirect me to the bar address. However, when I try to switchover back to foo, I receive the following in the DG alert log:

Redo transport problem detected: redo transport to database foo has the following error:
ORA-16047: DGID mismatch between destination setting and target database
08/05/2016 16:01:02
Initialization of connection failed.
Expected destination db_unique_name is foo.
Database actually reached is bar
Failed to send message to site foo. Error code is ORA-16642.
Data Guard Broker Status Summary:
Type Name Severity Status
Configuration foo_dg Warning ORA-16607
Primary Database bar Error ORA-16778
Physical Standby Database foo Error ORA-16642

ON FOO:

log_archive_dest_2='service="bar"','ASYNC NOAFFIRM delay=0 optional
compression=disable max_failure=0 max_connections=1 reopen=300
db_unique_name="bar"
net_timeout=30','valid_for=(online_logfile,all_roles)'
fal_client='foo' fal_server='bar'

ON BAR:

log_archive_dest_2='service="foo", ASYNC NOAFFIRM delay=0 optional
compression=disable max_failure=0 max_connections=1 reopen=300
db_unique_name="foo" valid_for=(online_logfile,all_roles)'
fal_client='bar' fal_server='foo'

Entries in LDAP:
foo:

(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=foo_home)(PORT=15940))(ADDRESS=(PROTOCOL=TCP)(HOST=bar_home)(PORT=15940)))(CONNECT_DATA=(SERVICE_NAME=virtual_foo)(FAILOVER_MODE=(TYPE=SESSION)(METHOD=basic))))

bar:

(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=bar_home)(PORT=15940)))(CONNECT_DATA=(SERVICE_NAME=bar)))

I'm not sure what I'm missing or misunderstanding. Can anyone shed any light as to what I might be doing wrong? Can provide more info if needed.

Best Answer

Starting with 11.2, you do not need to create the trigger, you can define a role based service with Grid Infrastructure.

foo:

srvctl add service -db foo -service virtual_foo -role primary
srvctl start service -db foo -service virtual_foo

bar:

srvctl add service -db bar -service virtual_foo -role primary
srvctl start service -db bar -service virtual_foo

If you do not use Grid Infrastructure, well, just use the trigger.

Your virtual service was misconfigured, that is why you received ORA-01033. You should not update the value of service_names manually. After you did that and performed the switchover, virtual_foo service still ran on foo. Your client tried the first address in the lists, found that the listener was alive, and a service existed with the requested name, so it tried to log in. But the standby database was in MOUNT state, so the client could not log in --> ORA-01033.

For the switchover back to foo part: do not use the same TNS entries for clients and Data Guard configuration. Clients do not need to know which site they connect to, but Data Guard Broker and database background processes need to connect exactly to foo or bar, a virtual service does not provide that functionality. Seperate the TNS entries needed for Data Guard configuration and the one used by clients.

Usually I configure DG with the below naming and TNS entries.

If the db_name is foo, then:

  • primary: db_unique_name = foo_site1
  • standby: db_unique_name = foo_site2

(or something similar, that identifies the site)

TNS:

  • foo_site1: TNS entry pointing to the host at site1, with service_name defined as foo_site1 - used by DG configuration
  • foo_site2: TNS entry pointing to the host at site2, with service_name defined as foo_site2 - used by DG configuration
  • foo : TNS entry pointing to both sites, with service_name defined as foo (and that is defined as role based service) - used by clients

Finally, this part in your connection string:

(FAILOVER_MODE=(TYPE=SESSION)(METHOD=basic))

Unless you have RAC databases with multiple instances per site, and foo_home and bar_home are actually SCAN names, this is unnecessary. These parameters are used to configure Transparent Application Failover (session failover between RAC instances).

By the way, this is far from a really seamless switchover, all that was achieved by this, was that clients can use a single connection string to connect to the database after a role transition.

Client sessions will fail and need to reconnect manually, that could take minutes. Or you need to stop your applications and have planned downtime for this.

Having a really seamless switchover requires far more work (like configuring ONS, FAN). More details at:

Client Failover Best Practices for Highly Available Oracle Databases

Fast Application Notification