Network design considerations for Oracle Database Appliance

Tags: network, oracle, oracle-11g-r2, oracle-database-appliance, rac

With the introduction of Oracle Engineered Systems, the DBA is moved somewhat closer to infrastructure design decisions, and is expected to at least have some opinions on the network design requirements for the database. At least, that is the situation I find myself in 🙂

After deploying an ODA for testing, I find myself with the following setup:

System Controller 0 has the public bonded interface (bond0) connected to a typical edge switch, a Catalyst 2960 series. A management interface (bond1) is connected to a second edge switch of the same type.

System Controller 1 similarly has its public interface connected to the second switch, while its management interface is connected to the first switch.

This way, if one of the switches goes down, an operator will be able to reach each system controller either via the public or the management interface to facilitate diagnostics.

On the Cisco end of things, EtherChannel groups are configured for the 4 bonded interfaces of the ODA. The two switches are individually wired to the rest of the network, with no direct links between the two.

At first glance this does look like a reasonable design, but the more I think about different fault scenarios the more questions I seem to come up with.

Taking into consideration that these edge-type switches are not in themselves redundant, it seems rather important that the cluster can deal with one switch becoming unavailable due to a power supply failure, or with one switch failing to forward packets.

The database clients (Zend Server application servers in this case) are each similarly connected with a bonded interface to only one of the two switches. This brings up some questions with regard to load balancing: the way I understand 11gR2 RAC, simply connecting to the SCAN address may well send the client the long way out to the main network and back in through the other switch, which can hardly be considered efficient.

What happens if a switch fails or stops forwarding packets? Will connections find the accessible VIP listener through SCAN? Will RAC somehow detect the network fault and move the SCAN and VIP to the System Controller with a working and accessible public interface? I honestly can't see how it would.

And while clients taking the long way through the core network and back is acceptable during a failover scenario, it sure would be nice to avoid it in normal production.

I'm sure Oracle has a very clear idea of how this should all work together, but I'm afraid I just don't see it all that clearly.

Is it possible to achieve full redundancy with edge-class/non-redundant switches? Can we somehow add some control over where client connections are routed in production and failover situations? Perhaps there is a good way to interconnect the two switches to allow traffic to pass directly between clients on one switch and the database listener on the other?

At this point I'm looking for any best practices and fundamental network design considerations that should be applied to a typical high availability ODA implementation.

Hopefully this will then be of use to any DBA that is faced with making network design decisions for their ODA 🙂

Update:

The ODA comes configured with its bonds in active-backup mode. I think this may allow for a setup where each interface on the bond is connected to a different switch, without any switch-side configuration.

Anyone know if this is the case?

[root@oma1 ~]# cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth2
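
One way to verify this would be to fail the active slave by hand and watch the bond switch over. A rough sketch (eth2/eth3 are simply the slave names this particular box happens to use):

# take the currently active slave down; /proc/net/bonding/bond0 should then
# report the backup slave (eth3 here) as "Currently Active Slave"
[root@oma1 ~]# ip link set eth2 down
[root@oma1 ~]# grep "Currently Active Slave" /proc/net/bonding/bond0
# bring the original slave back up; with "Primary Slave: None" the bond
# should simply stay on the backup rather than fail back
[root@oma1 ~]# ip link set eth2 up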

Best Answer

As it turns out, the ODA is factory configured with active-backup bonds. I've tested this to work well without any switch-side LACP/EtherChannel configuration, and each bonded connection may be split across two switches. In my tests, no simulated failure or network reconfiguration caused more than a few hundred milliseconds' worth of network outage.
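
For reference, an active-backup bond on Oracle Linux/RHEL boils down to a couple of ifcfg files of roughly the following shape. The interface names, address and miimon value are only illustrative, and the files the ODA deployment actually generates may differ in detail (on older releases the bonding options may live in /etc/modprobe.conf instead of BONDING_OPTS):

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
# example address on the front network
IPADDR=192.168.100.10
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
# active-backup: one slave carries the traffic, the other is a hot standby,
# so no EtherChannel/LACP is required on the switch side and the two slaves
# can be cabled to two different switches
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth2 (and likewise ifcfg-eth3)
DEVICE=eth2
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes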

This means that one can set up an isolated, redundant front network for the web applications using plain layer-two switches, even ones that are not in themselves redundant.

To avoid client connections taking the long way into the company network and back through the other switch (and thus making production dependent on that equipment), one can use a private VLAN that lives only on the two edge switches and on an EtherChannel trunk between them.

As such, only the application servers and the database appliance will exist on that virtual network segment.
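
On the switch side, this is little more than defining the VLAN on both switches and putting the relevant ports into it. A rough IOS-style sketch, where the VLAN number and port range are made up for the example:

vlan 100
 name PROD-FRONT
!
! access ports for the ODA and application server bonded NICs
interface range GigabitEthernet0/1 - 8
 switchport mode access
 switchport access vlan 100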

I don't see a way to control which path the connections from the application servers take to the database listeners, so the link between the two switches will have to be redundant, lest it become a single point of failure. This rules out unmanaged switches without support for VLANs and either LACP or STP.

Using Cisco Catalyst 2960-series switches, I believe a combination of EtherChannel and PortFast would be the better choice for a solid, independent connection between the two. I would also use PortFast on the ports for all of the bonded connections to the ODA and the application servers.
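
Sketching that out in the same IOS terms (again, port and channel numbers are just examples, and both switches get the same treatment): a two-port EtherChannel between the switches carrying only the private VLAN, and PortFast on the edge ports so they skip the spanning-tree listening/learning delay. Using channel-group mode "active" instead of "on" would negotiate the bundle with LACP rather than configure it statically:

! inter-switch link: static two-port EtherChannel trunk for the private VLAN
interface range GigabitEthernet0/23 - 24
 description uplink to the other edge switch
 switchport mode trunk
 switchport trunk allowed vlan 100
 channel-group 1 mode on
!
! edge ports towards the ODA and the application servers
interface range GigabitEthernet0/1 - 8
 spanning-tree portfast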

Since the production network is isolated, one would need separate network connections for management, backup and connectivity to the rest of the company network.

Naturally, in order for this front production network to be fully self-contained, any dependencies on external resources, such as DNS or authentication services, must also be resolved. Ideally, production would be able to continue independently, unaffected by faults, ongoing maintenance or network outages anywhere else in the data center or the company network.

Illustration of an isolated front network with ODA