There are a few things I can think of off the top of my head. You're running a multi-instance failover cluster, so in theory I'd expect each node to be sized so that at any given point in time it can handle the load of all three instances. Chances are that this is not the case, but maybe it is. Ideally, you'd also have a spare node that can absorb failures, but that doesn't sound like it's the case here.
There are some configuration settings you can check to make sure you've not set yourself up for failure. The first one I'd check is:
EXEC sp_configure 'max server memory (MB)';
If your run value is 2147483647, then SQL Server is allowed to take as much memory as it thinks it needs. This is set per instance, so when multiple instances land on the same node and each tries to consume all available RAM, you will get memory pressure.
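A rough sketch of checking and capping the setting follows. The 16384 MB cap is purely an illustrative assumption, not a recommendation; the right number depends on the node's RAM and on how many instances could end up on it after a failover. Note that 'max server memory (MB)' is an advanced option, so sp_configure needs 'show advanced options' turned on before it will show or change it.

```sql
-- Read the configured value and the value currently in use.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';

-- Advanced options must be visible before sp_configure can change this one.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- ASSUMPTION: 16384 MB is a placeholder cap chosen only for illustration.
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;
```

Run this on each instance in turn, and pick caps whose sum still leaves room for the operating system if all instances end up on one node.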
Having said that (read: actually, start here), you've not given us any other information about what you've done to discover why the application stops responding. Is it just the application that connects to the C node that chokes, or does the original application also not work? This could end up being something as simple as the application connection string connecting to the IP/DNS name of the C node and not the VIP. If this is the case, then when C is no longer serving SQL Server, you're not going to be able to connect.
Step 1: Ensure the connection strings are actually connecting to the instance/VIP name and not the nodes.
Step 1.5: (Thanks to Thomas Stringer), make sure that you're giving the new instance enough time to actually recover the database. Connect to the instance via SSMS and see if your databases are in recovery.
Step 2: If Step 1 is correct, then get on the node that is running multiple instances and see what's going on. I'd recommend using PerfMon because "Task Manager is a dirty, filthy liar" and looking at metrics for the various subsystems starting with Memory, Network, CPU, and Disk IO. This answer contains much of what you'd need in order to check for resource pressure assuming you have connectivity to the instance and the databases are all fully recovered.
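The steps above can be roughed out in T-SQL once you can connect to the instance. This is only a sketch: it complements PerfMon rather than replacing it, and which counters matter most in your case is an assumption on my part.

```sql
-- Step 1: compare the name the instance answers to with the physical
-- node that is hosting it right now. Applications should connect using
-- the first name, never the second.
SELECT @@SERVERNAME                                  AS instance_network_name,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_physical_node,
       SERVERPROPERTY('IsClustered')                 AS is_clustered;

-- Step 1.5: list databases that are not yet ONLINE after the failover
-- (for example RECOVERING or RECOVERY_PENDING).
SELECT name, state_desc
FROM sys.databases
WHERE state_desc <> 'ONLINE';

-- Step 2: a first look at memory pressure from inside the instance.
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy',
                       'Memory Grants Pending',
                       'Target Server Memory (KB)',
                       'Total Server Memory (KB)');
```

If the second query returns rows right after a failover, give recovery time to finish before concluding that the application is broken.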
This is possible and we successfully completed an upgrade of a single failover cluster instance to SQL 2012 on a cluster with multiple SQL 2008 failover cluster instances.
Perform the upgrade of the instance on the two nodes not hosting the instance first (be sure to select the correct instance when asked as in the screenshot), then finally perform the upgrade on the node hosting the instance.
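One way to confirm the result once the last node is done is to ask the instance for its version. This is a minimal check, not part of the upgrade procedure itself.

```sql
-- A SQL Server 2012 instance reports a product version starting with 11.
-- SQL Server 2008 reports 10.0 and 2008 R2 reports 10.50.
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel')   AS product_level,
       SERVERPROPERTY('Edition')        AS edition;
```

Running it against the other instances on the cluster should show that they are still on their original version.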
Best Answer
You've essentially got it right. Active/Active is really just a multi-instance SQL cluster. When you install the Failover Cluster Instance (FCI) I1 on N1 of the Windows Server Failover Cluster (WSFC), you then have to run Add Node on N2 of the WSFC for I1, and vice versa for instance I2. The SQL Server and Agent services are installed on both nodes, but for each instance they only run on the node that currently owns that instance.
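To see this from inside an instance, something like the following should work on a clustered instance. It is a sketch only; on a standalone instance the first query returns no rows.

```sql
-- Nodes on which this failover cluster instance has been installed,
-- i.e. where setup or Add Node has been run for it.
SELECT NodeName
FROM sys.dm_os_cluster_nodes;

-- The node that is actually running the instance at the moment.
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_owner_node;
```

Run against I1 and I2 in turn, this shows both nodes listed for each instance, with the current owner differing when the two instances are spread across N1 and N2.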