SQL Server – Failover timeout problem

clustering, failover, sql-server-2008-r2, windows-server

I'm looking for some additional help with problems I'm investigating for one of my company's clients. I have two related problems with a 2-node active/active cluster hosting an instance of SQL Server 2008 R2 on each node.

One of the instances, normally on node #2, is at RTM. When failing over to node #1, bringing the SQL Server service on-line times out; however, once the time-out has expired, the service can be brought on-line manually without problems. The log contains errors of the form:

‘Login failed ….. Reason: Server is in script upgrade mode…..’.

At first I thought this was the result of a failed attempt at installing SP1; however, I'm now not so sure. SP1 has definitely been installed on both nodes, and the other instance, normally on node #1, is at SP1. I assume this was done following the recommendation to install the service pack on the 'inactive' node, fail over, and repeat the process. However, I'm having difficulty interpreting the installation logs to see whether just one instance has been updated, or both with one failing. So I was hoping someone could tell me which log file I should be looking at.

Also, what might be the meaning of the 'script upgrade mode' errors? Does this indeed sound like a failed SP1 upgrade or is something else at play here? Curiously the problem only occurs in one direction – from node #2 to node #1. When failing back to node #2 the SQL Server service comes back on-line without the need for any manual intervention.

Best Answer

I had forgotten to update this one after getting help from Microsoft Support. It turns out that the build number on one node was slightly different from the other because of a security patch that had been applied to only one node, possibly via Windows Update.
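
For anyone wanting to check for the same kind of mismatch, here is a rough sketch of how the comparison could be done once you have a build string for each node (for example, the value returned by `SELECT SERVERPROPERTY('ProductVersion')` while the instance is running on that node). The function names, node names, and build values below are made up for illustration; this is not what Microsoft Support actually ran.

```python
from typing import Dict, List, Tuple


def parse_build(version: str) -> Tuple[int, ...]:
    """Turn a dotted build string such as '10.50.1600.1' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def nodes_behind(builds: Dict[str, str]) -> List[str]:
    """Return the nodes whose build is lower than the highest build reported.

    `builds` maps a node name to the build string collected while the
    instance was hosted on that node. An empty result means all nodes match.
    """
    parsed = {node: parse_build(version) for node, version in builds.items()}
    newest = max(parsed.values())
    return sorted(node for node, build in parsed.items() if build != newest)


if __name__ == "__main__":
    # Illustrative values only: substitute what each node really reports.
    collected = {"node1": "10.50.2500.0", "node2": "10.50.2550.0"}
    behind = nodes_behind(collected)
    if behind:
        print("Build mismatch; these nodes are behind:", ", ".join(behind))
    else:
        print("All nodes report the same build.")
```

Comparing the parsed tuples rather than the raw strings avoids ordering mistakes when a component has a different number of digits on each node.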