SQL Server – How important is the build number on a SQL cluster?

clustering, sql-server-2005

I did not perform this install, nor am I responsible for fixing issues on this particular cluster. I just happened to be checking it and found some things that puzzle me. My aim here is to understand why it is working, at least to some degree.

OS: Windows Server 2003 Enterprise (64-bit)
SQL: SQL Server 2005 Enterprise (64-bit)

Started with a 2 node active/active cluster: Server1 running Instance1, then Server2 running Instance2.

Instance1 and Instance2 are at the Service Pack 3 build of SQL Server 2005 (plus a few hotfixes); I believe it is 9.00.4053. Two new servers, Server3 and Server4, were built to replace Server1 and Server2 and were added to the cluster. Since I did not do the installation, I am assuming that the person followed the steps to add a new SQL Server node as described here in BOL:
http://msdn.microsoft.com/en-us/library/ms191545(v=SQL.90).aspx

I found that Instance1 and Instance2 are now both running on Server3. However, the build number of both instances now shows as 9.00.1399, which is an unpatched (RTM) installation. The instances are active and still running on this RTM build of SQL Server within the cluster.
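For reference, a minimal Python sketch of how a SQL Server 2005 build string maps to a service pack baseline. The function names are hypothetical; the baseline table lists only the well-known SQL Server 2005 release builds, and the input is assumed to be the kind of string that `SERVERPROPERTY('ProductVersion')` reports.

```python
# Well-known SQL Server 2005 baseline builds (third component of the version).
SQL2005_BASELINES = [
    (1399, "RTM"),
    (2047, "SP1"),
    (3042, "SP2"),
    (4035, "SP3"),
    (5000, "SP4"),
]


def parse_build(version):
    """Split '9.00.4053' or '9.00.4053.00' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def service_pack_level(version):
    """Return the highest baseline at or below the given SQL Server 2005 build."""
    parts = parse_build(version)
    if len(parts) < 3 or parts[0] != 9:
        raise ValueError("not a SQL Server 2005 build: %r" % version)
    build = parts[2]
    level = None
    for baseline, name in SQL2005_BASELINES:
        if build >= baseline:
            level = name
    if level is None:
        raise ValueError("build is below the RTM baseline: %r" % version)
    return level


if __name__ == "__main__":
    for v in ("9.00.4053", "9.00.1399"):
        print(v, "->", service_pack_level(v))
```

A build between two baselines (such as 9.00.4053) is reported as the lower baseline, since the extra increments come from hotfixes or security updates applied on top of that service pack.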

My thought process here is that you cannot take a backup of the master database and restore it to an instance that is at a lower build number than the one the backup was taken from. So, going on that point alone, how can a clustered instance fail over to a node that is not at the same build number? Why would SQL Server (or Microsoft) even allow this?

Also, the last step in the BOL article linked above states, "All nodes of a failover cluster instance must be at the same version level." I cannot find anything that states what happens if they are not. The only thing I have found on Instance2 is that the SQL Agent jobs don't seem to be working anymore, and when the databases are being brought online there is a stack dump from something like the IO listener (I don't recall the exact message). I have also seen a few messages referencing IO write issues.
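The "same version level" requirement can be checked mechanically once the build reported on each node has been collected (for example by reading `SERVERPROPERTY('ProductVersion')` while the instance is running on that node). The following is only a sketch: the helper name and the sample dictionary are hypothetical, with the sample values taken from the builds mentioned in this question.

```python
def parse_build(version):
    """Split '9.00.4053' or '9.00.4053.00' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def find_lagging_nodes(node_builds):
    """Return {node: build} for nodes below the highest build seen.

    Only the first three components (major.minor.build) are compared.
    """
    parsed = {node: parse_build(b)[:3] for node, b in node_builds.items()}
    highest = max(parsed.values())
    return {
        node: node_builds[node]
        for node, build in parsed.items()
        if build < highest
    }


if __name__ == "__main__":
    # Illustrative input only: builds as described in the question.
    reported = {
        "Server1": "9.00.4053",
        "Server2": "9.00.4053",
        "Server3": "9.00.1399",
    }
    print(find_lagging_nodes(reported))
```

An empty result means every node reported the same build; any entry returned is a node that still needs the service pack and hotfixes applied.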

The databases themselves are online, and it appears the applications are functioning as desired. They have been running in this manner for the past week or so. Any thoughts?

Best Answer

You probably don't need anyone to tell you that you need to fix the build level on that third node! I suspect it's all working OK because of the introduction of the resource database in SQL Server 2005: http://msdn.microsoft.com/en-us/library/ms190940.aspx

I vaguely remember an old Windows 2000 / SQL Server 2000 cluster I used to work with, where SQL thought it was a lower patched level on one node compared to the other.