The logs led us to review our Cluster services:
There was an error to lookup cluster resources. Error: There was a failure to call cluster code from a provider. Exception message: Generic failure . Status code: 4104. Description: .
HResult : 0x86d80014
FacilityCode : 1752 (6d8)
ErrorCode : 20 (0014)
Data:
errorMessage = There was a failure to call cluster code from a provider. Exception message: Generic failure . Status code: 4104. Description: .
We determined that the replication agent service was the cause of the incomplete installation.
We removed the service from Cluster Resources and re-applied SP2 without success. We stopped the service on the node and re-applied SP2 without success.
Error: Failed to run patch request for instance: MSSQLSERVER (exit code: -2032664552)
We uninstalled the service from the server nodes, and then successfully "re-applied" SP2.
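For anyone reading the log excerpt above, the HResult can be split into the same FacilityCode and ErrorCode fields by hand. The sketch below assumes the standard Windows HRESULT bit layout (severity in bit 31, facility in bits 16-26, code in bits 0-15). The function name `decode_hresult` is made up for illustration. Treating the setup exit code as a signed HRESULT is also an assumption, although it decodes to the same facility.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    value &= 0xFFFFFFFF                # map a negative (signed) value to its unsigned form
    severity = value >> 31             # bit 31: 1 means failure
    facility = (value >> 16) & 0x7FF   # bits 16-26
    code = value & 0xFFFF              # bits 0-15
    return severity, facility, code


# HResult from the log: matches FacilityCode 1752 and ErrorCode 20
print(decode_hresult(0x86D80014))      # (1, 1752, 20)

# Exit code from the failed patch request, assumed to be a signed HRESULT
print(hex(-2032664552 & 0xFFFFFFFF))   # 0x86d80018
print(decode_hresult(-2032664552))     # (1, 1752, 24)
```

This only shows that both values share facility 1752. It does not say what error codes 20 and 24 mean inside SQL Server setup.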
Follow-up actions:
- We'll be contacting the replication agent vendor to discuss why this was required.
- Review the Microsoft document "Patching SQL Server Failover Cluster Instances with Cluster-Aware Updating (CAU)", which I discovered today.
So, just in case someone wants to know what caused this: it was a group policy!
Some time ago, unbeknown to me, the domain controllers for the domain in question had been upgraded to Server 2012. Along with this came a whole bunch of Windows Server 2012 group policies. Additional policies had been added to one of the parent server OUs with a filter applied.
Unfortunately, the filter had a typo, so the policies were being applied to all of my database servers on this domain.
I had almost ruled out it being a GP issue, as I seemed to have all the permissions I needed, and I could see connections coming in on the correct ports between each node. The servers were happily running as single nodes!
I asked the server team to move the servers (just to see) to the 'computers' OU, and after a forced gpupdate, bingo!
Unfortunately, I am unsure exactly which policy caused the problem, as there were a lot! A number of them were to do with NTLM usage and authentication, and I'm convinced the issue was related to these. At some point I will set up a test lab to try to replicate the issue.
So there you go! Always check group policies, even if you believe (like I did) that nothing has changed!
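As a rough illustration of that check, the sketch below wraps the built-in Windows tools `gpupdate` and `gpresult` to capture which policies apply to a node at computer scope. The function name `applied_policy_report` is hypothetical, and it assumes an elevated prompt on the node, since the computer-scope report normally needs one.

```python
import platform
import subprocess

# Built-in Windows commands; the flags are standard gpupdate/gpresult options.
REFRESH_CMD = ["gpupdate", "/target:computer", "/force"]
REPORT_CMD = ["gpresult", "/scope", "computer", "/r"]


def applied_policy_report(refresh=False):
    """Return gpresult's summary text for the computer scope.

    Returns None on non-Windows hosts, where these tools do not exist.
    """
    if platform.system() != "Windows":
        return None
    if refresh:
        # Force a policy refresh first, as done after moving the servers to another OU
        subprocess.run(REFRESH_CMD, check=True)
    result = subprocess.run(REPORT_CMD, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    report = applied_policy_report()
    print(report if report is not None else "Not a Windows host; nothing to report.")
```

Saving this output from each node before and after an OU move makes it easier to spot which applied policies differ.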
Best Answer
Before an upgrade, you should:
SYNCHRONIZED
Then here is the Rolling Upgrade Process: