SQL Server – Upgrade and migrate SQL Server 2014 AlwaysOn AGs to SQL Server 2016 using the existing WSFC name

availability-groups, migration, sql server, sql-server-2016

Here are the components:
WSFC Name: SQLPROD1
Existing Nodes in this "cluster": SQL2014OnSite and SQL2014OffSite
New Nodes to be added (?) to SQLPROD1: SQL2016OnSite and SQL2016OffSite
All 4 servers are running Windows Server 2012 R2 Enterprise. All 4 instances of SQL Server are Enterprise Edition.

I successfully added the two new servers to the WSFC. I successfully configured the two new SQL2016 instances to use AlwaysOn.

For testing purposes, I created a new AOAG named TestMigrate with a single database, AdventureWorks2014, and configured it to match the 3 production AOAGs (asynchronous commit, manual failover, and non-readable secondary).

I then added the 2 new nodes as replicas to the AG, which worked great. An article I read (https://msdn.microsoft.com/en-us/library/dn178483.aspx), in the section titled "Availability Group with One Remote Secondary Replica", indicated that it was necessary to change the availability mode to synchronous commit, then fail over to the new primary node, and finally change the availability mode back to asynchronous commit. Up to this point, everything worked as expected.
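For reference, the mode change and failover described above can be sketched as T-SQL. The small Python helper below only builds and prints the statements together with the instance each one would be run on; it does not connect to anything. The function name `failover_steps` is hypothetical, and the sketch assumes default instances whose replica names match the server names from the question. The synchronization check before the failover is an added precaution, not something the question mentions.

```python
def failover_steps(ag, old_primary, new_primary):
    """Return (instance_to_connect_to, T-SQL) pairs in the order to run them.

    Hypothetical helper for illustration: it only formats statements.
    """
    def set_mode(replica, mode):
        # MODIFY REPLICA has to be issued on whichever instance is primary.
        return (f"ALTER AVAILABILITY GROUP [{ag}] "
                f"MODIFY REPLICA ON N'{replica}' "
                f"WITH (AVAILABILITY_MODE = {mode});")

    # Assumed precaution: confirm the target reports SYNCHRONIZED first.
    check_sync = ("SELECT DB_NAME(database_id) AS database_name, "
                  "synchronization_state_desc "
                  "FROM sys.dm_hadr_database_replica_states "
                  "WHERE is_local = 1;")

    return [
        # 1. While the 2014 instance is still primary, switch both sides to sync.
        (old_primary, set_mode(old_primary, "SYNCHRONOUS_COMMIT")),
        (old_primary, set_mode(new_primary, "SYNCHRONOUS_COMMIT")),
        # 2. On the failover target, check the synchronization state.
        (new_primary, check_sync),
        # 3. A manual failover is run on the target secondary.
        (new_primary, f"ALTER AVAILABILITY GROUP [{ag}] FAILOVER;"),
        # 4. The 2016 instance is now primary; return it to async commit.
        (new_primary, set_mode(new_primary, "ASYNCHRONOUS_COMMIT")),
    ]


steps = failover_steps("TestMigrate", "SQL2014OnSite", "SQL2016OnSite")
for instance, statement in steps:
    print(f"-- connect to {instance}")
    print(statement)
```

Every statement after the failover targets SQL2016OnSite, because from that point on it is the primary replica.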

The problem happened when I removed the original 2 nodes from the AG. Within a few seconds, both replica copies of the database switched to "Restoring", and the AG disappeared from both 2016 instances. Is there some "background" metadata that links the AG to the instance where it was created?

Also, I had assumed this was the best way to upgrade/migrate these AGs, but since the AG was dropped completely once I removed the two original nodes/instances, I'm guessing it's not, or I have missed a rather crucial step. I've searched high and low online for a straightforward approach to this task, but haven't found anything that addresses this specific requirement. There are several "similar" tasks, but none quite the same, and none close enough to give me the "missing piece(s)".

Any suggestions or recommendations would be very much appreciated.

Best Answer

Ah-ha!! I figured out what I did wrong. In my first attempts, I was doing all the work from the existing primary (SQL2014OnSite), and when I removed it from the AG, I must have done it in such a way that it blew away the AG.

This time, after failing over to the new primary node (SQL2016OnSite), I made the change on that instance: I right-clicked the AG, chose Properties, highlighted SQL2014OnSite in that window, and clicked the Remove button. The new configuration, with SQL2016OnSite as primary and SQL2016OffSite as secondary, has now been running for about 15 minutes since I removed SQL2014OnSite from the AG, and so far, so good.

I think that was my error: performing the "after-migration" work from the 2014 instance instead of the 2016 instance. Still, if anyone has any insights or recommendations for a better way to do this, by all means, please suggest away.
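I did the removal through the SSMS Properties dialog, but a T-SQL equivalent would presumably look like the sketch below. As far as I know, `REMOVE REPLICA` is supported only on the primary replica, which fits what I saw: the removal has to be issued from SQL2016OnSite after the failover. The Python helper only prints the statements; `removal_steps` is a hypothetical name, and the replica names assume default instances.

```python
def removal_steps(ag, current_primary, old_replicas):
    """Return (instance_to_connect_to, T-SQL) pairs that drop the old replicas.

    Hypothetical helper for illustration: it only formats statements.
    Each statement is meant to be run on the current primary (the 2016
    instance after the failover), never on the replica being removed.
    """
    return [
        (current_primary,
         f"ALTER AVAILABILITY GROUP [{ag}] REMOVE REPLICA ON N'{name}';")
        for name in old_replicas
    ]


for instance, statement in removal_steps(
        "TestMigrate", "SQL2016OnSite", ["SQL2014OnSite", "SQL2014OffSite"]):
    print(f"-- connect to {instance}")
    print(statement)
```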