Sql-server – SQL Server AG Listener issues in an AWS VPC

amazon ec2availability-groupslistenersql serversql-server-2016

I'm experimenting with creating a 2-node SQL Server Availability Group following Guy Glantser's excellent guide at MadeiraData where he describes a configuration with no AD domain or internal DNS. Note that I'm building everything on AWS EC2 instances. Everything is working well (including the certificates and logins, the AG is fine and can fail over successfully) however I cannot get the AG Listener to function properly.

Guy's instructions do not mention adding a reference to the Listener in the HOSTS files, however, I have done so.

I can connect to the Listener via the name OR its IP address ONLY from whichever is the Primary AG node. I cannot connect to the Listener (on an unused IP address and listening on port 1433) from another node or another machine – timeout. When I failover the AG, I can then only connect to the Listener on the "current" Primary, and the "former" Primary cannot reach it anymore. I've been testing the connections with SSMS, UDL files and telnet to port 1433.

Am I missing something fundamental about AGs, or as I suspect, am I running into an issue with the Listener and DNS/IP addressing inside an AWS VPC?

I'm running Windows Server 2016 Datacenter edition AMI instances and SQL Server 2016 Evaluation Edition SP1.

Best Answer

Can you please review the AWS documentation on how to properly set up AG on their environment? http://docs.aws.amazon.com/quickstart/latest/sql/part3.html

There are some specific steps that need to be done for this to work. You may also want to read about the MultisubnetFailover (https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/sqlclient-support-for-high-availability-disaster-recovery) as without using this feature it may take several minutes (depending on how much you set up your HostRecordTTL setting) for the DNS setting to be updated and the listener name start pointing to the new active node IP. On my tests, without changing the HostRecordTTL and the multisubnetfailover it was taking up to 15 minutes for the listener name to start working again after a failover so I had to play with these parameters to have something that would work for all my client applications