SQL Server – Duplicate IPs for cluster nodes causing backup issues

clustering, dns, sql-server, windows-server

We have a four-node cluster (SQL Server 2014 on Windows Server 2012) and we recently set up a heartbeat network on the nodes, following a best-practices source, after experiencing some cluster instability issues. Ever since, a couple of secondary nodes have been returning multiple IPs in DNS. We have removed the entries from the DNS server for both nodes multiple times, but the IPs of the heartbeat network (which are supposed to be non-routable) keep popping up, ultimately returning two IPs for the same node, and this is causing issues with our backups. I have checked the network adapter properties on all four nodes and the settings are the same. What am I doing wrong?

PS: In addition, I have also changed some settings in the TCP/IPv4 properties, such as removing the DNS server addresses and unchecking "Register this connection's addresses in DNS".

Best Answer

We have a four-node cluster (SQL Server 2014 on Windows Server 2012) and we recently set up a heartbeat network on the nodes, following a best-practices source, after experiencing some cluster instability issues.

Health checking goes across any available interface. Some interfaces have lower or higher metrics than others, but the loss of a single interface should be handled as long as other interfaces are available. Just for future reference, there is no dedicated heartbeat network; I believe that has been the case since Windows Server 2008 came out.

Ever since, a couple of secondary nodes have been returning multiple IPs in DNS.

You've found the issue: duplicate IPs on the network. Look at all of the adapters and figure out which are duplicated... or better yet, run the cluster validation wizard and have it do the heavy lifting for you!

We have removed the entries from the DNS server for both nodes multiple times, but the IPs of the heartbeat network (which are supposed to be non-routable) keep popping up, ultimately returning two IPs for the same node, and this is causing issues with our backups.

This doesn't make much sense to me... if the underlying issue is that there are duplicated IP addresses (assuming IPv4), then the fix shouldn't be deleting records from DNS. That only delays the inevitable: the addresses will register with DNS again at some point in the future. The fix would be to identify which adapters and interfaces are improperly configured.
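As a rough illustration of what "identify which adapters are improperly configured" could look like once you have collected the addresses from each node, here is a minimal Python sketch. The node names, adapter names, and addresses in `inventory` are made up for the example; in practice you would fill it in from whatever each node actually reports.

```python
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Return {ip: [(node, adapter), ...]} for every IP assigned more than once."""
    owners = defaultdict(list)
    for node, adapters in inventory.items():
        for adapter, ips in adapters.items():
            for ip in ips:
                owners[ip].append((node, adapter))
    # Keep only addresses that show up on more than one adapter.
    return {ip: where for ip, where in owners.items() if len(where) > 1}


# Hypothetical inventory: node -> adapter name -> list of IPv4 addresses.
inventory = {
    "NODE1": {"Production": ["10.0.0.11"], "Heartbeat": ["192.168.50.1"]},
    "NODE2": {"Production": ["10.0.0.12"], "Heartbeat": ["192.168.50.1"]},
    "NODE3": {"Production": ["10.0.0.13"], "Heartbeat": ["192.168.50.3"]},
}

duplicates = find_duplicate_ips(inventory)
for ip, where in duplicates.items():
    print(ip, "is assigned to", where)
```

This only compares the addresses you feed it; the cluster validation networking tests are still the more thorough check.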

Non-routable doesn't mean the interface can't register in DNS or otherwise talk to anything else. It is obviously getting registered in DNS and can talk to other servers.
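If you want to confirm from a client which names are picking up an address from the private network, something along these lines may help. It is only a sketch: the `192.168.50.0/24` subnet and the sample addresses are assumptions for illustration, and `resolve_ipv4` returns whatever the local resolver gives back, which can include hosts-file entries as well as DNS records.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the resolver returns."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def addresses_in_subnet(addresses, subnet):
    """Return the addresses that fall inside the given subnet."""
    network = ipaddress.ip_network(subnet)
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# Hypothetical example: the two addresses a node name might resolve to.
# With a real node you would use: sample = resolve_ipv4("your-node-name")
sample = ["10.0.0.12", "192.168.50.2"]
leaked = addresses_in_subnet(sample, "192.168.50.0/24")
print(leaked)
```

Any address that comes back in `leaked` is one that a client (a backup job, for instance) could end up trying to connect to.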

So, yes, this seems to be working as configured, although it is configured incorrectly.

I have checked the network adapter properties on all four nodes and the settings are the same. What am I doing wrong?

My guess is there is a duplicate IP address (tongue-in-cheek). Give the cluster validation wizard a run and choose the networking tests; that should give you some actionable output.

PS: In addition, I have also changed some settings in the TCP/IPv4 properties, such as removing the DNS server addresses and unchecking "Register this connection's addresses in DNS".

That's probably not a good thing to do and could have undesirable side effects... like the cluster not working at all. Again, you know the root issue, so instead of managing the symptoms, fix the root cause.