Elijah. There's two separate questions here:
1. Is DTC supported with AlwaysOn Availability Groups?
You're using SQL Server 2012, and according to Microsoft's Documentation, that answer is no. I totally understand that you want to try it anyway, but keep in mind that you're now putting something into production that Microsoft simply will not support, AND you're using two separate niche features together (AGs and DTC). If anything whatsoever goes wrong, you're going to be in a world of hurt. This just isn't something I'd ever even think about trying in production.
Keep in mind that if your managers find out that you deployed something Microsoft specifically says in big letters, "YOU CAN'T DO THIS," and you have any kind of outage where you have to call Microsoft for support, you're going to have some ugly explaining to do.
Technically, DTC is supported starting with SQL Server 2016 SP2 and later, but it just means that you can pick which database loses data on failover, and the application has no idea data was permanently lost. That's not what a normal database administrator would call DTC support.
2. How should DTC be configured in a multi-node, multi-subnet cluster?
Read Allan Hirt's post on configuring DTC with multiple instances of SQL Server in a cluster, and make sure to read all of the links in the post as well.
It can connect from SQL 1 server, which is the primary for AG.
By "connect", do you mean it can ping AG-LISTENER
from SQL1
?
It sounds like what your problem might be is with the port number you chose for your listener. By choosing 5525, you are selecting a non-default port (1433 would be the default).
So when you try to connect to the listener, what does your connection string look like? I'm guessing it looks something like this:
data source = ag-listener; initial catalog = ...
You have two options here. You can either be explicit with your listener's port number:
data source = ag-listener,5525; initial catalog = ...
Likewise, if you're testing this out with SQL Server Management Studio (SSMS), then for the Connect to Server dialog box, instead of putting in ag-listener
for the Server name text box, put in ag-listener,5525
.
Or you can change the port that your listener is listening on to 1433 (read the below BOL reference before considering this change):
alter availability group YourAvailabilityGroupName
modify listener 'AG-LISTENER'
(
port = 1433
);
It is worth noting when you can use the default port (1433). Take a look at this reference on BOL explaining when you can and can't use 1433 for the listener (excerpt copy/pasted below for reference):
You can configure the default port to 1433 in order to allow for simplicity of the client connection strings. If using 1433, you do not need to designate a port number in a connection string. Also, since each availability group listener will have a separate virtual network name, each availability group listener configured on a single WSFC can be configured to reference the same default port of 1433.
(portions omitted for brevity)
If you use the default port of 1433 for availability group listener VNNs, you will still need to ensure that no other services on the cluster node are using this port; otherwise this would cause a port conflict.
EDIT: If the above isn't your problem (as seen from your comment below) then, after you look through the logs, I'd say the next best course of action is to start looking at the network traffic to see what is (and isn't) happening. You can use a network monitoring tool like netmon to accomplish this.
Another thing I'd do, and I realize you said the firewall isn't a problem, but I'd see if the port is actually listening (my favorite tool for this is portqry).
Best Answer
Well that doesn't help you application stay available! You can absolutely have an availability group without a listener, but the useful of such setups are low.
That's not quite correct. The listener always points to the primary replica and the main purpose of it is to facilitate transparent failover so that the connection strings do not need to change. One additional use is in read only routing as you've said but the primary use is actually for transparent failover.
The CNO has no idea about what an availability group is, which should be primary, etc. The CNO absolutely DOES NOT always point to the primary replica. You can connect using it only when the core cluster resources (which holds the CNO) happens to be owned by the same node as the AG primary but if it isn't - and the CNO doesn't follow the AG around - then it'll fail. Here's an example:
The one without the listener might be working fine... does it still work fine on a failover? Are they using some other DNS alias to "act" like the listener may.
The one with the listener, what does their connection string look like? If it isn't using the listener name in the connection string then of course it isn't going to work... they need to point it to the right place, that's why the listener exists!
The connection to the listener may also fail if there are multiple subnets, older client libraries are used, or certain keywords aren't in the connection string. If there are multiple subnets, the client driver should be something that supports the MultiSubnetFailover keyword and this should be set to TRUE.
Well, that's 50% of the problem.
If they update their connection string to use the listener and set MultiSubnetFailover = True then I'm betting it'll work... assuming the client library used to connect supports it.
If SQL Server created the listener for you, I highly doubt it is setup incorrectly... I won't rule out edge cases though.
I would do the opposite. Have the GIS team change their connection string to the listener and set multisubnetfailover. It should, then, work... again with previously said assumptions.
Yes, you do want to use the listener. The fact that it is working is either because there is some document that says "always have this on node3" or you've been getting lucky that it stays working. They might have used the replica name directly in the connection string also... that would allow it to connect. Depending on the settings for the secondary role it may or may not still connect and work properly... just depends on said settings.