Sql-server – Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’

kerberossql server

I have an Operations server running Windows Server 2012R2 and SQL Server 2014 Enterprise back end. This server is used to deploy new code to other production servers via cmd file called by a SQL Agent Job. All of the servers are on the same domain.

Server A runs Windows Server 2008R2 and SQL Server 2008R2 back end. This server has a linked server connecting to a database on Server B, running the same OS and SQL Server version as A. The linked server is configured with option @useself=TRUE.

  • All three servers use a service account enabled for delegation.
  • This service account has sa priveleges on all three SQL Server instances.
  • All three servers have SPNs configured with that account and are delegated to use Kerberos for the associated MSSQLSVC services.

I can run the following on each and "Kerberos" is returned

    SELECT auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@spid

In addition I can telnet, ping, etc. from any of those servers to any other server without issue – everything is connected. Code deployments have never had a problem and the linked server is referenced often and without issue…except for one scenario and I don't understand why.

Double Hop Scenario

  1. A SQL Agent job on the Operations server is run adhoc by any sysadmin other than the SQL Agent service account and executes a cmd file, also on the Operations server.
  2. The SQL Agent job is configured to "Run As" a SQL Agent service account, having sa priveleges.
  3. The code that is deployed comes from a .sql file on the Operations server.
  4. The cmd file calls SQLCMD to execute the code in the .sql file against Server A.
  5. The code in the .sql file references the linked server and fails with error

    *Msg 18456, Level 14, State 1, Server ServerB, Line 1 Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'. *.

The error is directed at the linked server – Server B. If I run this code manually, using SQLCMD, from the Operations server against Server A, it runs fine. If I logon to a box as the SQL Agent service account and run the SQL Agent job, it runs fine. It's only when the SQL Agent job is executed by someone other than the SQL Agent service account that I get the error.

I've read through so many posts and blogs and MSDN articles regarding Kerberos, double hop, etc telling me to do what has already been done. What am I missing?

Additional Info
I've finally been able to come back to this and located some additional info. Using Bogdan's advice, I ran Process Explorer and verified the credentials for the first hop are from the SQL Agent Service account as expected and that TCP is being used. Alas, that is all the useful info I was able to get out of the tool.

I dug into the Window's application logs and dug around for login information for the different instances on the machines and noticed that Kerberos isn't even being used!!! Instead NTLM is being used.

So that's the new path I'm heading down – why is NTLM authentication being used, when Kerberos is set up and proper FQDN SDNs exist by port and instance? Do I need to somehow specify authentication type in the cmd file or SQLCMD call? Or do I have something misconfigured that I'm not thinking about?

The Mystery Deepens
The SQLCMD call referenced the "first hop" server via Alias. I modified the SQLCMD server reference to the actual named instance and reran the job. It worked! We also use cNames for our machines and SQL instances and so I tried using that. It also worked! For grins and giggles I re-tried the alias…it worked??? I go check the Windows applications logs for each of these and it is still reporting authentication as NTLM!

I am completely baffled at this point and at a loss as to how to explain this behavior to fix the rest of our environment.

Best Answer

Finally figured this out!

We reference many of our very active SQL Servers and Instances with SQL aliases and canonical names. This allows us to perform parallel upgrades and switch machines in and out without code interruption. The SQL aliases are all configured to reference the canonical names for the same reason. The standard SPNs we had registered are for the machine names.

EXAMPLE SQL Server SSxyz\xyzInstance listening on port 12345 has registered SPNs:

  • MSSQLSVC/SSxyz.mydomain.com:12345
  • MSSQLSVC/SSxyz:12345
  • MSSQLSVC/SSxyz:xyzInstance

These are all registered to the service account that owns MSSQLSVC on that machine and delegation is set for each to pass credentials.

The server and instance also have canonical name cxyz\xyzInstance. Finally, the SQL Server has a 32bit and 64bit alias of xyzAlias, pointing to canonical name cxyz\xyzInstance.

During a double hop scenario from my local pc to xyzAlias to abcAlias, Kerberos authentication fails and falls back to NTLM because the alias and SPN cannot be resolved. This is why the logs were showing NTLM.

The fix was to add a few more SPNs to register the canonical name of the server to the actual name. Then the SQL service is registered with another SPN. like this:

  • setspn -s host/cxyz SSxyz
  • setspn -s host/cxyz.mydomain.com SSxyz
  • setspn -s MSSQLSVC/cxyz:12345 sqlserviceaccountname
  • setspn -s MSSQLSVC/cxyz:xyzInstance sqlserviceaccountname

Once registered, each SPN is configured for delegation. These appear as unique SPNs and allow the SQL alias to resolve to the actual machine name. The double hop authentication logged is kerberos and everyone is happy! Except the AD guys...they're going to have a lot of SPNs to register in a couple of hours.