Sql-server – SQL Server and SSPI handshake failed error

authenticationsql serverwindows-server

Recently I faced an issue where users from one domain were unable to login to SQL Server and getting this error:

Error: 17806, Severity: 20, State: 2.
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxxxxxx]

Error: 18452, Severity: 14, State: 1.
Login failed for user ”. The user is not associated with a trusted SQL Server connection. [CLIENT: xxxxxx]

We have two domains let's say A and B. My SQL Server is running in domain B. In a normal scenario, both domain users are able to connect. But few days ago, all the users from domain A were unable to connect to SQL Server running in domain B. However domain B users were able to connect it successfully. Users approached SQL Server DBA team to look into the issue.

When I checked, even I was unable to connect to SQL Server remotely using SSMS and using credentials of domain A. However when I tried to RDP the server using domain A's credential it was working fine and there after remote access to SQL server also started working fine. So it looked to me like when I RDPed the server something got refreshed and my login remotely started working fine. I was sure that the actual issue was something related to AD and kerberos authentication but unable to prove so.

I researched this issue on web and found two solutions to fix it. One was to fix malfunctioning DC and second was to restart the server where SQL Server was running. I went the 2nd way as windows admins were not accepting that it is an DC issue. I checked everything from SQL end nothing got changed recently. Also found that this issue started happening after below event was logged on server where SQL Server was running:

Log Name: System
Source: NETLOGON
Date: xxxxxxxx
Event ID: 5719 Task
Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: xxxxxxxxxx
Description: This computer was not able to set up a secure
session with a domain controller in domain xxxxxx due to the
following: There are currently no logon servers available to service
the logon request. This may lead to authentication problems. Make sure
that this computer is connected to the network. If the problem
persists, please contact your domain administrator.

ADDITIONAL INFO If this computer is a domain controller for the
specified domain, it sets up the secure session to the primary domain
controller emulator in the specified domain. Otherwise, this computer
sets up the secure session to any domain controller in the specified
domain.

My questions are:

  1. How to prove that this is a domain controller issue?
  2. How reboot of server fixes this issue?

Thanks for your help.

Best Answer

Why does a reboot of the server sometimes fix this issue?

I think the OS has an issue talking with the DC (for whatever reason(s) you determine) and when this happens, it just gets out of sync somehow with DC, AD, etc. and a quick fix is to reboot the SQL Server OS when this occurs and there is no network communication with the DC at the time of the reboot, and all comes up fine with everything back in sync at that point. To accurately answer this, you probably need to find the reason first.


Proving that [the] Domain Controller(s) is/are or is/are not the issue

1. Troubleshoot & Consider

  • I'm not suggesting you make a bunch of changes before trying to find and fix the specific problem beforehand but there are many factors to consider I would think. It's just something you have to troubleshoot one step at a time unless there's something obvious causing the issue.

2. Access Perspective

  • In my setup, I'm the domain admin as well as the DBA so I have access to everything and all servers (two domains with two-way trusts) to troubleshoot at all levels whereas you may be a bit more restricted with what you can review, etc. on each server in the loop of the issue.

3. No Simple Oversight

  • I usually start with the simple items first that don't require changes and see if there are any obvious issues (e.g. event viewers on all servers, DNS testing on all servers, review all server configurations, etc.).

4. No Telling Logic

  • This could be an issue at the OS level of the SQL Server for the Windows version it is on. This could also be an issue with the NIC of the SQL Server or a mis-configuration of TCP/IP or DNS settings on the SQL Server or any of your DCs.

5. Collect & Update

  • Gather information on or ensure the SQL Server is fully updated with Windows Updates, and ensure server firmware is up-to-date if applicable with BIOS version as well as any hardware device firmware updates. The domain admins could also do the same on the DCs but use caution with making a bunch of changes and do so one step at a time.

6. Best Practices

  • I'm not sure how your AD and DCs are configured topology wise, but best practices should be followed as best as possible and a DC should be close to the network where both domains are physically rather than going across a slower network pipe or routers, etc. which adds another level where there could be issues or configurations to review.

    The DCs should have their TCP/IP settings configured so that their DNS settings point to other DNS servers or follow best practices otherwise as per Microsoft for the DCs configurations applicable in your environment.

7. Request Access or Configuration Disclosure Assistance

  • You'll need to do basic troubleshooting to find the issues to resolve, ask questions, see what all could be updated and patched or changed, and get your domain admins to help you troubleshoot or disclose domain configurations to you if you're not sure how to see if everything is configured as best it could be for optimum performance and per best practices.

Important: This not to say there couldn't be a very simple reason this suddenly happened but if nothing has changed anywhere (i.e. OS on DCs or SQL Server, network configurations, AD or forest functional levels, hardware upgrades, Windows Updates, etc.) then you just have to troubleshoot for the reason why one step at a time.