SQL Server – Fix Login Failure at Same Time Daily with Service Account

errorsloginsNetworkservice-accountssql server

I have dozens of SQL Servers, on many different versions. Many of them have an error in the SQL Server error log EVERY DAY at just about 2 am.

Login failed for user 'TheDomain\SQLServerAgentServiceAccount'

Some show this reason:

Reason: Token-based server access validation failed with an infrastructure error.  
Check for previous errors. [CLIENT: <local machine>]

Error: 18456, Severity: 14, State: 11.

The rest show this:

Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]
Error: 18456, Severity: 14, State: 5.

Our network team doesn't see anything suspicious in this time frame. The second reason makes sense and is true – there ISN'T a matching login for the service account. I don't know why there would be an explicit one created for a service account.

What could the servers be trying to do at the same time every day? There are no Agent jobs, backups, maintenance plans, or any other known services that would do anything.

Best Answer

Using an Extended Events session I was able to track down the issue and see that it came from the syspolicy_purge_history job that SQL Server creates by default to clean up Policy Management records. Step 3 is "Erase Phantom System Health Records". This contains a PowerShell script that tries to connect to OTHER instances on the machine, causing the login failures. This is a known bug and only affects machines with more than 1 SQL Server instance. https://connect.microsoft.com/SQLServer/feedback/details/606801/syspolicy-purge-history-job. So when you see login failures on an instance, they are not that instance trying to connect to itself, but another instance on the same machine.

The link here explains further and the code below will update the job step to connect to only the instance the job is running on.

https://www.codykonior.com/2015/05/31/login-errors-with-syspolicy-purge-history/

Exec msdb.dbo.sp_update_jobstep @job_name = 'syspolicy_purge_history', @step_name = 'Erase Phantom System Health Records.', @step_id = 3, @command=N'$applicationName = "SQLPS ($env:USERNAME@$env:COMPUTERNAME)" $SQLServerConnection = New-Object System.Data.SqlClient.SqlConnection $SQLServerConnection.ConnectionString = "Data Source=$(ESCAPE_NONE(SRVR));Initial Catalog=master;Integrated Security=SSPI;Application Name=$applicationName" $PolicyStoreConnection = New-Object Microsoft.SqlServer.Management.Sdk.Sfc.SqlStoreConnection($SQLServerConnection) $PolicyStore = New-Object Microsoft.SqlServer.Management.Dmf.PolicyStore ($PolicyStoreConnection) $PolicyStore.EraseSystemHealthPhantomRecords() '

OR, copy and paste the below into the job step:

$applicationName = "SQLPS ($env:USERNAME@$env:COMPUTERNAME)"
$SQLServerConnection = New-Object System.Data.SqlClient.SqlConnection
$SQLServerConnection.ConnectionString = "Data Source=$(ESCAPE_NONE(SRVR));Initial Catalog=master;Integrated Security=SSPI;Application Name=$applicationName"
$PolicyStoreConnection = New-Object Microsoft.SqlServer.Management.Sdk.Sfc.SqlStoreConnection($SQLServerConnection)
$PolicyStore = New-Object Microsoft.SqlServer.Management.Dmf.PolicyStore ($PolicyStoreConnection)
$PolicyStore.EraseSystemHealthPhantomRecords()