I have dozens of SQL Servers, on many different versions. Many of them have an error in the SQL Server error log EVERY DAY at just about 2 am.
Login failed for user 'TheDomain\SQLServerAgentServiceAccount'
Some show this reason:
Reason: Token-based server access validation failed with an infrastructure error.
Check for previous errors. [CLIENT: <local machine>]
Error: 18456, Severity: 14, State: 11.
The rest show this:
Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]
Error: 18456, Severity: 14, State: 5.
Our network team doesn't see anything suspicious in this time frame. The second reason makes sense and is true – there ISN'T a matching login for the service account. I don't know why there would be an explicit one created for a service account.
What could the servers be trying to do at the same time every day? There are no Agent jobs, backups, maintenance plans, or any other known services that would do anything.
Best Answer
Using an Extended Events session I was able to track down the issue and see that it came from the syspolicy_purge_history job that SQL Server creates by default to clean up Policy Management records. Step 3 is "Erase Phantom System Health Records". This contains a PowerShell script that tries to connect to OTHER instances on the machine, causing the login failures. This is a known bug and only affects machines with more than 1 SQL Server instance. https://connect.microsoft.com/SQLServer/feedback/details/606801/syspolicy-purge-history-job. So when you see login failures on an instance, they are not that instance trying to connect to itself, but another instance on the same machine.
The link here explains further and the code below will update the job step to connect to only the instance the job is running on.
https://www.codykonior.com/2015/05/31/login-errors-with-syspolicy-purge-history/
OR, copy and paste the below into the job step: