SQL Server – How to clear an Error 15404 after a DB restart (besides rebooting)

active-directory, jobs, sql-server, sql-server-agent

Every so often (roughly every few months), an hourly SQL Server Agent job begins reporting Error 15404 and keeps failing until someone intervenes.

[298] SQLServer Error: 15404, Could not obtain information about
Windows NT group/user 'DOMAIN_NAME\SomeDomainAccount', error code
0x6e. [SQLSTATE 42000] (ConnIsLoginSysAdmin)

Sometimes the first failure occurs immediately after a manual restart of the SQL Server Engine and SQL Server Agent services. The problem can be cleared by rebooting the machine.

The Job Owner is the name listed in the error message and is a SQL Server Admin.

The SQL Server Engine service account looks to be a service account (I believe it is the default install account, one notch better than the generic NetworkService, to prevent interference between Engine/Agent instances):

   NT Service\MSSQL$INSTNAME
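
To confirm which accounts the Engine and Agent services actually run under, a query along these lines can help. This is a minimal sketch, assuming a SQL Server version that exposes the `sys.dm_server_services` view and a login with `VIEW SERVER STATE` permission:

```sql
-- Show the account each SQL Server service on this instance runs under.
-- Assumption: sys.dm_server_services is available on this version.
SELECT servicename,
       service_account,
       status_desc,
       startup_type_desc
FROM sys.dm_server_services;
```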

It would be one thing if the job always failed, but since the job succeeds after a reboot, it makes me think that a service account like this is supposed to work and that there is some AD timing issue or possibly a bug. When IT is asked about the AD configuration, the response is usually "nothing has changed."

  • Restarting the engine and agent services can cause the job to start failing.
  • A machine reboot clears the problem.
  • After the reboot, an immediate restart of the engine and agent services no longer causes the job to fail.

Link:
How to troubleshoot a SQL Server 8198 error

Best Answer

Not a solution, but you can work around the issue by making the job owner a SQL account.
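
A rough sketch of that workaround follows. The job name `HourlyJobName` is hypothetical, and `sa` is only an assumed choice of SQL login; substitute whatever SQL-authenticated login suits your environment. Note that if the new owner is not a sysadmin, T-SQL job steps run under that login's permissions.

```sql
-- List Agent jobs owned by a Windows login ('U') or Windows group ('G').
-- Owners that have no matching server principal will not appear here.
SELECT j.name AS job_name,
       sp.name AS owner_name,
       sp.type_desc
FROM msdb.dbo.sysjobs AS j
JOIN sys.server_principals AS sp
    ON sp.sid = j.owner_sid
WHERE sp.type IN ('U', 'G');

-- Reassign one job to a SQL login so no Active Directory lookup is needed.
EXEC msdb.dbo.sp_update_job
    @job_name = N'HourlyJobName',   -- hypothetical job name
    @owner_login_name = N'sa';      -- assumed SQL login
```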

Each time a job is started, SQL Server verifies the identity of the job owner and checks that it has permission to execute the job. If the owner is a Windows account, the engine needs to query Active Directory. If for any reason that fails, the job will not run. It could be because the AD server is busy, down or cut off from the network, or that the SQL Server service account doesn't have rights.
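
One way to check whether the engine can currently resolve the account, without waiting for the next scheduled run, is to call `xp_logininfo` by hand. This is only a sketch: the account name is the placeholder from the error message, and I am assuming this call exercises a similar lookup to the one the Agent performs. If the lookup is broken, it should fail with the same 15404 message.

```sql
-- Ask SQL Server to look up the Windows account.
-- A failure here suggests the engine itself cannot obtain the account's information.
EXEC xp_logininfo
    @acctname = N'DOMAIN_NAME\SomeDomainAccount',  -- placeholder from the error text
    @option = 'all';
```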

Since it works immediately after rebooting, that makes me think it might have something to do with cached credentials. Windows will save the credentials it looks up for later use. That cache is cleared on reboot. Perhaps something is corrupting the cache.