SQL Server – SQL Agent jobs stuck in a hung state (SSIS jobs hung)

jobs, sql-server, ssis

For a couple of weeks I have been seeing a strange issue on SQL Server 2016 SP1. We migrated a critical server from SQL Server 2008 R2 to SQL Server 2016 SP1 using a side-by-side migration, moving all the databases, logins, Database Mail configuration, jobs, etc.

We have around 150 jobs deployed in SQL Server Agent. Since the database cut-over from SQL Server 2008 R2 to SQL Server 2016, at Saturday 5 AM ET a few SSIS jobs go into a hung state: they neither succeed nor fail, and remain in the 'executing' state. We have seen this behavior only for SSIS jobs (around 39 jobs); the remaining T-SQL jobs are fine.

Interestingly, the transaction log backup job hung as well (it is a .trn backup job created from a maintenance plan, and maintenance plans run as SSIS packages; it runs every hour). We checked the Event Viewer logs, the SQL Server Agent logs and the error logs, and nothing was captured. Server health was normal (CPU 5%, memory 15%), and we found no locking or blocking problems.
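Since the logs captured nothing, one way to catch this state early is to flag Agent jobs that have been executing for longer than expected. Below is a minimal Python sketch under stated assumptions. The rows are assumed to come from `msdb.dbo.sysjobactivity` joined to `msdb.dbo.sysjobs` and filtered to the current Agent session (the latest `session_id` in `msdb.dbo.syssessions`). Here they are built by hand, and the job names, timestamps and one-hour threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class JobActivity:
    # Assumed shape: a few columns of msdb.dbo.sysjobactivity joined to msdb.dbo.sysjobs
    job_name: str
    start_execution_date: Optional[datetime]
    stop_execution_date: Optional[datetime]


def find_hung_jobs(rows: List[JobActivity], now: datetime, threshold: timedelta) -> List[str]:
    """Return names of jobs that started, have not stopped, and have run longer than threshold."""
    hung = []
    for row in rows:
        if row.start_execution_date is None or row.stop_execution_date is not None:
            continue  # never started in this session, or already finished
        if now - row.start_execution_date > threshold:
            hung.append(row.job_name)
    return hung


# Hypothetical sample data standing in for the msdb query result
now = datetime(2017, 6, 3, 7, 0)
rows = [
    JobActivity("SSIS_Load_Orders", datetime(2017, 6, 3, 5, 0), None),
    JobActivity("TSQL_Cleanup", datetime(2017, 6, 3, 5, 0), datetime(2017, 6, 3, 5, 2)),
    JobActivity("SSIS_Export", datetime(2017, 6, 3, 6, 50), None),
]
print(find_hung_jobs(rows, now, timedelta(hours=1)))  # ['SSIS_Load_Orders']
```

Filtering to the current session matters because activity rows left over from before an Agent restart can keep a NULL stop time and would otherwise look like running jobs.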

Once we restarted SQL Server Agent, these jobs started working smoothly again and ran successfully on their schedules. A week later, the issue recurred at the same time (Saturday 5 AM ET), and the same set of jobs went into the hung state.

Again we checked the logs, blocking, etc., and nothing was captured; restarting the SQL Server Agent service from SQL Server Configuration Manager resolved it once more. Our concern is that this is a production server, and users report that the data becomes stale when these jobs do not run. After the Agent restart, the following message was logged for every hung job:

Unable to terminate process 1208 launched by step 1 of job 0xB4CF917BAF53234796F42A38EC45B871 (reason: Access is denied)

The issue recurred on Saturday at 5 AM EST, after the weekly maintenance job (index rebuild, then statistics update), which runs for 1 hour 50 minutes. After it finished, we found that a few SSIS jobs had hung. We are not claiming this is the exact cause; to narrow down the root cause, I moved that job's schedule to Sunday 5 AM EST. We also checked the wait types, and everything looked normal.

Finally, we found out that a vendor had changed the security key on the SFTP side, and that one batch job was causing the remaining jobs to hang. Our developers updated the security key manually in the relevant step of the SQL Agent job, and the job then ran fine. However, we are now facing another issue in the test environment.

When we do not accept the key in the FTP step, the job keeps running (stays in the executing state), but on SQL Server 2008 R2 this does not affect any other job. On SQL Server 2016 the behavior is completely different: this FTP job causes the remaining SQL Agent jobs to hang.

Can anyone tell me whether there is a patch we need to apply to the server for a permanent fix, or whether this is a bug in SQL Server 2016?

On SQL Server 2008 R2, the jobs run fine on their schedules; only the batch (FTP) job stays in the executing state. After we migrated all the jobs to SQL Server 2016, we found that once the SFTP batch job starts executing, all the remaining jobs hang.
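The FTP step never finishes because it waits for the new key to be accepted. Whatever the root cause of the Agent-wide hang on 2016, one generic safeguard is to bound an unattended external transfer so that it fails instead of waiting forever. Below is a minimal Python sketch, assuming the transfer is launched as an external process. `run_with_timeout` and the simulated commands are hypothetical, not an SSIS or SQL Server Agent feature.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run an external command (hypothetical transfer step); give up if it exceeds timeout_seconds."""
    try:
        completed = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # no stdin, so the child cannot block reading from it
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising TimeoutExpired
        return ("timeout", None)
    return ("ok" if completed.returncode == 0 else "failed", completed.returncode)


if __name__ == "__main__":
    # Simulates a step that never finishes on its own (e.g. stuck waiting for a key confirmation)
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1))  # ('timeout', None)
```

Returning a status instead of raising lets a calling job step turn a timeout into an explicit failure, which is then visible in the job history rather than leaving the job in the executing state.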

Best Answer

We noticed similar behavior a few times within a year on one of our production servers running SQL Server 2014 SP2.

Symptom: All SQL Agent jobs that run SSIS packages suddenly hung in the running state, while there was no real activity on the Database Engine side.

Resolution: Both times, the issue was resolved by installing all pending Windows Update patches. After that, SQL Agent was restarted and the jobs behaved normally again.

The issue seems to be related in some way to the .NET stack and the OS patches on the server, which we obtain via WSUS.