Sql-server – ETL Issue failing with Timeout error

data-warehouseetllinked-serversql serversql server 2014

We have a ETL job which is running each day at 02:00 AM.

Until last week the Job ran successfully with no issues.

The ETL runs the extract by connecting through the Linked server to the PROD server.

Checked the access to PROD server all good.

All of a sudden the job started failing with below error.

Error: Timeout expired. The timeout period elapsed prior to
completion of the operation or the server is not responding.

Checked below things :

Exec updatestats on each database of this Database server where the ETL is pointed to.

I also tried setting Remote Query Timeout to 0

NOTE: But a particular script is causing issue only during extraction.

What else should I check on as the ETL is failing only for specific script.

But the particular script is running good when executed manually (13 seconds)

There are many scripts which run for this ETL job.

But during the extraction it is failing.

P.S. I am very sorry as the code is little big I couldn't post here.

Please see linked server properties.

enter image description here

Kindly help me on the things I am missing.

Regards

Best Answer

While this won't necessarily get to the root of the problem, you may want to see if your job can be set up to retry on failure.

If the problem is a temporary issue, then trying to run the job (or step) 15 seconds or 3 minutes later might work fine.

Note that you need to consider how your job works. If it starts by clearing a work table, then loads the table from a file, then processes the file data, and it's actually failing after the data's loaded, you want to be sure that you still have your file around to reload the data, or you want to break the job into steps, and just rerun the processing step. Also, you need to be sure that shifting the runtime doesn't interfere with other processes (sometimes, if a job takes less than a minute to run normally, you'll have something that has to follow it scheduled to run 15 minutes later; if you schedule up to 10 retries, 3 minutes apart, and the ninth retry works, it's running after that other job has started...).

Sometimes, seeing the length of time the failures take, and having multiple attempts a few minutes apart, will provide more info about the situation, and help you diagnose the solution.

I know that the SQL Server job scheduler allows you to retry job steps on failure; you specify the number of retries, and the time to wait before each retry. Other scheduling mechanisms may or may not have this ability.