Sql-server – Why would running a SQL query overnight break the database for the following day

sql serversql-server-2008

I apologise in advance if this is a duplicate question (which I bet it will be). I had a good search through and found similar questions but nothing that seemed an exact match.

I wrote a report that takes around 30 minutes to run. It does a lot of number crunching and I am quite happy that it takes so long to execute but this means the report can't be interactive. Basically the report has to calculate a profit and loss report as if the company has lost exactly one client. Then it puts that client back and runs the report again as if the next client was lost, etc. I think 30 minutes is actually rather good performance considering how intensive this is.

Users are happy with this report being updated nightly and then they can see the cached results the following day. So the plan was to run the report overnight and save the output somewhere.

I wrote a stored procedure to do this and I can run it during the day in around 30 minutes. However, when the report runs overnight as a SQL Agent job it NEVER completes. It kicks off at 9pm and when I get into work the next day I will find my inbox flooded with emails saying that SQL Server is broken, isn't performing properly, etc. I kill the job and everything goes back to normal.

Except it doesn't go back to normal. What happens next is that ANY report that is run against the same database will timeout. The only way to correct this is to either wait a day (without running the overnight report) or to run another query against the same database directly from SSMS as this seems to reset the problem for some reason.

Obvious questions:

  • is it the SQL Agent job causing the problem (maybe permissions)? No, if I run the SQL Agent Job during the day it works fine, it's some sort of timing issue I think.
  • have you tried moving the schedule? I have tried everything from running it at as early as 6pm to running it at as late as 6am and it doesn't seem to make any difference.
  • what else runs overnight? There are a whole load of other jobs that run for different databases. I have moved the schedule around so my job runs in isolation (in theory). There is a job that runs an index optimisation across ALL databases but I can't turn this job off as it is managed externally.
  • Does the query plan change for other reports? I have no way to tell, if I run a report via SSMS then it works, if I run it via the Excel add-in that does exactly the same thing, but via a SQL connection then it times out. I can view the process in Activity Monitor and watch it run for ages before it eventually dies.

My personal opinion is that the query engine is making bad decisions about which index to use but when I run a report interactively from SSMS this somehow resets "something" internally and it goes back to running reports using the right indexes again. I have no evidence to back this up with apart from the behaviour is the same if I disable one of the indexes on one of the "raw data" tables.

When I say, "run a report interactively" I just mean this:

SELECT * FROM FormatMonthEndReport(2013, 1);

Where FormatMonthEndReport is a table-valued function.

I don't know if this is even relevant but I make a lot of use of APPLY in all of these reports.

I imagine more detail will be asked for but I don't want to add too much detail at this stage as this would rapidly turn into a wall of text. I will come back and see what is asked for first so my detail is more targetted.

Edit #1 – it looks like running a SQL Profile is going to be the first step. I am going to set this up to run overnight and then report back tomorrow. I need to leave so I won't have time to respond to the other suggestions until then… but I am not ignoring you!

Edit #2 – so it worked last night, very odd as this is the first time this job has run successfully overnight! The only thing that appeared to be different was that the trace was running. The trace did pick up some interesting activity though and there was a deadlock victim at around 23:58. I checked the overnight job and it took 12.5 hours to run so this still is taking longer than it should but at least it is completing now.

Best Answer

My gut is that it's related to parameter sniffing; when you call the TVF's, do you pass in parameterized values or hard-coded values? For example:

SELECT * FROM FormatMonthEndReport(2013, 1);

will perform differently than

DECLARE @p1 int = 2013, @p2 int = 1; SELECT * FROM FormatMonthEndReport(@p1, @p2);