Sql-server – ASP.NET & SQL Server 2012 timeouts once a month

performancesql serversql-server-2012

We have an intranet site which works fine, except one day a month.

There are slow queries, unoptimized tables, tables with ~8 million records, etc. but daily usage is fine, although some functionality could be faster. The database server has 180GB RAM, 16 cores; it's a very fast server. The website is on another very decent server, with a very fast connection between.

Employees need to create a lot of invoices on the last day of each month and that's when it goes sour. On the last day of each month, people get a lot of time outs, a lot of pages which only run SELECT FROM TABLE will fail to load because of timeouts.

While people are getting the timeouts, I turned on Activity Monitor, and the CPU is at 20%, there are 300 calls per second, which is really not much at all for that server. On daily usage it's 100-200 calls per second and it works fine. I ran sp_who2, sp_who3, they don't report any locks. I have also enabled deadlock report, to report anything for more than 20 secs and it doesn't report anything.

Some queries have 7000 different plans, although that's bad it doesn't explain timeouts. We have a lot of SELECT queries without a NOLOCK but that doesn't explain why pages with SELECT only time out.

We get very low physical reads, since RAM is 180GB, most of the data is in RAM, so slow disk is not the problem. Network is not the problem, since two servers (database server and web site server) have a very fast connection between them. We backup the database every 15 mins so every 15 mins I see a high I/O activity, but that happens on other days as well without a problem.

The only oddity I noticed was that when I ran sp_who there were 300 Waiting_Command state connections, that's much lower on other days.

Do you reckon there's anything else I could check? The problem is we only have a few hours once a month when the time outs occur so we have very limited time to check what's going on, at other times it's OK. I wrote a JMeter script to reproduce the problem but it's not the same as 300 people creating invoices at the same time. We're thinking of contracting a DBA but I'm not sure even if a DBA could do anything with what happens once a month for a few hours.

Any ideas?

Update:

  • I checked sys.dm_os_wait_stats every few minutes but I couldn't see anything out of the ordinary. But as @AaronBertrand suggested I'll look into that.
  • We use datareader and datasets but we don't do any heavy processing with the reader results.
  • It happens on the last working day of each month so it's a different day every month.
  • I'm new here and I was told this has been happening every month for the past 10 years at least. 10 years ago we probably had fewer employees but also slower servers.
  • There's nothing else that runs on the last day of each month.
  • We have 15 mins backups, report generation that runs every 15, every 30 minutes etc. but they don't cause a problem on other days.
  • Both datareader and datasets are used, but datareader iterations are used to get total of all items, etc., just quick loops, nothing like reading a file for each record.
  • There doesn't seem to be anything else running on either servers.
  • This happens on last working day of each month so it's a different day each month.

Best Answer

It might be related with leaking connections exhausting the connection pool. If the connections are not released properly they will remain active and will not be returned to the connection pool until the next Garbage Collection executes. If a lot of connections are opened in a short time the connection pool might fill up before the GC runs, and the following connection requests will end up waiting for a free connection in the pool, eventually timing out.

https://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx