Sql-server – Cause of High Disk Queue Length in SQL Server

performancesql server

I have a fairly high-throughput application that occasionally decides to collapse on me. It's not very often – about once every ~3 weeks or so. When it does, if I check out perfmon, I see 100% "Avg. Disk Queue Length" pegging the server.

During these times, I also see lots of nice connection failed messages from SQL Server.

I'm no SQL Server expert, I can do the basics for indexing, taking backups, etc., but that's it.

What would cause something like this? I was thinking perhaps it was a resize of the database (it was down to ~300MB available [and it's a 30 gig database]), or maybe some reindexing gone nuts?

I do have one table in particular that has tons of inserts. Very few reads, but many inserts per second isn't unusual at all.

The server has only ~4 gig of RAM as well, but we do have a dedicated warehouse box that rolls up data every night where most of the heavy querying is directed.

Anyone got any thoughts on what might cause that huge queue length?

Best Answer

OK, so, from what I can tell, it was related to a bunch of things:

  1. resizing files (data and log both)
  2. large number of inserts happening while 1. was going on
  3. one particularly heavy query running while 1. was going on
  4. a hard drive with some pretty high i/o times in general
  5. lack of memory on the server, so hitting the page file on top of everything else

Here were my resolutions:

  1. change the autogrowth settings so I'd have consistent growth when it happens, manually grew both the temp and primary database files to have more space, added additional vlogs to the temp database, and set up a notification so I can manually grow when the database gets below a certain space level.
  2. [nothing to be done about it]
  3. made the heavy query run less often; it was loading up data the user didn't always need, so changed it to run on-demand
  4. [working on getting a new server, this app is growing fast]
  5. [working on getting a new server, this app is growing fast]

So, anyway, it was a combination of a whole bunch of different things, mostly related to SQL, but not exclusively (so Will was correct there).

I'd love to split the answer between everyone, as they had portions of it right, but what can you do...