I have a fairly high-throughput application that occasionally collapses on me. It's not very often, roughly once every three weeks. When it does, perfmon shows "Avg. Disk Queue Length" pegged at 100% on the server.
During these times, I also see lots of nice "connection failed" messages from SQL Server.
I'm no SQL Server expert. I can handle the basics (indexing, taking backups, etc.), but that's it.
What would cause something like this? I was thinking perhaps it was a resize of the database (it was down to ~300 MB available, and it's a 30 GB database), or maybe some reindexing gone nuts?
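To put the resize theory in numbers, here is a rough sketch of the headroom arithmetic. It assumes the ~300 MB figure is free space inside the data file, and the insert rate (`ASSUMED_INSERT_MB_PER_MIN`) is a made-up placeholder rather than a measured value.

```python
def free_space_pct(db_size_mb: float, free_mb: float) -> float:
    """Return free space as a percentage of the total database size."""
    if db_size_mb <= 0:
        raise ValueError("db_size_mb must be positive")
    return 100.0 * free_mb / db_size_mb


def minutes_until_full(free_mb: float, insert_mb_per_min: float) -> float:
    """Estimate minutes until free space runs out at a steady insert rate."""
    if insert_mb_per_min <= 0:
        return float("inf")
    return free_mb / insert_mb_per_min


# Figures from the question: ~30 GB database, ~300 MB available.
DB_SIZE_MB = 30 * 1024
FREE_MB = 300

# Hypothetical insert volume; measure the real rate before trusting this.
ASSUMED_INSERT_MB_PER_MIN = 2.0

pct = free_space_pct(DB_SIZE_MB, FREE_MB)
eta = minutes_until_full(FREE_MB, ASSUMED_INSERT_MB_PER_MIN)
print(f"free: {pct:.2f}% of the file, ~{eta:.0f} min of headroom at the assumed rate")
```

With under 1% free, a growth event during a busy period is at least plausible, though this alone doesn't prove that a resize is what stalls the disk.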
I do have one table in particular that takes tons of inserts. It gets very few reads, but many inserts per second isn't unusual at all.
The server also has only ~4 GB of RAM, but we do have a dedicated warehouse box that rolls up data every night, and most of the heavy querying is directed there.
Anyone got any thoughts on what might cause that huge queue length?
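Since this only happens every few weeks, one way to pin down when it starts is to log the counter continuously and scan the samples afterwards. The following is a minimal sketch, assuming the "Avg. Disk Queue Length" values have already been exported to a plain list of numbers. The function name, the threshold, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def find_sustained_spikes(
    samples: List[float], threshold: float, min_run: int
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where samples stay above
    threshold for at least min_run consecutive readings."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - 1))
    return runs


# Hypothetical queue-length samples, one per logging interval.
queue_lengths = [0.4, 1.2, 0.8, 35.0, 48.0, 52.0, 41.0, 0.9, 30.0, 0.5]
spikes = find_sustained_spikes(queue_lengths, threshold=10.0, min_run=3)
print(spikes)
```

The start index of each run can then be lined up against SQL Server's logs and any scheduled jobs to see what was running when the queue began to build.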
Best Answer
OK, so, from what I can tell, it was related to a bunch of things:
Here were my resolutions:
So, anyway, it was a combination of a whole bunch of different things, mostly related to SQL, but not exclusively (so Will was correct there).
I'd love to split the answer between everyone, as they had portions of it right, but what can you do...