Sql-server – Which to trust

indexperformancesql server

We are troubleshooting a long-running issue with a vendor. Their software has a tendency to freeze up and stop working once or twice a week causing major disruptions to our operation. They have been unable to determine the cause despite us sending them many GBs of logs and backups of the DB. Lately they have begun suggesting the issues are with our maintenance and perhaps not with their software (despite there being no long running queries, CPU/RAM/IO pressure or even deadlocks when the problems occur). In particular they are saying our indexes are an issue.

Their favorite tool to use is DBCC showcontig despite my arguing the thing is deprecated by MS. They obsess over scan density and extent fragmentation especially. To take away the excuse I instituted some aggressive nightly maintenance that rebuilds indexes with <90% scan density or >10% fragmentation. This has somewhat thrown them off the scan density train but they remain fixated on extent fragmentation. DBCC showcontig shows high extent fragmentation even on an index that was rebuilt hours before. Below are the results of dbcc_showcontig and sys.dm_db_index_physical_stats for a table they pointed to as a "possible problem".

DBCC SHOWCONTIG
  • Pages Scanned…………………………..: 1222108
  • Extents Scanned…………………………: 152964
  • Extent Switches…………………………: 180904
  • Avg. Pages per Extent……………………: 8.0
  • Scan Density [Best Count:Actual Count]…….: 84.44% [152764:180905]
  • Logical Scan Fragmentation ………………: 3.24%
  • Extent Scan Fragmentation ……………….: 35.97%
  • Avg. Bytes Free per Page…………………: 692.5
  • Avg. Page Density (full)…………………: 91.44%

sys.dm_db_index_physical_stats

index_type_desc      alloc_unit_type_desc     Avg_fragmentation_in_percent  page_count

CLUSTERED INDEX       IN_ROW_DATA          3.236803129  1222070

NONCLUSTERED INDEX    IN_ROW_DATA          0.680074642  48230

NONCLUSTERED INDEX    IN_ROW_DATA          0.093237195  48264

NONCLUSTERED INDEX    IN_ROW_DATA          0.03315856   48253

NONCLUSTERED INDEX    IN_ROW_DATA          0.194653248  48291

NONCLUSTERED INDEX    IN_ROW_DATA          0.393480436  58961

NONCLUSTERED INDEX    IN_ROW_DATA          0.23622292   64346

NONCLUSTERED INDEX    IN_ROW_DATA          0.041445623  48256

NONCLUSTERED INDEX    IN_ROW_DATA          0.701172007  59044

NONCLUSTERED INDEX    IN_ROW_DATA          0.216397724  53605

Should I be concerned with my indexes? The one above is not atypical. The preferred MS DMV would appear to show it being just fine, but the vendor is stuck on that 35.97% extent fragmentation. I suspect this is just them desperately trying to find something to blame their software issues on, but if I have an actual problem I want to try and fix it.

Best Answer

Their software has a tendency to freeze up and stop working once or twice a week causing major disruptions to our operation. They have been unable to determine the cause despite us sending them many GBs of logs and backups of the DB. ... In particular they are saying our indexes are an issue.

Oh, right, I think I've heard this joke before. Doesn't it go something like:

A duck walks into a bar and says, "Ouch!" (just kidding ;-) and the bartender says, "What'll ya have?"

The duck says, "Gimme 3 fingers of your strongest vodka."

The bartender says, almost as if he were joking, "Don't you mean 3 'feathers'?"

The duck says, "Look, I'm sorry you're no longer head writer for Everybody Loves Raymond, but it's been a rough day so could you be a pal and make with the vodka?"

The bartender says, "Sure, buddy. Hold on."

He comes back a moment later, visibly slightly less happy than when he left, and says to the duck, "Looks like we're all out of the good stuff. All we got's left is Skyy. Will that work?"

The duck jumps up on the counter, grabs the bartender by the collar with one wing (somehow), pulls a knife out of, well, somewhere with the other wing, and says, rather slowly, softly, but clearly, "I. Will. Cut. You."

The bartender, panicking, says, "Hey man, it's the database. It's slow. It's not responding."

The duck, a good bit confused as to whether or not he should just end the bartender--right here, right now--barks at him angrily, "The database? What the hell are you talkin' about?"

The bartender, now sobbing, blurts out, "I don't know...Is there any blocking going on?..It's just what we say....Can you try rebuilding indexes or something?..you know, when we don't know what else to say....maybe we should add more memory to the server...you think that would help?...everyone knows that app code is fast and databases are the bottle-neck....Hey, I've been hearing about these NoSQL databases that are <air-quotes>web-scale</air-quotes> and are usually Open Source so they're free, and like, Twitter, and Google, and the Facebook all use that stuff since relational databases are pretty much on their way out."

And with that, the duck had made up his mind...........

Hmm. Well, trust me, it's a lot funnier in the original Hungarian.

But still, why is the initial reaction of so many people, when a system slows down, to just assume that it's the database? As if app code can't be horribly written, or simply have some bugs? Things getting slower could certainly be the database. But simply locking up / freezing? That doesn't strike me as a database-specific issue.

What it does sound like is possibly some app code that is not properly releasing external resources (network sockets, file system handles, etc). If we are talking about a .NET application, sometimes developers forget to properly Dispose() of objects that have associated unmanaged resources. For example: opening up a SqlConnection object. You don't get an infinite amount of them. So if they want to look in the database then fine. But maybe, the next time the system freezes, take a quick look at:

SELECT sdec.*, '---' AS [---], sdes.*
FROM sys.dm_exec_connections sdec
INNER JOIN sys.dm_exec_sessions sdes
        ON sdes.session_id = sdec.session_id

If their code isn't releasing the connections, then it should be fairly obvious if there are too many connections, especially if many of them have long idle times.

And maybe this stuff has already been checked and simply not disclosed in the Question. But it strikes me as rather odd that they are so focused on indexes and fragmentation. Sure, there are parameter sniffing issues that sometimes cause one, or maybe a few, Stored Procedures to take a REALLY LONG time, but locking up an entire application? I'm not buying it, especially if you don't see a query that is running and taking up a lot of resources or locks or time when this happens.

So, "which to trust?" Certainly not this vendor ;-).