We have the user and system data files on the same disk drive. The ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) is below 2 for the user files but the tempdb files are typically over 400. I see that on a few servers and I'm curious if there is a reason it takes longer to write to tempdb than a regular database data file.
SELECT DISTINCT UPPER(LEFT(mf.physical_name, 1)) AS Directory,
( io_stall_write_ms / ( 1.0 + num_of_writes ) ) as result,
io_stall_write_ms, num_of_writes,
fs.database_id,
fs.[file_id]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
Thank you,
Best Answer
Short Answer: Seeing higher IO stalls may or may not be a problem in an of itself. You need to look at more information to suss out if it you have an issue. It does seem a bit high, yes, but are you suffering? If so, it is probably because either your IO system is not handling the load right (because it can't, because you have everything on one drive or some other reason) or you are doing too much in TempDB (changing the first problem - the IO performance - is probably an easier and more efficient fix, but first determine if you have a problem)
The longer discussion/answer:
There are two questions at play here -
1.) What do I do when I see high IO Stalls?
First off, "high" is in the eye of the beholder. If you were to ask 10 DBAs what "too high" is for IO stalls you'd probably get 2-3 different answers with numbers in them, 5-6 "It depends" answers and one blank stare. My assumption is an average of 400ms is potentially too high here, especially when the other DBs are 2ms or lower for the average stall time.
Regardless of which database is seeing the high stalls you should approach it the same way. An IO stall is what it sounds like... An IO request taking longer than expected.. Stalling. These happen. They happen all the time in a system with resources being shared and finite resources (really all of our systems). They become an issue when the stalls become performance problems or lead to them. So I trust that you are looking here as a proactive part of monitoring or because you were experiencing performance issues that you are troubleshooting. We also don't want to get lost in just IO stalls. We are looking at a piece of the puzzle and not the big picture. It can be troublesome to just look at wait stats or file stats since SQL was last restarted because you are looking at all time and some maintenance window or heavy load window could skew counters. So make sure you look at the full picture.
But when I suspect I have a disk performance issue or see something off in a query like this, I normally follow a process that looks like:
PAGEIOLATCH_*
,IO_COMPLETION
,WRITELOG
, etc. ?). If you do this is another indication that you have some IO related performance issues, just like the IO stalls. But it gives you another form of agreement here.Physical Disk:Avg Disk Sec/Read
andAvg Sec Disk Sec/Write
counters. These measure your latency. Watch these counters over a period of time saved to a performance log file. What did you see for averages? If you are seeing numbers over 0.020 seconds (20ms) this could be an issue. If you see numbers over 40-50ms avg or higher is a more firm indication of a problem. Also look at your spikes? How high do they go and how long do they last? If you see spikes into the hundreds of ms and they last for tens or scores of seconds or more and/or happen frequently you are more likely to have an issue with your IO performance for your workload.(Note: for this wait stats analysis and perfmon analysis - look at various periods and type of usage. Do you have different usage statistics at night than you do during the day? Batch processing windows? Maintenance windows where you rebuild a lot of indexes? Look at these tools during each of these periods and understand what you are seeing for each)
Another IO performance consideration here -
2.) What are some reasons TempDB could be higher?
So TempDB is a database and it can have IO stalls like any other database as I just discussed. But what are some reasons TempDB can have higher reads? (not exhaustive, I welcome additions or thoughts in edits, other answers or comments) -
The point is - TempDB is used in a lot of ways, and it doesn't surprise me at all to see it as one of your busiest, if not the busiest, database. It also doesn't surprise me when I see it as having the highest number of and highest average stalls of all databases at a client's site. It is the nature of its workload sometimes. Looking at some of the things I've mentioned here can certainly help you determine if these numbers indicate a problem and if so, how to go deeper in solving it.