Sql-server – Controlling how tempdb activity is split across drives (tempdb on both solid state and spinning drives)

disk-spacesql serversql-server-2012tempdb

I'm seeing the following tempdb I/O behavior over a one hour period:
enter image description here

There is a fair amount of disk I/O generated by multiple DW workloads executing on the machine, some of which cannot fit in the ~280GB of memory allocated to SQL. One aspect that is interesting is that so much of the I/O is concentrated on the spinning disk drives (E) rather than the solid state drives (F and G) that are handling I/O much more efficiently.

We have pre-allocated the full 300GB each (600GB total) on the F and G drives to tempdb (using 12 files), and we have pre-allocated 1.3TB to tempdb on the E drive (currently 1 file). The I/O data above suggests that tempdb usage is distributed across files based on the current size of the file. I wasn't able to find documentation on this, but I also ran a query like the following to investigate further:

-- While running this query, writes to tempdb are distributed to E/F/G drives
-- in proportion to their current size.  This was shown by both
-- sys.dm_io_virtual_file_stats and the space used on the tempdb files before and after
SELECT TOP 100000000 *
INTO #temp
FROM [A_Really_Big_Table]

The ideal behavior would be that F and G are used exclusively unless they are both full, in which case the spinning disk drive should provide additional tempdb space so that occasional very large workloads can be serviced without running out of tempdb space.

Are we on the right track in observing that tempdb usage is distributed across files based on the current size of the files? It was a little bit surprising to see this type of distribution rather than to see an equal usage of each file (which likely was the assumption of whoever set up this hardware and decided to allocate just one to tempdb on the spinning disk).

Based on Paul White's comment in response to this question, we are thinking about the following approach:

  • Shrink the tempdb files on spinning disk. Based on our inital testing, this should shift the current distribution of work more to the solid state drives
  • Configure the solid state tempdb files to pre-allocate their space (as we already do now)
  • Configure the spinning disk tempdb files to start with no allocation. Make sure that instant file initialization is on. Tempdb will grow on the spinning disk only as needed (which will likely be at most once per week).
  • Create a maintenance plan that shrinks the tempdb files on spinning disk after times of peak load, putting the distribution back in favor of the solid state tempdb files.

Does this seem reasonable? Are there any alternative approaches or potential problems to consider? We will obviously test the approach to the extent possible, but won't be able to do so on completely equivalent testing hardware.

Best Answer

File writes are distributed across files in the same file group proportionally based on the current size of each file in the file group. This is referred to as the "proportional fill algorithm" - look at http://sqlserver-performance-tuning.net/?p=2552 for some interesting details around that.

tempdb can have only a single filegroup. If you attempt to create a filegroup in tempdb you get the following:

Msg 1826, Level 16, State 1, Line 1
User-defined filegroups are not allowed on "tempdb".

If you have trace flag 1117 turned on, files within a group will autogrow simultaneously across files in the filegroup for each file that is not currently at its maximum size, and where space exists on the disk.

Does your instance have trace flag 1117 turned on? You'd probably want it off in this particular instance even though "best practices" often indicate having this turned on. There is an item on Microsoft Connect asking to have a setting like this that can be enabled/disabled on a per-database basis, here: https://connect.microsoft.com/SQLServer/feedback/details/781198/trace-flag-1117-autogrowth-of-data-files-is-instance-wide-would-like-a-flag-for-just-tempdb

Assuming the SSDs are devoted to tempdb, I agree with your assertion and suggest making the tempdb on the SSDs as big as possible (not 100% of the drive, perhaps leave 10% free). Make the tempdb files on disk as small as possible, say 1MB, with autogrowth, and maximum file size as big as you need. Monitor for tempdb filegrowth on the HDDs, and make the case for getting bigger SSDs if you think the company would benefit from it.

According to the documentation, SQL Server files can be created on raw partitions (partitions that haven't been formatted) by using just the drive letter of the partition in the ALTER DATABASE ... ADD FILE syntax. This apparently removes the need to grow or shrink the file since it inherently uses the entire raw partition, as needed. I'm uncertain if this would help your situation or not; just thought I'd throw it out there as an interesting factoid. See "If the file is on a raw partition, os_file_name must specify only the drive letter of an existing raw partition. Only one file can be put on each raw partition."