My approach is to disable all the nonclustered indexes [...] then build a clustered index for the table using this partition scheme.
Creating a clustered index on a heap automatically rebuilds all nonclustered indexes (even disabled ones). The nonclustered indexes are rebuilt but not partitioned. Assuming the desired end state is a partitioned clustered table with aligned indexes, rebuilding the nonclustered indexes to be non-aligned is entirely wasted effort.
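As a sketch of the sequence described (the index names here are hypothetical; the table, column, and partition scheme names come from the question), it is the `CREATE CLUSTERED INDEX` step that silently triggers the non-aligned rebuild of every disabled nonclustered index:

```sql
-- Disable a nonclustered index (it remains in metadata, data is dropped)
ALTER INDEX IX_DailyTable_LoadDate ON dbo.DailyTable DISABLE;

-- Build the clustered index on the partition scheme.
-- Side effect: every nonclustered index - including disabled ones -
-- is rebuilt, and rebuilt NON-aligned, because those indexes are not
-- defined on the partition scheme.
CREATE CLUSTERED INDEX CX_DailyTable
ON dbo.DailyTable (LoadDate, SeqNumber)
ON monthly_on_primary (LoadDate);
```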
What worries me slightly is that tempdb is now pushing nearly 1 TB and steadily climbing, despite the current table being around half that size. The MS docs I've read suggest the tempdb space usage should be about the size of the final table/clustered index.
The question of sort space is very complex. To understand all the details (including the effect of parallelism) you would need to carefully read an entire series of posts by the SQL Server Query Processing Team. Converting a heap to a partitioned clustered table with parallelism enabled is probably pretty close to worst case.
At its most basic (neglecting most of the important information in the QP Team's posts), you are asking SQL Server to run a query like:
```sql
SELECT *
FROM DailyTable
ORDER BY
    $partition.monthly_on_primary(LoadDate),
    LoadDate,
    SeqNumber;
```
This query will not execute quickly, regardless of where you choose to write the sort runs that do not fit in memory. Add to that the work of actually building a complete new copy of the entire data set in separate rowsets, and the work involved in rebuilding the nonclustered indexes pointlessly...
Advice
There are many considerations in getting this change to work efficiently. The important ones are to avoid sorting at all where possible, and to use parallel minimally-logged bulk load wherever possible.
The details of that depend on details not contained in the question, and a full solution is beyond an answer here. Nevertheless, the outline of an approach that has worked well for me personally in the past is:
- Extract the existing data using `bcp`, one file per final partition
- Drop the existing table and create the new one
- Load the new table using parallel minimally-logged bulk load
The per-partition data extract needs to be ordered on `(LoadDate, SeqNumber)`. Ideally, you would avoid a sorting operation. If you have an existing nonclustered index on `(LoadDate, SeqNumber)` you can extract data in the right order without sorting if you construct the query correctly.
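A hedged sketch of one such extract query (boundary values and column names are illustrative; the assumption is an existing nonclustered index keyed on `(LoadDate, SeqNumber)` whose range matches the partition boundaries). Filtering on a date range that matches a partition's boundaries gives an ordered index range scan with no Sort operator:

```sql
-- One extract per final partition; the date range matches that
-- partition's boundary values in the partition function.
DECLARE @from date = '20240101', @to date = '20240201';

SELECT LoadDate, SeqNumber  -- add the remaining columns; a covering
                            -- index avoids key lookups on a wide select
FROM dbo.DailyTable
WHERE LoadDate >= @from
  AND LoadDate <  @to
ORDER BY LoadDate, SeqNumber;  -- satisfied by the index order; no sort
```

Each of these queries can then be fed to `bcp ... queryout` to write one native-format file per partition.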
Once per-partition data has been extracted to separate files (this can be done in parallel if your hardware is up to it), the source table can then be dropped, freeing space. A new partitioned heap or clustered table is then created, and bulk loaded with the pre-sorted data, possibly also in parallel.
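A minimal sketch of the load side, assuming a pre-sorted native-format file per partition (the file path and table name are illustrative). With an empty target, `TABLOCK`, and the database in the simple or bulk-logged recovery model, the load is minimally logged; the `ORDER` hint declares that the file is already sorted so the engine can skip the sort ahead of the clustered index insert:

```sql
-- Load one partition's pre-sorted extract file.
BULK INSERT dbo.DailyTableNew
FROM 'D:\extract\partition_003.dat'
WITH
(
    DATAFILETYPE = 'native',
    ORDER (LoadDate ASC, SeqNumber ASC),  -- file is pre-sorted
    TABLOCK                               -- required for minimal logging
);
```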
Done right, the entire process requires no more than 1x data size and achieves the fastest possible data transfer rates in both directions, with the least amount of log use.
Individual queries don't use parallelism? So what? If you have two completely different users running different queries at the same time, they too can make use of different tempdb files.
Of course it doesn't help all that much if you have four tempdb files all on the same volume - the real gain comes when you can spread them across multiple I/O paths.
Hard to speculate on "any slowdowns you're seeing" without more details.
Shouldn't be any issues with changing the file size. Just make sure they are all exactly the same size, have the exact same autogrow settings, and check to see if you should enable trace flags 1117 and 1118.
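To check those settings, something like this works (the file name in the `ALTER DATABASE` example is hypothetical; substitute your own and pick sizes to match your other files):

```sql
-- Inspect tempdb data file sizes and autogrow settings
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CONCAT(growth, '%')
            ELSE CONCAT(growth * 8 / 1024, ' MB')
       END AS growth_setting
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- Bring a file in line with the others (values illustrative)
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev2, SIZE = 8192MB, FILEGROWTH = 512MB);
```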
Best Answer
According to Paul Randal, the number of tempdb data files should be equal to the number of logical cores, up to eight; beyond that, add files in groups of four only if you still see allocation contention.
I think it's great news that Microsoft finally changed this default setting. The previous one (a single data file) was really bad - it caused problems with latch contention.
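A quick way to compare your current configuration against that guideline (a sketch using standard DMVs):

```sql
-- Logical cores (visible online schedulers) vs. tempdb data files
SELECT
    (SELECT COUNT(*) FROM sys.dm_os_schedulers
     WHERE status = 'VISIBLE ONLINE') AS logical_cores,
    (SELECT COUNT(*) FROM tempdb.sys.database_files
     WHERE type_desc = 'ROWS') AS tempdb_data_files;
```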
Paul Randal's article on the topic