My approach is to disable all the nonclustered indexes [...] then build a clustered index for the table using this partition scheme.
Creating a clustered index on a heap automatically rebuilds all nonclustered indexes (even disabled ones). The nonclustered indexes are rebuilt but not partitioned. Assuming the desired end state is a partitioned clustered table with aligned indexes, rebuilding the nonclustered indexes to be non-aligned is entirely wasted effort.
What worries me slightly is that tempdb is now pushing nearly 1 TB and steadily climbing, despite the current table being around half that size. The MS docs I've read suggest the tempdb space usage should be about the size of the final table/clustered index.
The question of sort space is very complex. To understand all the details (including the effect of parallelism) you would need to carefully read an entire series of posts by the SQL Server Query Processing Team. Converting a heap to a partitioned clustered table with parallelism enabled is probably pretty close to worst case.
At it's most basic (neglecting most of the important information in the QP Team's posts), you are asking SQL Server to run a query like:
SELECT *
FROM DailyTable
ORDER BY
$partition.monthly_on_primary(LoadDate),
LoadDate,
SeqNumber;
This query will not execute quickly, regardless of where you choose to write the sort runs that do not fit in memory to. Add to that the work of actually building a complete new copy of the entire data set in separate rowsets, and the work involved in rebuilding the nonclustered indexes pointlessly...
Advice
There are many considerations in getting this change to work efficiently. The important ones are to avoid sorting at all where possible, and to use parallel minimally-logged bulk load wherever possible.
The details of that depend on details not contained in the question, and a full solution is beyond an answer here. Nevertheless, the outline of an approach that has worked well for me personally in the past is:
- Extract the existing data using
bcp
to one file per final partition
- Drop the existing table and create the new one
- Load the new table using parallel minimally-logged bulk load
The per-partition data extract needs be ordered on (LoadDate, SeqNumber)
. Ideally, you would avoid a sorting operation. If you have an existing nonclustered index on (LoadDate, SeqNumber) you can extract data in the right order without sorting if you construct the query correctly.
Once per-partition data has been extracted to separate files (this can be done in parallel if your hardware is up to it), the source table can then be dropped, freeing space. A new partitioned heap or clustered table is then created, and bulk loaded with the pre-sorted data, possibly also in parallel.
Done right, the entire process requires no more than 1x data size and achieves the fastest possible data transfer rates in both directions, with the least amount of log use.
The clustered index "is" the table. Rebuilding the index will require enough space to duplicate each existing non-deleted row, page-by-page. Since you don't have Enterprise Edition, you do not have the luxury of rebuilding the index online. This means the table will not be accessible during the rebuild operation.
You'd likely be better off rebuilding by creating a duplicate empty table on a different drive that has plenty of free space with room for foreseeable growth, moving data row-by-row in key order, then renaming the old and new tables. For instance if the original table is Table1 and the new copy is Table2, rename Table1 as Table1-old then rename Table2 as Table1.
Best Answer
First you need to figure out what the actual problem is. The sys.dm_os_wait_stats DMV can help with that. Most values in there are accumulative, so you need to capture it a few times over an extended period to see what is actually going on.
One of the things that this DMV can tell you is if you have RAM pressure. The graphic you posted is basically useless as SQL Server uses all memory it can get (if it needs to). So this is only showing that that mechanism is working.
Once you figured out what your biggest problem is, come back here to get more help.
There are a few wait_types that can be considered noise. You can filter many of those out with this query:
(based on http://www.sqlskills.com/blogs/paul/survey-what-is-the-highest-wait-on-your-system/)