To check your request I created two tables following this scheme:
- 7.9 million records representing balance information.
- an identity field counting from 1 to 7.9 million.
- a number field grouping the records into about 500k groups.
The first table, called heap, got a nonclustered index on the field group. The second table, called clust, got a clustered index on the sequential field called key and a nonclustered index on the field group.
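The setup can be sketched in T-SQL roughly as follows. Column names and types beyond key, group and amount are assumptions (the insert benchmarks below show the real tables carry more balance columns); [key] and [group] are reserved words, hence the brackets:

```sql
-- Heap: no clustered index, only a nonclustered index on [group].
CREATE TABLE heap (
    [key]   int IDENTITY(1,1) NOT NULL,  -- counts from 1 to 7.9 million
    [group] int NOT NULL,                -- ~500k distinct groups
    amount  money NOT NULL               -- balance information (simplified)
);
CREATE NONCLUSTERED INDEX ix_heap_group ON heap ([group]);

-- Clust: clustered on the sequential [key], plus the same nonclustered index.
CREATE TABLE clust (
    [key]   int IDENTITY(1,1) NOT NULL,
    [group] int NOT NULL,
    amount  money NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX ix_clust_key ON clust ([key]);
CREATE NONCLUSTERED INDEX ix_clust_group ON clust ([group]);
```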
The tests were run on an Intel Core i5 M540 processor with 2 hyperthreaded cores, 4 GB of memory and 64-bit Windows 7.
Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64)
Apr 2 2010 15:48:46
Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
Update on 9 Mar 2011: I did a second, more extensive benchmark by running the following .NET code and logging Duration, CPU, Reads, Writes and RowCounts in SQL Server Profiler. (The CommandText used will be mentioned in the results.)
NOTE: CPU and Duration are expressed in milliseconds
- 1000 queries
- queries with zero CPU are eliminated from the results
- queries affecting 0 rows are eliminated from the results
int[] idList = new int[] { 6816588, 7086702, 6498815 ... }; // 1000 values here.

using (var conn = new SqlConnection(@"Data Source=myserver;Initial Catalog=mydb;Integrated Security=SSPI;"))
{
    conn.Open();
    using (var cmd = new SqlCommand())
    {
        cmd.Connection = conn;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = "select * from heap where common_key between @id and @id+1000";
        cmd.Parameters.Add("@id", SqlDbType.Int);
        cmd.Prepare();
        foreach (int id in idList)
        {
            cmd.Parameters[0].Value = id;
            using (var reader = cmd.ExecuteReader())
            {
                int count = 0;
                while (reader.Read())
                {
                    count++;
                }
                Console.WriteLine(String.Format("key: {0} => {1} rows", id, count));
            }
        }
    }
}
End of Update on 9 Mar 2011.
SELECT performance
To check performance numbers I ran the following queries, once on the heap table and once on the clust table:
select * from heap/clust where group between 5678910 and 5679410
select * from heap/clust where group between 6234567 and 6234967
select * from heap/clust where group between 6455429 and 6455729
select * from heap/clust where group between 6655429 and 6655729
select * from heap/clust where group between 6955429 and 6955729
select * from heap/clust where group between 7195542 and 7155729
The results of this benchmark for the heap:
rows reads CPU Elapsed
----- ----- ----- --------
1503 1510 31ms 309ms
401 405 15ms 283ms
2700 2709 0ms 472ms
0 3 0ms 30ms
2953 2962 32ms 257ms
0 0 0ms 0ms
Update on 9 Mar 2011:
cmd.CommandText = "select * from heap where group between @id and @id+1000";
- 721 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1001 69788 6368 -
Cpu 15 374 37 0.00754
Reads 1069 91459 7682 1.20155
Writes 0 0 0 0.00000
Duration 0.3716 282.4850 10.3672 0.00180
End of Update on 9 Mar 2011.
For the table clust the results are:
rows reads CPU Elapsed
----- ----- ----- --------
1503 4827 31ms 327ms
401 1241 0ms 242ms
2700 8372 0ms 410ms
0 3 0ms 0ms
2953 9060 47ms 213ms
0 0 0ms 0ms
Update on 9 Mar 2011:
cmd.CommandText = "select * from clust where group between @id and @id+1000";
- 721 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1001 69788 6056 -
Cpu 15 468 38 0.00782
Reads 3194 227018 20457 3.37618
Writes 0 0 0 0.0
Duration 0.3949 159.6223 11.5699 0.00214
End of Update on 9 Mar 2011.
SELECT WITH JOIN performance
cmd.CommandText = "select * from heap/clust h join keys k on h.group = k.group where h.group between @id and @id+1000";
The results of this benchmark for the heap:
873 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1009 4170 1683 -
Cpu 15 47 18 0.01175
Reads 2145 5518 2867 1.79246
Writes 0 0 0 0.00000
Duration 0.8215 131.9583 1.9095 0.00123
The results of this benchmark for the clust:
865 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1000 4143 1685 -
Cpu 15 47 18 0.01193
Reads 5320 18690 8237 4.97813
Writes 0 0 0 0.00000
Duration 0.9699 20.3217 1.7934 0.00109
UPDATE performance
The second batch of queries are update statements:
update heap/clust set amount = amount + 0 where group between 5678910 and 5679410
update heap/clust set amount = amount + 0 where group between 6234567 and 6234967
update heap/clust set amount = amount + 0 where group between 6455429 and 6455729
update heap/clust set amount = amount + 0 where group between 6655429 and 6655729
update heap/clust set amount = amount + 0 where group between 6955429 and 6955729
update heap/clust set amount = amount + 0 where group between 7195542 and 7155729
The results of this benchmark for the heap:
rows reads CPU Elapsed
----- ----- ----- --------
1503 3013 31ms 175ms
401 806 0ms 22ms
2700 5409 47ms 100ms
0 3 0ms 0ms
2953 5915 31ms 88ms
0 0 0ms 0ms
Update on 9 Mar 2011:
cmd.CommandText = "update heap set amount = amount + @id where group between @id and @id+1000";
- 811 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1001 69788 5598 -
Cpu 15 873 56 0.01199
Reads 2080 167593 11809 2.11217
Writes 0 1687 121 0.02170
Duration 0.6705 514.5347 17.2041 0.00344
End of Update on 9 Mar 2011.
The results of this benchmark for the clust:
rows reads CPU Elapsed
----- ----- ----- --------
1503 9126 16ms 35ms
401 2444 0ms 4ms
2700 16385 31ms 54ms
0 3 0ms 0ms
2953 17919 31ms 35ms
0 0 0ms 0ms
Update on 9 Mar 2011:
cmd.CommandText = "update clust set amount = amount + @id where group between @id and @id+1000";
- 853 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1001 69788 5420 -
Cpu 15 594 50 0.01073
Reads 6226 432237 33597 6.20450
Writes 0 1730 110 0.01971
Duration 0.9134 193.7685 8.2919 0.00155
End of Update on 9 Mar 2011.
DELETE benchmarks
The third batch of queries I ran are delete statements:
delete heap/clust where group between 5678910 and 5679410
delete heap/clust where group between 6234567 and 6234967
delete heap/clust where group between 6455429 and 6455729
delete heap/clust where group between 6655429 and 6655729
delete heap/clust where group between 6955429 and 6955729
delete heap/clust where group between 7195542 and 7155729
The result of this benchmark for the heap:
rows reads CPU Elapsed
----- ----- ----- --------
1503 10630 62ms 179ms
401 2838 0ms 26ms
2700 19077 47ms 87ms
0 4 0ms 0ms
2953 20865 62ms 196ms
0 4 0ms 9ms
Update on 9 Mar 2011:
cmd.CommandText = "delete heap where group between @id and @id+1000";
- 724 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 192 69788 4781 -
Cpu 15 499 45 0.01247
Reads 841 307958 20987 4.37880
Writes 2 1819 127 0.02648
Duration 0.3775 1534.3383 17.2412 0.00349
End of Update on 9 Mar 2011.
The result of this benchmark for the clust:
rows reads CPU Elapsed
----- ----- ----- --------
1503 9228 16ms 55ms
401 3681 0ms 50ms
2700 24644 46ms 79ms
0 3 0ms 0ms
2953 26955 47ms 92ms
0 3 0ms 0ms
Update on 9 Mar 2011:
cmd.CommandText = "delete clust where group between @id and @id+1000";
- 751 Rows have > 0 CPU and affect more than 0 rows
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 144 69788 4648 -
Cpu 15 764 56 0.01538
Reads 989 458467 30207 6.48490
Writes 2 1830 127 0.02694
Duration 0.2938 2512.1968 24.3714 0.00555
End of Update on 9 Mar 2011.
INSERT benchmarks
The last part of the benchmark is the execution of insert statements.
insert into heap/clust (...)
values (...),
(...),
(...),
(...),
(...),
(...)
The result of this benchmark for the heap:
rows reads CPU Elapsed
----- ----- ----- --------
6 38 0ms 31ms
Update on 9 Mar 2011:
string str = @"insert into heap (group, currency, year, period, domain_id, mtdAmount, mtdAmount, ytdAmount, amount, ytd_restated, restated, auditDate, auditUser)
values";
for (int x = 0; x < 999; x++)
{
str += string.Format(@"(@id + {0}, 'EUR', 2012, 2, 0, 100, 100, 1000 + @id,1000, 1000,1000, current_timestamp, 'test'), ", x);
}
str += string.Format(@"(@id, 'CAD', 2012, 2, 0, 100, 100, 1000 + @id,1000, 1000,1000, current_timestamp, 'test') ", 1000);
cmd.CommandText = str;
- 912 statements have > 0 CPU
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1000 1000 1000 -
Cpu 15 2138 25 0.02500
Reads 5212 7069 6328 6.32837
Writes 16 34 22 0.02222
Duration 1.6336 293.2132 4.4009 0.00440
End of Update on 9 Mar 2011.
The result of this benchmark for the clust:
rows reads CPU Elapsed
----- ----- ----- --------
6 50 0ms 18ms
Update on 9 Mar 2011:
string str = @"insert into clust (group, currency, year, period, domain_id, mtdAmount, mtdAmount, ytdAmount, amount, ytd_restated, restated, auditDate, auditUser)
values";
for (int x = 0; x < 999; x++)
{
str += string.Format(@"(@id + {0}, 'EUR', 2012, 2, 0, 100, 100, 1000 + @id,1000, 1000,1000, current_timestamp, 'test'), ", x);
}
str += string.Format(@"(@id, 'CAD', 2012, 2, 0, 100, 100, 1000 + @id,1000, 1000,1000, current_timestamp, 'test') ", 1000);
cmd.CommandText = str;
- 946 statements have > 0 CPU
Counter Minimum Maximum Average Weighted
--------- ------- ---------- ------- ---------
RowCounts 1000 1000 1000 -
Cpu 15 2403 21 0.02157
Reads 6810 8997 8412 8.41223
Writes 16 25 19 0.01942
Duration 1.5375 268.2571 6.1463 0.00614
End of Update on 9 Mar 2011.
Conclusions
Although there are more logical reads when accessing the table with the clustered and nonclustered indexes (while using the nonclustered index), the performance results are:
- SELECT statements are comparable
- UPDATE statements are faster with a clustered index in place
- DELETE statements are faster with a clustered index in place
- INSERT statements are faster with a clustered index in place
Of course my benchmark was very limited - a specific kind of table and a very limited set of queries - but I think that, based on this information, we can already say that it is virtually always better to create a clustered index on your table.
Update on 9 Mar 2011:
As the added results show, the conclusions from the limited tests were not correct in every case.
The results now indicate that the only statements that benefit from the clustered index are the update statements. The other statements are about 30% slower on the table with the clustered index.
Some additional charts plot the weighted duration per query for heap vs. clust.
As you can see, the performance profile for the insert statements is quite interesting: the spikes are caused by a few data points that take a lot longer to complete.
End of Update on 9 Mar 2011.
Best Answer
The perfmon counter [\SQLServer\Databases(*)\Bulk Copy Rows/sec] can help determine whether the SSIS package is performing bulk inserts. You want bulk inserts, especially if the underlying table is ever converted to a clustered columnstore.
In your ETL, I'm assuming that each run involves X+Y rows, where X is the number of rows that are updates and thus are deleted from the ~130 million row target table, and Y is the number of net new rows. A delete is performed on the target table and then all X+Y rows are inserted.
(It's also possible that this ETL simply inserts Y new rows with each iteration, and deletes X old rows past some cutoff point - e.g. deletes all rows more than 30 days old. That makes some of this designing/planning easier, especially if the ~130 million row target table stays fairly stable in size over time, rather than consistently growing.)
It's very important to know whether this schema and process is to be optimized for the ETL, optimized for the non-ETL workloads, or balanced between them.
For some system and workload combinations, ETL is done within an acceptable amount of time just about however it's done. In such a case, optimization can almost completely focus on the non-ETL workloads. The maxims then are "write the data how it will be read" and "keep it simple". No need to worry about the page split behavior (and transaction log ramifications) of a clustered index vs heap, or that clustered index insert will most likely always be a serialization point in the ETL. It all becomes: if there's a key - either single or combination - that is meaningful for the reporting queries and with a significant number of the table columns, that can be considered as a candidate for the clustered index key.
Fragmentation of both the clustered index (if adopted) and the nonclustered index can be managed according to the same considerations. (But index maintenance is usually more of a consideration for non-ETL workloads than ETL workloads.) I usually focus on index reorganizations rather than rebuilds, because what I really care about is keeping the pages pretty full. One reason I like looking at it from that perspective: it is much less expensive to keep track of over time. I can just watch how many mb each million rows is on average, before and after a reorg. That can be measured fairly inexpensively, in comparison to using sys.dm_db_index_physical_stats. There are two inherent advantages to rebuild vs reorg: an index rebuild automatically includes a fullscan stats update, and an index rebuild can be performed in parallel while a reorg is always serial.
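As a sketch (object names are assumptions), the reorg-centric maintenance and the cheap fullness measurement described above look like this:

```sql
-- Reorganize: always online, always serial, cheap to interrupt;
-- compacts pages so they stay pretty full.
ALTER INDEX ix_target_group ON dbo.target REORGANIZE;

-- Rebuild: can run in parallel and includes a fullscan stats update,
-- but is the heavier operation.
ALTER INDEX ix_target_group ON dbo.target REBUILD;

-- Inexpensive tracking over time: average MB per million rows,
-- measured before and after a reorg (cheaper than
-- sys.dm_db_index_physical_stats).
SELECT SUM(reserved_page_count) * 8 / 1024.0 AS mb_reserved,
       SUM(row_count) / 1000000.0            AS million_rows
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.target');
```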
Now, what if ETL performance optimization takes precedence over everything else? Then you don't want a non-clustered index on the heap, unless it's used to make the deletes faster by making a more efficient plan available. It's possible that the added cost of maintaining that non-clustered index during the ETL inserts is a significant drag on ETL performance, even if it speeds up the deletes. In that case, consider creating the non-clustered index before the deletes, and dropping it immediately after the deletes. That way it never has to be maintained with REORG or REBUILD. And the deletes get the benefit while the inserts don't have to worry about the cost. But that has to be measured in order to know if it's beneficial. And the measurement needs to be re-visited as the table grows to 2x, 3x size etc., or if the number of deleted rows in each ETL changes significantly.
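Under that scheme, each ETL iteration would look roughly like this (table, key, and staging names are placeholders, not your actual schema):

```sql
-- Build the nonclustered index only for the delete phase.
CREATE NONCLUSTERED INDEX ix_target_key ON dbo.target (business_key);

-- The deletes get an efficient plan from the index...
DELETE t
FROM dbo.target AS t
JOIN dbo.staging AS s ON s.business_key = t.business_key;

-- ...then drop it, so the inserts never pay its maintenance cost
-- and it never needs REORG or REBUILD.
DROP INDEX ix_target_key ON dbo.target;

INSERT INTO dbo.target (business_key, amount)
SELECT business_key, amount
FROM dbo.staging;
```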
If ETL performance takes precedence over everything else, you probably don't want a clustered index on that table, either. Here's one of the performance pitfalls of heavy insert activity into a clustered index: if the insert is estimated at more than 100 rows, the rows are going to be sorted before the insert. Depending on the size of the insert, the maximum memory grant, and the grant the insert query receives, that may mean a tempdb spill - and a big increase in elapsed time for the insert.
All right. Chances are, this system and workload hint toward performance optimizations needed for both ETL and non-ETL workloads. Maybe the non-clustered index is used both for the ETL deletes and by reporting queries? In that case, consider dropping it before the inserts, and creating it after the inserts.
This blog post from Paul White is one I think about a lot. He describes wide, per-index plans and narrow, per-row plans here.
Optimizing T-SQL queries that change data 26 January 2013 https://www.sql.kiwi/2013/01/optimizing-t-sql-queries-that-change-data.html
Maybe keeping the non-clustered index in place for both deletes and inserts is the best choice for your needs. Maybe creating before the deletes, and dropping after the deletes. Maybe dropping before the deletes, and creating after the inserts. It depends which query plans actually use the index, and the relative cost of creating the index vs keeping it around for the activity.
Similarly, maybe a clustered index will be beneficial to the deletes and to the reporting queries (although if the clustered index is meaningful to the deletes the non-clustered index is less likely to be meaningful to the deletes). It'll be a net cost to the inserts, though, compared to a heap.
And... yeah, I agree that in general creating a clustered index and then dropping the clustered index in order to manage fragmentation/average fill in the heap is not something to be done on a regular basis. I'm much more in favor of making sure TABLOCK/TABLOCKX hints are used as needed in delete statements, so page locks are taken and pages emptied by the delete can be deallocated from the heap.
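A minimal sketch of the hinted delete (table and predicate are assumptions):

```sql
-- TABLOCK lets the delete escalate to a table lock up front, so pages
-- fully emptied by the delete can actually be deallocated from the heap
-- instead of lingering as empty-but-allocated pages.
DELETE FROM dbo.target WITH (TABLOCK)
WHERE load_date < DATEADD(DAY, -30, CAST(GETDATE() AS date));
```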
If a clustered index will be created and dropped in order to manage fragmentation of a heap as a once-in-a-while-when-really-needed thing, I recommend taking such an action at a time that the NCI(s) can be dropped before the clustered index is created, and then NCI(s) created again after the clustered index is dropped. Because creating a clustered index on a heap with NCIs will rebuild the NCI(s). And dropping the clustered index will rebuild the NCI(s) again. Dropping the NCI(s) beforehand and creating the NCI(s) again at the end reduces that rigamarole by about half.
Hopefully, rather than creating a clustered index just to drop it again, if once-in-a-while intervention is needed on the heap, it can take the form of ALTER TABLE... REBUILD. For that operation, too, I recommend dropping NCI(s) beforehand, and creating NCI(s) afterward. I like having individual control over the NCI rebuilds rather than having them take place automatically due to an action taken on the underlying heap.
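The once-in-a-while intervention then becomes something like this (assuming a single nonclustered index; names are placeholders):

```sql
-- Drop the NCI first, so the heap rebuild doesn't rebuild it implicitly.
DROP INDEX ix_target_group ON dbo.target;

-- Compact the heap in place, reclaiming empty space without
-- creating and dropping a clustered index.
ALTER TABLE dbo.target REBUILD;

-- Recreate the NCI afterward, under explicit control.
CREATE NONCLUSTERED INDEX ix_target_group ON dbo.target ([group]);
```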
One important variable to keep in mind in evaluating these options: the recovery model of the database in question. If transaction log operations are a concern either for performance or transaction log space and the database is in simple recovery model, engaging minimally logged operations wherever possible may be a big gain. If the database is in full recovery model and the txlog is a concern, page splits in a clustered index (or in the NCI for that matter) become a bigger deal. For full recovery model, the benefit of sort_in_tempdb for index create or rebuild operations increases (unless tempdb is more of a sore spot for the instance than the txlog is for the database).
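For example, in full recovery model the sort for an index create or rebuild can be pushed to tempdb (index and table names assumed):

```sql
-- SORT_IN_TEMPDB writes the intermediate sort results to tempdb rather
-- than the destination database's files, shifting that I/O (and, in full
-- recovery, the associated log pressure) away from the user database.
CREATE NONCLUSTERED INDEX ix_target_group
ON dbo.target ([group])
WITH (SORT_IN_TEMPDB = ON);
```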