As the other answers already indicate, SQL Server may or may not explicitly ensure that the rows are sorted in clustered index order prior to the insert.
This depends on whether the clustered index insert operator in the plan has the DMLRequestSort property set (which in turn depends on the estimated number of rows being inserted).
If you find that SQL Server is underestimating this number for whatever reason, you may benefit from adding an explicit ORDER BY to the SELECT query to minimize page splits and the ensuing fragmentation from the INSERT operation.
Example:
use tempdb;
GO
CREATE TABLE T(N INT PRIMARY KEY,Filler char(2000))
CREATE TABLE T2(N INT PRIMARY KEY,Filler char(2000))
GO
DECLARE @T TABLE (U UNIQUEIDENTIFIER PRIMARY KEY DEFAULT NEWID(),N int)
INSERT INTO @T(N)
SELECT number
FROM master..spt_values
WHERE type = 'P' AND number BETWEEN 0 AND 499
/*Estimated row count wrong as inserting from table variable*/
INSERT INTO T(N)
SELECT T1.N*1000 + T2.N
FROM @T T1, @T T2
/*Same operation using explicit sort*/
INSERT INTO T2(N)
SELECT T1.N*1000 + T2.N
FROM @T T1, @T T2
ORDER BY T1.N*1000 + T2.N
SELECT avg_fragmentation_in_percent,
fragment_count,
page_count,
avg_page_space_used_in_percent,
record_count
FROM sys.dm_db_index_physical_stats(2, OBJECT_ID('T'), NULL, NULL, 'DETAILED') /*2 = database_id of tempdb*/
;
SELECT avg_fragmentation_in_percent,
fragment_count,
page_count,
avg_page_space_used_in_percent,
record_count
FROM sys.dm_db_index_physical_stats(2, OBJECT_ID('T2'), NULL, NULL, 'DETAILED')
;
This shows that T is massively fragmented:
avg_fragmentation_in_percent fragment_count page_count avg_page_space_used_in_percent record_count
---------------------------- -------------------- -------------------- ------------------------------ --------------------
99.3116118225536 92535 92535 67.1668272794663 250000
99.5 200 200 74.2868173956017 92535
0 1 1 32.0978502594514 200
But for T2, fragmentation is minimal:
avg_fragmentation_in_percent fragment_count page_count avg_page_space_used_in_percent record_count
---------------------------- -------------------- -------------------- ------------------------------ --------------------
0.376 262 62500 99.456387447492 250000
2.1551724137931 232 232 43.2438349394613 62500
0 1 1 37.2374598468001 232
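If you want to verify whether the DMLRequestSort property was actually set for a particular insert, you can inspect the plan XML, either by running the statement with SET STATISTICS XML ON and checking the Clustered Index Insert operator, or by searching the plan cache. A rough sketch of the latter, assuming the plans for the batch above are still cached (the LIKE filters are only illustrative):
SELECT st.text AS batch_text,
       CASE WHEN CAST(qp.query_plan AS nvarchar(max)) LIKE '%DMLRequestSort="true"%'
            THEN 'Sort requested somewhere in this batch'
            ELSE 'No sort requested'
       END AS dml_request_sort
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE '%INSERT INTO T%'
AND st.text NOT LIKE '%dm_exec_cached_plans%' /*exclude this query itself*/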
Conversely, sometimes you might want to force SQL Server to underestimate the row count when you know the data is already pre-sorted and want to avoid an unnecessary sort. One notable example is when inserting a large number of rows into a table with a newsequentialid clustered index key. In versions of SQL Server prior to Denali (SQL Server 2012), SQL Server adds an unnecessary and potentially expensive sort operation. This can be avoided by:
DECLARE @var INT = 2147483647
INSERT INTO Foo
SELECT TOP (@var) *
FROM Bar
SQL Server will then estimate that 100 rows will be inserted irrespective of the size of Bar, which is below the threshold at which a sort is added to the plan. However, as pointed out in the comments below, this does mean that the insert will unfortunately not be able to take advantage of minimal logging.
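For context, the newsequentialid scenario above assumes a schema roughly like this; Foo and Bar are the placeholder names from the snippet, and the columns are purely illustrative:
CREATE TABLE Bar(Payload char(100) NOT NULL)
CREATE TABLE Foo
(
    Id uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
    Payload char(100) NOT NULL
)
GO
DECLARE @var INT = 2147483647
/*NEWSEQUENTIALID() already generates ascending key values,
  so the sort the pre-2012 optimizer would otherwise add is pure overhead*/
INSERT INTO Foo(Payload)
SELECT TOP (@var) Payload
FROM Bar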
Best Answer
Yes: checkpoint events. With a write-intensive workload and a big-RAM server, as you describe, a large number of 'dirty' pages accumulate in memory. At the predetermined checkpoint interval, all these dirty pages get written to disk, causing a spike of IO requests. This in turn slows down the log commit writes, which manifests as the increase in INSERT response time you observe periodically. QED. This is, of course, just a guess, in the absence of a proper investigation. For a more certain answer, I recommend you read How to analyse SQL Server performance and apply the techniques described there to identify the problem.
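If you want some evidence before concluding it is the checkpoint, one cheap check is to watch checkpoint activity while the spikes occur. A rough sketch using the standard Buffer Manager counters (the values are cumulative, so sample twice and take the difference, or capture them in Performance Monitor alongside disk latency counters):
SELECT RTRIM(object_name) AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Checkpoint pages/sec', 'Page writes/sec', 'Lazy writes/sec')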
If the problem is indeed caused by checkpoints, then SQL Server 2012 comes with Indirect Checkpoints, which continuously write out dirty pages in the background so the checkpoint IO is spread over time rather than issued in one burst.
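Indirect checkpoints are enabled per database by setting a target recovery time, along these lines (YourDb is a placeholder and the 60-second target is only a starting point to tune):
ALTER DATABASE YourDb SET TARGET_RECOVERY_TIME = 60 SECONDS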
For a more detailed discussion about checkpoint impact on performance, read SQL Q&A: Fine Tuning for Optimal Performance.
Prior to SQL Server 2012 you have the option to reduce the recovery interval value. This will increase the frequency of checkpoints, but will reduce the number of dirty pages each checkpoint has to write. Spreading out the data IO helps (buy more spindles). Separating the log IO onto its own path (its own spindle) does not help the checkpoint itself, but isolates the log commits from its effects and thus keeps the INSERTs responsive. SSDs work miracles.
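For reference, the pre-2012 knob is the 'recovery interval (min)' server option, something like the following (it is an advanced option, and one minute is only an example value):
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
/*Target roughly one minute of recovery work so checkpoints fire more often,
  each with fewer dirty pages to flush*/
EXEC sp_configure 'recovery interval (min)', 1
RECONFIGURE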
I would advise against any structural changes. In my opinion you already have the best clustered index for a time series. Any structural change would have to be backed by root-cause performance analysis pointing to the current structure as the problem.