Given the following table, unique clustered index, and statistics:
CREATE TABLE dbo.Banana
(
pk integer NOT NULL,
c1 char(1) NOT NULL,
c2 char(1) NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX pk ON dbo.Banana (pk);
CREATE STATISTICS c1 ON dbo.Banana (c1);
CREATE STATISTICS c2 ON dbo.Banana (c2);
INSERT dbo.Banana
(pk, c1, c2)
VALUES
(1, 'A', 'W'),
(2, 'B', 'X'),
(3, 'C', 'Y'),
(4, 'D', 'Z');
-- Populate statistics
UPDATE STATISTICS dbo.Banana;
The statistics row modification counters obviously show zero before any updates:
-- Show statistics modification counters
SELECT
stats_name = S.[name],
DDSP.stats_id,
DDSP.[rows],
DDSP.modification_counter
FROM sys.stats AS S
CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP
WHERE
S.[object_id] = OBJECT_ID(N'dbo.Banana', N'U');
Incrementing each pk
column value by one for every row:
-- Increment pk in every row
UPDATE dbo.Banana
SET pk += 1;
Uses the execution plan:
It produces the following statistics modification counters:
Questions
- What do the Split, Sort, and Collapse operators do?
- Why do the
pk
stats show 2 modifications, butc1
andc2
show 5?
Best Answer
SQL Server always uses the Split, Sort, and Collapse combination of operators when maintaining a unique index as part of an update that affects (or might affect) more than one row.
Working through the example in the question, we could write the update as a separate single-row update for each of the four rows present:
The problem is that the first statement would fail, since it changes
pk
from 1 to 2, and there is already a row wherepk
= 2. The SQL Server storage engine requires that unique indexes remain unique at every stage of processing, even within a single statement. This is the problem solved by Split, Sort, and Collapse.Split
The first step is to Split each update statement into a delete followed by an insert:
The Split operator adds an action code column to the stream (here labelled Act1007):
The action code is 1 for an update, 3 for a delete, and 4 for an insert.
Sort
The split statements above would still produce a false transient unique key violation, so the next step is to sort the statements by the keys of the unique index being updated (
pk
in this case), then by the action code. For this example, this simply means that deletes (3) on the same key are ordered before inserts (4). The resulting order is:Collapse
The preceding stage is enough to guarantee the avoidance of false uniqueness violations in all cases. As an optimization, Collapse combines adjacent deletes and inserts on the same key value into an update:
The delete/insert pairs for
pk
values 2, 3, and 4 have been combined into an update, leaving a single delete onpk
= 1, and an insert forpk
= 5.The Collapse operator groups rows by the key columns, and updates the action code to reflect the collapse outcome:
Clustered Index Update
This operator is labelled as an Update, but it is capable of inserts, updates, and deletes. Which action is taken by the Clustered Index Update per row is determined by the value of the action code in that row. The operator has an Action property to reflect this mode of operation:
Row modification counters
Note that the three updates above do not modify the key(s) of the unique index being maintained. In effect, we have transformed updates to the key columns in the index into updates of the non-key columns (
c1
andc2
), plus a delete and an insert. Neither a delete nor an insert can cause a false unique-key violation.An insert or a delete affects every single column in the row, so statistics associated with every column will have their modification counters incremented. For update(s), only statistics with any of the updated columns as the leading column have their modification counters incremented (even if the value is unchanged).
The statistics row modification counters therefore show 2 changes to
pk
, and 5 forc1
andc2
:Note: Only changes applied to the base object (heap or clustered index) affect statistics row modification counters. Non-clustered indexes are secondary structures, reflecting changes already made to the base object. They do not affect statistics row modification counters at all.
If an object has multiple unique indexes, a separate Split, Sort, Collapse combination is used to organize the updates to each. SQL Server optimizes this case for nonclustered indexes by saving the result of the Split to an Eager Table Spool, then replaying that set for each unique index (which will have its own Sort by index keys + action code, and Collapse).
Effect on statistics updates
Automatic statistics updates (if enabled) occur when the query optimizer needs statistical information and notices that existing statistics are out of date (or invalid due to a schema change). Statistics are considered out of date when the number of modifications recorded exceed a threshold.
The Split/Sort/Collapse arrangement results in different row modifications being recorded than would be expected. This, in turn, means that a statistics update may be triggered sooner or later than would be the case otherwise.
In the example above, row modifications for the key column increase by 2 (the net change) rather than 4 (one for each table row affected), or 5 (one for each delete/update/insert produced by the Collapse).
In addition, non-key columns that were logically not changed by the original query accumulate row modifications, which may number as many as double the table rows updated (one for each delete, and one for each insert).
The number of changes recorded depend on the degree of overlap between old and new key column values (and so the degree to which the separate deletes and inserts can be collapsed). Resetting the table between each execution, the following queries demonstrate the effect on row modification counters with different overlaps: