Sql-server – Unique index updates and statistics row modification counters

execution-plansql serverstatisticsunique-constraintupdate

Given the following table, unique clustered index, and statistics:

CREATE TABLE dbo.Banana
(
    pk integer NOT NULL, 
    c1 char(1) NOT NULL, 
    c2 char(1) NOT NULL
);

CREATE UNIQUE CLUSTERED INDEX pk ON dbo.Banana (pk);

CREATE STATISTICS c1 ON dbo.Banana (c1);
CREATE STATISTICS c2 ON dbo.Banana (c2);

INSERT dbo.Banana 
    (pk, c1, c2) 
VALUES 
    (1, 'A', 'W'), 
    (2, 'B', 'X'), 
    (3, 'C', 'Y'), 
    (4, 'D', 'Z');

-- Populate statistics
UPDATE STATISTICS dbo.Banana;

Example data

The statistics row modification counters obviously show zero before any updates:

-- Show statistics modification counters
SELECT
    stats_name = S.[name], 
    DDSP.stats_id,
    DDSP.[rows],
    DDSP.modification_counter
FROM sys.stats AS S
CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP
WHERE
    S.[object_id] = OBJECT_ID(N'dbo.Banana', N'U');

Zero modification counters

Incrementing each pk column value by one for every row:

-- Increment pk in every row
UPDATE dbo.Banana 
SET pk += 1;

Uses the execution plan:

Split Sort Collapse execution plan

It produces the following statistics modification counters:

Post-update modification counters

Questions

  1. What do the Split, Sort, and Collapse operators do?
  2. Why do the pk stats show 2 modifications, but c1 and c2 show 5?

Best Answer

SQL Server always uses the Split, Sort, and Collapse combination of operators when maintaining a unique index as part of an update that affects (or might affect) more than one row.

Working through the example in the question, we could write the update as a separate single-row update for each of the four rows present:

-- Per row updates
UPDATE dbo.Banana SET pk = 2 WHERE pk = 1;
UPDATE dbo.Banana SET pk = 3 WHERE pk = 2;
UPDATE dbo.Banana SET pk = 4 WHERE pk = 3;
UPDATE dbo.Banana SET pk = 5 WHERE pk = 4;

The problem is that the first statement would fail, since it changes pk from 1 to 2, and there is already a row where pk = 2. The SQL Server storage engine requires that unique indexes remain unique at every stage of processing, even within a single statement. This is the problem solved by Split, Sort, and Collapse.

Split Split

The first step is to Split each update statement into a delete followed by an insert:

DELETE dbo.Banana WHERE pk = 1;
INSERT dbo.Banana (pk, c1, c2) VALUES (2, 'A', 'W');

DELETE dbo.Banana WHERE pk = 2;
INSERT dbo.Banana (pk, c1, c2) VALUES (3, 'B', 'X');

DELETE dbo.Banana WHERE pk = 3;
INSERT dbo.Banana (pk, c1, c2) VALUES (4, 'C', 'Y');

DELETE dbo.Banana WHERE pk = 4;
INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

The Split operator adds an action code column to the stream (here labelled Act1007):

Split properties

The action code is 1 for an update, 3 for a delete, and 4 for an insert.

Sort Sort

The split statements above would still produce a false transient unique key violation, so the next step is to sort the statements by the keys of the unique index being updated (pk in this case), then by the action code. For this example, this simply means that deletes (3) on the same key are ordered before inserts (4). The resulting order is:

-- Sort (pk, action)
DELETE dbo.Banana WHERE pk = 1;
DELETE dbo.Banana WHERE pk = 2;
INSERT dbo.Banana (pk, c1, c2) VALUES (2, 'A', 'W');
DELETE dbo.Banana WHERE pk = 3;
INSERT dbo.Banana (pk, c1, c2) VALUES (3, 'B', 'X');
DELETE dbo.Banana WHERE pk = 4;
INSERT dbo.Banana (pk, c1, c2) VALUES (4, 'C', 'Y');
INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

Sort properties

Collapse Collapse

The preceding stage is enough to guarantee the avoidance of false uniqueness violations in all cases. As an optimization, Collapse combines adjacent deletes and inserts on the same key value into an update:

-- Collapse (pk)
DELETE dbo.Banana WHERE pk = 1;
UPDATE dbo.Banana SET c1 = 'A', c2 = 'W' WHERE pk = 2;
UPDATE dbo.Banana SET c1 = 'B', c2 = 'X' WHERE pk = 3;
UPDATE dbo.Banana SET c1 = 'C', c2 = 'Y' WHERE pk = 4;
INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

The delete/insert pairs for pk values 2, 3, and 4 have been combined into an update, leaving a single delete on pk = 1, and an insert for pk = 5.

The Collapse operator groups rows by the key columns, and updates the action code to reflect the collapse outcome:

Collapse properties

Clustered Index Update Clustered index update

This operator is labelled as an Update, but it is capable of inserts, updates, and deletes. Which action is taken by the Clustered Index Update per row is determined by the value of the action code in that row. The operator has an Action property to reflect this mode of operation:

Clustered Index Update action property


Row modification counters

Note that the three updates above do not modify the key(s) of the unique index being maintained. In effect, we have transformed updates to the key columns in the index into updates of the non-key columns (c1 and c2), plus a delete and an insert. Neither a delete nor an insert can cause a false unique-key violation.

An insert or a delete affects every single column in the row, so statistics associated with every column will have their modification counters incremented. For update(s), only statistics with any of the updated columns as the leading column have their modification counters incremented (even if the value is unchanged).

The statistics row modification counters therefore show 2 changes to pk, and 5 for c1 and c2:

-- Collapse (pk)
DELETE dbo.Banana WHERE pk = 1;                         -- All columns modified
UPDATE dbo.Banana SET c1 = 'A', c2 = 'W' WHERE pk = 2;  -- c1 and c2 modified
UPDATE dbo.Banana SET c1 = 'B', c2 = 'X' WHERE pk = 3;  -- c1 and c2 modified
UPDATE dbo.Banana SET c1 = 'C', c2 = 'Y' WHERE pk = 4;  -- c1 and c2 modified
INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');    -- All columns modified

Note: Only changes applied to the base object (heap or clustered index) affect statistics row modification counters. Non-clustered indexes are secondary structures, reflecting changes already made to the base object. They do not affect statistics row modification counters at all.

If an object has multiple unique indexes, a separate Split, Sort, Collapse combination is used to organize the updates to each. SQL Server optimizes this case for nonclustered indexes by saving the result of the Split to an Eager Table Spool, then replaying that set for each unique index (which will have its own Sort by index keys + action code, and Collapse).

Effect on statistics updates

Automatic statistics updates (if enabled) occur when the query optimizer needs statistical information and notices that existing statistics are out of date (or invalid due to a schema change). Statistics are considered out of date when the number of modifications recorded exceed a threshold.

The Split/Sort/Collapse arrangement results in different row modifications being recorded than would be expected. This, in turn, means that a statistics update may be triggered sooner or later than would be the case otherwise.

In the example above, row modifications for the key column increase by 2 (the net change) rather than 4 (one for each table row affected), or 5 (one for each delete/update/insert produced by the Collapse).

In addition, non-key columns that were logically not changed by the original query accumulate row modifications, which may number as many as double the table rows updated (one for each delete, and one for each insert).


The number of changes recorded depend on the degree of overlap between old and new key column values (and so the degree to which the separate deletes and inserts can be collapsed). Resetting the table between each execution, the following queries demonstrate the effect on row modification counters with different overlaps:

UPDATE dbo.Banana SET pk = pk + 0; -- Full overlap

pk = pk + 0

UPDATE dbo.Banana SET pk = pk + 1;

pk = pk + 1

UPDATE dbo.Banana SET pk = pk + 2;

pk = pk + 2

UPDATE dbo.Banana SET pk = pk + 3;

pk = pk + 3

UPDATE dbo.Banana SET pk = pk + 4; -- No overlap

pk = pk + 4