Sql-server – Nonclustered Index Insert

sql server

Say I have a table like this:

create table SomeTable
(
    id int identity(1, 1) not null primary key clustered,
    SomeString1 varchar(50) not null,
    SomeString2 varchar(50) not null
)
go

create nonclustered index IX_SomeString1
on SomeTable(SomeString1)
go

If I was to do this:

insert into SomeTable(SomeString1, SomeString2)
values('foo', 'bar')
go

And view the actual execution plan, I only see a Clustered Index Insert. Why am I not seeing a Nonclustered Index Insert in the execution plan?

Best Answer

For a single row insert you get a narrow/per-row plan

INSERT INTO SomeTable(SomeString1, SomeString2)
SELECT TOP 1 type, type
FROM master..spt_values

Narrow plan

If you select the Clustered Index Insert Operator and View the properties window you can see the same information as shown in the XML.

If you try for 1,000 rows

INSERT INTO SomeTable(SomeString1, SomeString2)
SELECT TOP 1000 type, type
FROM master..spt_values

You get a different wide/per-index plan with the operations split out separately

See Wide vs. Narrow Plans or Craig Freedman's blog for more information about the two

Related Solutions

Sql-server – SELECT/INSERT Deadlock

On the face of it, this looks like a classic lookup deadlock. The essential ingredients for this deadlock pattern are:

a SELECT query that uses a non-covering nonclustered index with a Key Lookup
an INSERT query that modifies the clustered index and then the nonclustered index

The SELECT accesses the nonclustered index first, then the clustered index. The INSERT access the clustered index first, then the nonclustered index. Accessing the same resources in a different order acquiring incompatible locks is a great way to 'achieve' a deadlock of course.

In this case, the SELECT query is:

SELECT query

...and the INSERT query is:

INSERT query

Notice the green highlighted non-clustered indexes maintenance.

We would need to see the serial version of the SELECT plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to Handling Deadlocks, this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason.

Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally escalate the isolation level to REPEATABLE READ for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting.

The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in cost threshold for parallelism fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired MAXDOP 1 behaviour and the other application is able to use parallelism.

The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue.

Sql-server – Should the index on an identity column be nonclustered

By default the PK is clustered and in most cases, this is fine. However, which question should be asked:

should my PK be clustered?
which column(s) will be the best key for my clustered index?

PK and Clustered index are 2 differences things:

PK is a constraint. PK is used to uniquely identify rows, but there is no notion of storage. However by default (in SSMS), it is enforced by a unique clustered index if a clustered index is not yet present.
Clustered indexes is a special type of index which store row data at the leaf level, meaning it is always covering. All columns whether they are part of the key or not, are stored at the leaf level. It does not have to be unique, in which case a uniquifier (4 bytes) is added to the clustered key.

Now we end up with 2 questions:

How do I want to uniquely identify rows in my table (PK)
How do I want to store it at the leaf level of an index (Clustered Index)

It depends on how:

you design your data model
you query your data and you write your queries
you insert or update your data
...

First, do you need a clustered index? If you bulk insert, it is more efficient to store unordered data to a HEAP (versus ordered data in a cluster). It uses RID (Row Identifier, 8 bytes) to uniquely identify rows and store it on pages.

The clustered index should not be a random value. The data at the leaf level will be stored and ordered by the index key. Therefore it should grow continuously in order to avoid fragmentation or page split. If this can not be achieved by the PK, you should consider another key as a clustered candidate. Clustered index on identy columns, sequential GUID or even something like the insertion's date is fine from a sequential point of view since all rows will be added to the last leaf page. On the other hand, while unique identifier may be useful to your business needs as a PK, they should not be clustered (they are randomly ordered/generated).

If after some data and query analysis, you find out that you mostly use the same index to get your data before doing a key lookup in the clustered PK, you may consider it as clustered index although it may not uniquely identify your data.

The clustered index key is composed of all the columns you want to index. A uniquefier column (4 bytes) is added if there is no unique constraint on it (incremental value for duplicates, null otherwise). This index key will then be stored once for each row at the leaf level of all your nonclustered indexes. Some of them will also be stored several times at intermediate levels (branch) between the root and the leaf level of the index tree (B-tree). If the key is too large, all the non clustered index will get larger, will require more storage and more IO, CPU, memory, ... If you have a PK on name+birthdate+country, it is very likely that this key is not a good candidate. It is too large for a clustered index. Uniqueidentifier using NEWSEQUENTIALID() is usually not considered as a narrow key (16 bytes) although it is sequential.

Then once you figured out how to uniquely identify rows in your table, you can add a PK. If you think you won't use it in your query, don't create it clustered. You can still create another nonclustered index if you sometime need to query it. Note that the PK will automaticaly create a unique index.

The non clustered indexes will always contain the clustered key. However, if the indexed columns (+key columns) are covering, there won't be any key lookup in the clustered index. Don't forget you can also add Include and Where to a non clustered index. (use it wisely)

Clustered index should be unique and as narrow as possible Clustered index should not change over time and should inserted incrementally.

It is now time to write some SQL which will create the table, clustered and nonclustered indexes and constraints.

This is all theoritical because we don't know your data model and datatypes used (A and B).

Best Answer

Related Solutions

Sql-server – SELECT/INSERT Deadlock

Sql-server – Should the index on an identity column be nonclustered

Related Question