I agree that there is no point in creating an IDENTITY
column clustered index here but I would just create the composite PK as CLUSTERED
rather than NONCLUSTERED
.
There is no point having 2 copies of the data, One in the heap that never gets used and one in the NCI. In fact it is downright counter productive as shown below.
Your NCI will still get page splits anyway if you are not inserting into the table in order of primary key so you have not avoided the problem. You might also consider a UNIQUE
constraint or index on the reversed key order as well depending upon what queries you run against that table.
In general NCIs can be slightly be more compact than the equivalent NCI as they do not have the status bits B or NULL_BITMAP
(if no nullable columns) but in this case this is more than outweighed by the necessity to store the 8 byte RID pointer to the row in the heap as can be seen from the below.
Script
CREATE TABLE T_NCI
(
table1_id INT NOT NULL,
table2_id INT NOT NULL,
CONSTRAINT NCI_PK PRIMARY KEY NONCLUSTERED(table1_id, table2_id)
);
CREATE TABLE T_CI
(
table1_id INT NOT NULL,
table2_id INT NOT NULL,
CONSTRAINT CI_PK PRIMARY KEY CLUSTERED(table1_id, table2_id)
);
INSERT INTO T_CI
OUTPUT inserted.* INTO T_NCI
SELECT DISTINCT number, number
FROM master..spt_values
SELECT LEFT(OBJECT_NAME(object_id),5) AS [Name],
page_count,
record_count,
min_record_size_in_bytes,
max_record_size_in_bytes
FROM (SELECT *
FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_CI'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT *
FROM sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_NCI'), NULL, NULL, 'DETAILED')
) T
DROP TABLE T_NCI
DROP TABLE T_CI
Results
Name page_count record_count min_record_size_in_bytes max_record_size_in_bytes
----- -------------------- -------------------- ------------------------ ------------------------
T_CI 5 2163 15 15
T_CI 1 5 15 15
T_NCI 5 2163 15 15
T_NCI 6 2163 17 17
T_NCI 1 6 15 15
The record length for the non leaf pages in both the CI and the NCI is 15 bytes (1 byte status bits, 8 bytes for the composite key and 6 for the down page pointer) but for the leaf page NCI the rows take up 17 bytes (1 byte status bits, 8 bytes for the composite key and 8 for the row pointer) compared to 15 for the CI (2 bytes status bits, 1 byte column count offset, 8 bytes data, 2 bytes column count, 1 byte null bitmap). And as well as this less compact index structure you also have all the additional pages for the heap itself on top.
I think the problem here is a difference in terminology.
The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.
The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT
statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.
The one thing I would pick at from the article quote is this (emphasis mine):
One write for inserting the row, and one write for updating the non-clustered index.
This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.
So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?
Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."
Best Answer
Yes, especially if (1) the table is looked up using the Primary Key, and (2) you are noticing performance issues.
Note that if the table is primarily looked up using a set of data other than the PK (Example: Order Lines may be looked up by the OrderID and sorted by the OrderLineId), then you may consider a clustered index based on the most common lookup(s), especially if they select sequences of rows. For example, on the Order Lines example above, having a Clustered index like:
would be optimal for retrieving all of the order lines for a given order.
Also, there are cases where having a non-clustered covering index may be optimal. You will need to keep an eye on performance whatever you do with these tables.
Best practices depends on the answers to the above questions. If you actually want a clustered PK (because that is how the table is commonly referenced, or because there is no clear clusterable index that could be created), then you will need to perform all of the steps you specify above. If you decide to create a clustered index that is identical to your PK (and yes, people do this... ) you won't need to do all these steps. And if you decide your optimum clustered index is some combination of fields not identical to the PK, then you will not need to do all of these steps either.
If you do replace the PK with a clustered PK, then there are scripts that can help with identifying the foreign keys. I tend to use the View Dependencies functionality inside to SSMS to identify possible locations for FKs to my table, then manually script them out and drop them myself (because I had some bad experiences with scripts that should handle FKs), so I can't recommend any specific choices right now.
Not really, no.