Sql-server – Adding a clustered index to a heap table

clustered-indexheapnonclustered-indexsql serversql-server-2012

I've noticed that we have a few heap tables on our database, they all have a non clustered index on the primary key (which is an Int). The biggest table has over 2 million rows, a forwarded record count of 663,479 and page count of 609,028. Also noticed that the forwarded records per second are 972 on the server.

From my limited knowledge and a bit of googling this all looks pretty bad so i'm looking into adding a clustered index to this table.

Would you agree this is the right thing to do?
If so what are the best practices for adding a clustered index to this table, i
assume i will have to remove the existing non clustered index on the primary key (and any foreign keys referencing it) and then add the clustered index and recreate the foreign keys? Are there any scripts to assist with dropping and recreating the foreign keys as there are quite a few?
Would there be any benefit to rebuilding the heap as a short term fix while waiting for a chance to sort out the indexes?

Sorry if i haven't provided enough information, let me know if you need to know anything else to be able to help, thanks!

Best Answer

Would you agree this is the right thing to do?

Yes, especially if (1) the table is looked up using the Primary Key, and (2) you are noticing performance issues.

Note that if the table is primarily looked up using a set of data other than the PK (Example: Order Lines may be looked up by the OrderID and sorted by the OrderLineId), then you may consider a clustered index based on the most common lookup(s), especially if they select sequences of rows. For example, on the Order Lines example above, having a Clustered index like:

CREATE CLUSTERED INDEX Idx_OrderID_OrderLineId
ON OrderLines (OrderId, OrderLineId)

would be optimal for retrieving all of the order lines for a given order.

Also, there are cases where having a non-clustered covering index may be optimal. You will need to keep an eye on performance whatever you do with these tables.

If so what are the best practices for adding a clustered index to this table, i assume i will have to remove the existing non clustered index on the primary key (and any foreign keys referencing it) and then add the clustered index and recreate the foreign keys? Are there any scripts to assist with dropping and recreating the foreign keys as there are quite a few?

Best practices depends on the answers to the above questions. If you actually want a clustered PK (because that is how the table is commonly referenced, or because there is no clear clusterable index that could be created), then you will need to perform all of the steps you specify above. If you decide to create a clustered index that is identical to your PK (and yes, people do this... ) you won't need to do all these steps. And if you decide your optimum clustered index is some combination of fields not identical to the PK, then you will not need to do all of these steps either.

If you do replace the PK with a clustered PK, then there are scripts that can help with identifying the foreign keys. I tend to use the View Dependencies functionality inside to SSMS to identify possible locations for FKs to my table, then manually script them out and drop them myself (because I had some bad experiences with scripts that should handle FKs), so I can't recommend any specific choices right now.

Would there be any benefit to rebuilding the heap as a short term fix while waiting for a chance to sort out the indexes?

Not really, no.

Script

CREATE TABLE T_NCI
  (
     table1_id INT NOT NULL,
     table2_id INT NOT NULL,
     CONSTRAINT NCI_PK PRIMARY KEY NONCLUSTERED(table1_id, table2_id)
  );

CREATE TABLE T_CI
  (
     table1_id INT NOT NULL,
     table2_id INT NOT NULL,
     CONSTRAINT CI_PK PRIMARY KEY CLUSTERED(table1_id, table2_id)
  );  

INSERT INTO T_CI
OUTPUT inserted.* INTO T_NCI
SELECT DISTINCT number, number
FROM master..spt_values


SELECT LEFT(OBJECT_NAME(object_id),5) AS [Name],
       page_count,
       record_count,
       min_record_size_in_bytes,
       max_record_size_in_bytes
FROM   (SELECT *
        FROM   sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_CI'), NULL, NULL, 'DETAILED')
        UNION ALL
        SELECT *
        FROM   sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_NCI'), NULL, NULL, 'DETAILED')
) T  

DROP TABLE T_NCI
DROP TABLE T_CI

Results

Name  page_count           record_count         min_record_size_in_bytes max_record_size_in_bytes
----- -------------------- -------------------- ------------------------ ------------------------
T_CI  5                    2163                 15                       15
T_CI  1                    5                    15                       15
T_NCI 5                    2163                 15                       15
T_NCI 6                    2163                 17                       17
T_NCI 1                    6                    15                       15

The record length for the non leaf pages in both the CI and the NCI is 15 bytes (1 byte status bits, 8 bytes for the composite key and 6 for the down page pointer) but for the leaf page NCI the rows take up 17 bytes (1 byte status bits, 8 bytes for the composite key and 8 for the row pointer) compared to 15 for the CI (2 bytes status bits, 1 byte column count offset, 8 bytes data, 2 bytes column count, 1 byte null bitmap). And as well as this less compact index structure you also have all the additional pages for the heap itself on top.

Sql-server – Aren’t two writes required to update a clustered index record

I think the problem here is a difference in terminology.

The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.

The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.

The one thing I would pick at from the article quote is this (emphasis mine):

One write for inserting the row, and one write for updating the non-clustered index.

This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.

So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."

Best Answer

Related Solutions

Sql-server – SQL Server – is it ok to store link table as a heap

Script

Results

Sql-server – Aren’t two writes required to update a clustered index record

Related Question