Sql-server – SQL Server – is it ok to store link table as a heap

heapsql-server-2008

I'm trying to understand if it's a good practice to have non-clustered PK on link table:

CREATE TABLE table1_table2(table1_id INT NOT NULL, table2_id INT NOT NULL,
CONSTRAINT PK_T1_T2 PRIMARY KEY NONCLUSTERED(table1_id, table2_id),
CONSTRAINT FK_T1_T2_T1ID FOREIGN KEY(table1_id) REFERENCES table1(id),
CONSTRAINT FK_T1_T2_T2ID FOREIGN KEY(table2_id) REFERENCES table2(id)
);
CREATE INDEX IDX_T1_T2 ON table1_table2 (table2_id);

My point here that heap is ok because

  1. No particular order during INSERT, so in case of clustered PK I'll have many page splits.

  2. All SELECTs will use just index anyway

In my understanding, adding identity clustered PK will add more overhead for such tables (absolutely no data except foreign keys).

Could you please clarify if that makes sense ?

Thanks.

Best Answer

I agree that there is no point in creating an IDENTITY column clustered index here but I would just create the composite PK as CLUSTERED rather than NONCLUSTERED.

There is no point having 2 copies of the data, One in the heap that never gets used and one in the NCI. In fact it is downright counter productive as shown below.

Your NCI will still get page splits anyway if you are not inserting into the table in order of primary key so you have not avoided the problem. You might also consider a UNIQUE constraint or index on the reversed key order as well depending upon what queries you run against that table.

In general NCIs can be slightly be more compact than the equivalent NCI as they do not have the status bits B or NULL_BITMAP (if no nullable columns) but in this case this is more than outweighed by the necessity to store the 8 byte RID pointer to the row in the heap as can be seen from the below.

Script

CREATE TABLE T_NCI
  (
     table1_id INT NOT NULL,
     table2_id INT NOT NULL,
     CONSTRAINT NCI_PK PRIMARY KEY NONCLUSTERED(table1_id, table2_id)
  );

CREATE TABLE T_CI
  (
     table1_id INT NOT NULL,
     table2_id INT NOT NULL,
     CONSTRAINT CI_PK PRIMARY KEY CLUSTERED(table1_id, table2_id)
  );  

INSERT INTO T_CI
OUTPUT inserted.* INTO T_NCI
SELECT DISTINCT number, number
FROM master..spt_values


SELECT LEFT(OBJECT_NAME(object_id),5) AS [Name],
       page_count,
       record_count,
       min_record_size_in_bytes,
       max_record_size_in_bytes
FROM   (SELECT *
        FROM   sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_CI'), NULL, NULL, 'DETAILED')
        UNION ALL
        SELECT *
        FROM   sys.dm_db_index_physical_stats(db_id(), OBJECT_ID('T_NCI'), NULL, NULL, 'DETAILED')
) T  

DROP TABLE T_NCI
DROP TABLE T_CI

Results

Name  page_count           record_count         min_record_size_in_bytes max_record_size_in_bytes
----- -------------------- -------------------- ------------------------ ------------------------
T_CI  5                    2163                 15                       15
T_CI  1                    5                    15                       15
T_NCI 5                    2163                 15                       15
T_NCI 6                    2163                 17                       17
T_NCI 1                    6                    15                       15

The record length for the non leaf pages in both the CI and the NCI is 15 bytes (1 byte status bits, 8 bytes for the composite key and 6 for the down page pointer) but for the leaf page NCI the rows take up 17 bytes (1 byte status bits, 8 bytes for the composite key and 8 for the row pointer) compared to 15 for the CI (2 bytes status bits, 1 byte column count offset, 8 bytes data, 2 bytes column count, 1 byte null bitmap). And as well as this less compact index structure you also have all the additional pages for the heap itself on top.