SQL Server – Second index on junction table with composite primary key: include both or one column

clustered-index, sql-server

I'm working on a database that will be very read-intensive, and uses a junction table that's read often and in both directions.

The junction table contains two columns, IDA and IDB.

The primary key for the junction table will be a composite key for both columns.

I want to create an index so the database can quickly seek the corresponding IDA for a specific IDB.

Should that index contain both IDB and IDA? Or should it just contain IDB?

To put it another way, should my table look like this:

CREATE TABLE ExJunction(
    IDA INTEGER NOT NULL,
    IDB INTEGER NOT NULL,
    CONSTRAINT PK_ExJunction_IDA_IDB PRIMARY KEY CLUSTERED (IDA ASC, IDB ASC),
    CONSTRAINT AK_ExJunction_IDB_IDA UNIQUE (IDB ASC, IDA ASC)
)

Or like this:

CREATE TABLE ExJunction(
    IDA INTEGER NOT NULL,
    IDB INTEGER NOT NULL,
    CONSTRAINT PK_ExJunction_IDA_IDB PRIMARY KEY CLUSTERED (IDA ASC, IDB ASC),
    INDEX IX_ExJunction_IDB NONCLUSTERED (IDB ASC)
)

Or, should I do something else altogether?

I'm having trouble conceptualizing this, since I've always thought of nonclustered indexes as containing both the index key and the primary key. But the index key is already contained in the primary key, so are these two definitions the same (both containing IDB and IDA), or different?

Best Answer

There is almost no difference between the two options. Each index would contain both IDA and IDB, since a nonclustered index carries the clustered index key and both columns are part of the clustered primary key. SQL Server is also smart enough to avoid storing a column more than once. To see this, create a table with both indexes, populate it with data, and compare the sizes of the resulting indexes:

CREATE TABLE ExJunction(
    IDA INTEGER NOT NULL,
    IDB INTEGER NOT NULL,
    CONSTRAINT PK_ExJunction_IDA_IDB PRIMARY KEY CLUSTERED (IDA ASC, IDB ASC),
    CONSTRAINT AK_ExJunction_IDB_IDA UNIQUE (IDB ASC, IDA ASC),
    INDEX IX_ExJunction_IDB NONCLUSTERED (IDB ASC)
)
GO

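-- Insert 250,000 rows. Because 1000 and 1001 share no common factor,
-- every (IDA, IDB) pair is distinct, so no uniqueness constraint is violated.
SET NOCOUNT ON
DECLARE @i AS INT = 0
WHILE @i < 250000
BEGIN
  SET @i = @i + 1
  INSERT INTO ExJunction (IDA, IDB)
  VALUES (@i % 1000, @i % 1001)
END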


SELECT index_id, index_type_desc, index_depth, index_level, page_count, record_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('ExJunction'), NULL, NULL, 'DETAILED')

If you compare the two nonclustered indexes in the output of the final query, you'll see that each is three levels deep, each uses exactly the same number of pages, and the same percentage of each page is full. Also check out this post, especially some of the comments and responses that address this same scenario.
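The output above identifies indexes only by index_id, which can make the comparison harder to read. As a rough sketch (not part of the original test), the same statistics can be joined to sys.indexes to show each index's name and whether it is unique. The column aliases and the extra avg_record_size_in_bytes column are my additions:

```sql
-- Sketch: same physical stats as above, plus index name and uniqueness flag.
SELECT i.name AS index_name,
       i.is_unique,
       ps.index_type_desc,
       ps.index_depth,
       ps.index_level,
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes,
       ps.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('ExJunction'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
ORDER BY ps.index_id, ps.index_level DESC;
```

Rows for AK_ExJunction_IDB_IDA and IX_ExJunction_IDB can then be compared level by level.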

Since the resulting indexes are identical, the only possible difference I can see would come from maintenance overhead (and I'm speculating a bit here; it's possible there is no difference at all). With the unique constraint, inserts and updates may be slightly more costly because uniqueness has to be enforced. On the other hand, reads may be slightly more efficient, since the optimizer knows ahead of time that the rows are unique. But it's also possible that the database engine is smart enough to recognize that the primary key already guarantees uniqueness, in which case there really would be no difference.
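One way to check the read side of this speculation yourself is to force each index in turn and compare the logical reads and execution plans. This is only a sketch against the test table above; the lookup value 42 is an arbitrary choice, and the index hints are there purely to make the comparison explicit, not as something to use in production queries:

```sql
-- Report logical reads for each statement in the Messages output.
SET STATISTICS IO ON;

-- Force the unique index created by the AK constraint.
SELECT IDA
FROM ExJunction WITH (INDEX(AK_ExJunction_IDB_IDA))
WHERE IDB = 42;  -- 42 is an arbitrary example value

-- Force the plain (non-unique) nonclustered index.
SELECT IDA
FROM ExJunction WITH (INDEX(IX_ExJunction_IDB))
WHERE IDB = 42;

SET STATISTICS IO OFF;
```

If the two statements report the same logical reads and produce the same index seek in the actual execution plan, that supports the "no practical difference" reading for this query shape.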

So the only other consideration is readability and personal preference. Do you prefer to explicitly call out the fact that IDA is included in the nonclustered index, and that it is unique? Or do you prefer to keep the definition as minimal as possible, and let SQL Server do its thing on the backend?