SQL Server – Should a Surrogate Key Be Used in a Mapping Table?

clustered-primary-keydatabase-designsql serversql-server-2008-r2

I am deploying a mapping table which connects Items to ItemGroups

Though an Item will only ever be in a single ItemGroup please accept a priori it is not appropriate in my use case to append a column to the Items table.

So a table will be created with two columns:

ItemId INT NOT NULL
ItemGroupId INT NOT NULL

It is tempting to make ItemId the CI and PK for this table…. but is that appropriate? I think it makes sense as a primary key but since it is neither sequential nor perpetually-increasing it seems incorrect as a CI. I will never execute range queries against ItemId

… But a surrogate key solely for the purpose of a sequential CI also doesn't seem appropriate.

I am equally likely to execute either:

SELECT [ItemGroupId] FROM [Table] WHERE [ItemId] = @id
SELECT [ItemId] FROM [Table] WHERE [ItemGroupId] = @id

So as far as I can tell I have few options:

  1. ItemId is both the CI and the PK
  2. ItemId is not the CI, but it is the PK
  3. ItemId is neither the CI nor the PK but has a UNIQUE constraint
  4. ItemId, ItemGroupId combination becomes the PK and ItemId has a UNIQUE constraint

Best Answer

Option 5:

  • (ItemGroupId, ItemId) (note the order) as the primary key (clustered)
  • Unique constraint (non-clustered) on ItemId

While searching by ItemId is not conducive to range scans (yet?), searching by ItemGroupId is 1, which makes it a good candidate to be the leading column of the clustered index key.

My normal design for this type of table is to index both combinations, i.e., (Column1, Column2) and (Column2, Column1) so I (and others) never have to think about whether it's indexed properly; it just is. In your case here, one combination is degenerate and so only one column goes into the key, but the non-key column is still covered by the index because it's part of the clustered index key. The situation becomes more complicated, of course, if there are non-key columns involved, but that's beyond the scope of what you asked here.

In this case, while you could flip-flop the primary key with the unique constraint, I chose the 2-column primary key on purpose because it will be more stable during the life of the table. This works to your advantage as a DBA (one word: Replication) and also to developers (it could be quite a bit of code updates if the primary key definition changes).


1 There might be an issue with terminology here. Searching by a single ItemId will return 0 or 1 row; searching by a single ItemGroupId (an incomplete key) returns >= 0 rows, and so is a seek + scan (it will show up as an index seek in the execution plan). Obviously this isn't a "range scan" on multiple ItemGroupIds. The idea is to physically group together (and order, in this case) the ItemIds for every ItemGroupId. I hope that's clear.