I am deploying a mapping table which connects Items
to ItemGroups
Though an Item
will only ever be in a single ItemGroup
please accept a priori it is not appropriate in my use case to append a column to the Items
table.
So a table will be created with two columns:
ItemId INT NOT NULL
ItemGroupId INT NOT NULL
It is tempting to make ItemId
the CI and PK for this table…. but is that appropriate? I think it makes sense as a primary key but since it is neither sequential nor perpetually-increasing it seems incorrect as a CI. I will never execute range queries against ItemId
… But a surrogate key solely for the purpose of a sequential CI also doesn't seem appropriate.
I am equally likely to execute either:
SELECT [ItemGroupId] FROM [Table] WHERE [ItemId] = @id
SELECT [ItemId] FROM [Table] WHERE [ItemGroupId] = @id
So as far as I can tell I have few options:
ItemId
is both the CI and the PKItemId
is not the CI, but it is the PKItemId
is neither the CI nor the PK but has aUNIQUE
constraintItemId, ItemGroupId
combination becomes the PK andItemId
has aUNIQUE
constraint
Best Answer
Option 5:
(ItemGroupId, ItemId)
(note the order) as the primary key (clustered)ItemId
While searching by
ItemId
is not conducive to range scans (yet?), searching byItemGroupId
is 1, which makes it a good candidate to be the leading column of the clustered index key.My normal design for this type of table is to index both combinations, i.e.,
(Column1, Column2)
and(Column2, Column1)
so I (and others) never have to think about whether it's indexed properly; it just is. In your case here, one combination is degenerate and so only one column goes into the key, but the non-key column is still covered by the index because it's part of the clustered index key. The situation becomes more complicated, of course, if there are non-key columns involved, but that's beyond the scope of what you asked here.In this case, while you could flip-flop the primary key with the unique constraint, I chose the 2-column primary key on purpose because it will be more stable during the life of the table. This works to your advantage as a DBA (one word: Replication) and also to developers (it could be quite a bit of code updates if the primary key definition changes).
1 There might be an issue with terminology here. Searching by a single
ItemId
will return 0 or 1 row; searching by a singleItemGroupId
(an incomplete key) returns >= 0 rows, and so is a seek + scan (it will show up as an index seek in the execution plan). Obviously this isn't a "range scan" on multipleItemGroupId
s. The idea is to physically group together (and order, in this case) theItemId
s for everyItemGroupId
. I hope that's clear.