I'm adding the following cross-reference table to my SQL Server-hosted DB:
company_id bigint not null (FK)
org_path nvarchar (2048) not null
The company_id
field refers to the id
field in another table (in which it's the primary key).
Given that there can also be multiple records with the same company_id
, any primary key would have to use both fields. However, I'm unable to create a key using both fields because org_path
is too long for SQL Server.
As for org_path
, this is the only table in which it exists. There's every likelihood that queries to this table will be asking for either all entries or all org_path
entries by company_id
. Or to put it another way, it looks doubtful that this table will ever be queried by org_path
. Furthermore, it's unlikely that org_path
will be updated, and more likely inserted and – probably rarely – deleted.
I expect that the total number of rows will be in the low thousands.
Also, the reason it's nvarchar (2048)
is because the value has to mimic that in a third-party DB. A typical example will be something like
\Translation Providers\[customer name]\[order name]\
and can contain diacritics.
So my question is this: would it be more efficient to add an auto-increment id
field and use that in conjunction with company_id
as the primary key, or would it add unnecessary overhead – and does the fact that company_id
is the primary key in another table have any effect here?
Best Answer
For a non-unique clustered index on
comany_id
alone, SQL Server will automatically add a 4 byte integer uniqueifier to all duplicate (i.e. second and subsequent for a key value) clustered index keys to make it unique. This is not exposed to the user though.The advantage of adding your own unique identifier as a secondary key column is that you can then still seek by
company_id
but also seek to individual rows more efficiently (usingcompany_id, identitycol
rather thancompany_id
with a residual predicate onorg_path
). The clustered index would then be unique oncompany_id, identitycol
, so no hidden uniqueifiers would be added.Also, if you end up with duplicates for
(company_id,org_path)
, having the explicit identity column (a sort of "exposed uniqueifier") will make it easier to target just one of them for delete or update.