Sql-server – Proper table design for sparse primary key

In my system I have temporary entities that are created based on rules stored in my database, and the entities are not persisted.

Now I need to store information about these entities, but because they are created from rules and never stored, they have no ID.

I came up with a formula to generate an ID for these temp entities based on the rule that was used to generate them: id = rule id + "-" + entity index in the rule.
This formula generates unique strings of the form 164-3, 123-0, 432-2, etc.

My question is how should I build my table (regarding primary key and clustered index) when my keys have no relation or order?
Keep in mind that I will only (99.9% of the time) query the table using the id mentioned above.

Options I thought about after much reading, but don't have the knowledge to determine which is better:

1) primary key on a varchar column with clustered index.
-According to various sources, this would be bad because of fragmentation and the width of the key. Also, the keys' format makes for an awkward sort order.

2) primary key on varchar column without clustered index (heap table).
-Also a bad idea according to various sources due to indexing and fragmentation issues.

3) identity int column with clustered index, and a varchar column as primary key with unique index.
-I can't really see the benefit of the surrogate key here, since it would mainly help with range queries and ordering, and I would never query the table by this key because it would be unknown at all times.

4) 2 columns composite key: rule id + rule index columns.
-Now I don't have strings, but I have two columns that will be copied into FKs and non-clustered indexes. Also, I'm not sure which indexes I would use in this case.

Can anybody shine a light here? Any help is appreciated.

–Edit

Here is what I think I'll be using after reading the comments, and the rationale:

clustered identity int Id as PK
non-clustered unique index on (rule id, index)

Rationale:
1 – I will often query the table by rule id only, which is why it is the leftmost index column;
2 – The Id column can be used for subsequent operations after the first query by rule id (the id will be stored client side);
3 – I will never query by index only;
4 – The unique index guarantees insert consistency, though it is still slower than a single int.
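
A minimal T-SQL sketch of the layout described in this edit. The table name dbo.EntityInfo and the columns entity_index and payload are hypothetical, chosen only for illustration (index itself is a reserved word in T-SQL, so the sketch avoids it as a column name); only the key and index arrangement comes from the post.

```sql
-- Hypothetical names throughout; only the key layout is taken from the post.
CREATE TABLE dbo.EntityInfo
(
    Id           int IDENTITY(1,1) NOT NULL,  -- surrogate key, clustered
    rule_id      int NOT NULL,                -- rule that generated the entity
    entity_index int NOT NULL,                -- position of the entity within the rule
    payload      nvarchar(200) NULL,          -- placeholder for the stored information
    CONSTRAINT PK_EntityInfo PRIMARY KEY CLUSTERED (Id),
    CONSTRAINT UQ_EntityInfo_Rule_Index UNIQUE NONCLUSTERED (rule_id, entity_index)
);

-- First query by rule id: the unique index can be used for the seek,
-- but fetching payload requires a lookup into the clustered index.
SELECT Id, entity_index, payload
FROM dbo.EntityInfo
WHERE rule_id = 164;

-- Follow-up operations use the Id that was returned to the client.
UPDATE dbo.EntityInfo
SET payload = N'updated'
WHERE Id = 1;
```

With this layout every non-clustered index row carries the 4-byte Id as its row locator, at the cost of an extra lookup whenever a query by rule_id needs columns that are not in the unique index.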

Best Answer

"I'm thinking clustered pk (rule_id, index)"

Do that. This isn't really a hard case. (rule_id, index) uniquely identifies a row, so that should be your clustered PK unless there's some compelling reason to use a different design. And because, as you say,

"I will often query the table by rule id only"

rule_id should be the leading column in the index. This supports efficient lookups by rule_id and keeps the rows for one rule_id physically close together, which helps when inserting multiple rows for the same rule.
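
The design recommended in this answer could be sketched as follows. As before, dbo.EntityInfo, entity_index and payload are assumed names for illustration, not anything given in the question; this is an alternative to the surrogate-key table, not something to create alongside it.

```sql
-- Hypothetical names; the composite clustered primary key is the point of the sketch.
CREATE TABLE dbo.EntityInfo
(
    rule_id      int NOT NULL,        -- leading key column: supports lookups by rule alone
    entity_index int NOT NULL,        -- position of the entity within the rule
    payload      nvarchar(200) NULL,  -- placeholder for the stored information
    CONSTRAINT PK_EntityInfo PRIMARY KEY CLUSTERED (rule_id, entity_index)
);

-- All entities of one rule: a range seek on the leading column of the clustered index.
SELECT entity_index, payload
FROM dbo.EntityInfo
WHERE rule_id = 164;

-- One specific entity (the "164-3" case): a seek on the full key.
SELECT payload
FROM dbo.EntityInfo
WHERE rule_id = 164 AND entity_index = 3;
```

Because the clustered index holds the full rows, both queries read the data directly without a second lookup, and no string ID needs to be stored at all.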
