Sql-server – Primary Key efficiency

primary-keysql server

If I need a table to hold Point of Sales transactions, and am told I need to store:

Country ID
Store Number
POS Terminal Number
Transaction Date
Item Code
Teller ID
Another Field
More Fields

Now, in this case, the uniqueness would be:

Country ID, Store Number, POS Terminal Number, Transaction Date, Item Code

I am always unsure if it's best to have a identity colmn as the primary key – in this case, maybe TransactionID INT NOT NULL PRIMARY KEY, and then a unique constraint across the unique fields.

Or, should a primary key be created across all the unique fields?

The benefit of the TransactionId I guess would be, joins. To join back to a transaction, you just use the single field. I can't see the benefit of the PK across a number of fields. (Save space of an extra column?).

Best Answer

I would say yes, create the surrogate key of TransactionID. It would create a narrow, unique identifier across the table.

One of the reasons why a surrogate key would be best is because of relationships with other tables. If you need to relate the Transaction table with another table (Line_Item_Detail for instance) the entire primary key of the Transaction table would need to be in the related table as a foreign key. If you decided to use your candidate key of Country ID, Store Number, POS Terminal Number, Transaction Date, and Item Code these columns would need to be in every related table. If you would need to update any of these fields they would need to be updated in the related tables too. This gets messy very quickly. With a surrogate key of TransactionID you would only need to add the TransactionID column to your related table. Since this will, should be, and IDENTITY column we should never have to update it.

Another thing you should be thinking about when creating SQL Server tables is the clustered index. A clustered index is how SQL Server physically stores the data for a table. You can have a table without a clustered index called a heap. In most cases you really should create a clustered index (read more about heaps vs. clustered indexes) Michelle Ufford has an excellent post on creating Effective Clustered Indexes. In short, your clustered indexes should be:

  • Narrow – as narrow as possible, in terms of the number of bytes it stores
  • Unique – to avoid the need for SQL Server to add a "uniqueifier" to duplicate key values
  • Static – ideally, never updated
  • Ever-increasing – to avoid fragmentation and improve write performance

A clustered index on TransactionID would fill this criteria nicely.

Since you'll be adding a surrogate key to the Transaction table you should also consider adding an alternate key of Country ID, Store Number, POS Terminal Number, Transaction Date, and Item Code or some other candidate key. This will guarantee there will be no accidental duplicates added to the table. If you don't add the alternate key there is a strong probability that duplicates will creep into the table.

One last thing, you may want to consider normalizing the table further. With the table design your provided if more than one item is added to a transaction you will be adding duplicate data to every row (Store Number, POS Terminal Number, etc). Querying will also become more difficult due to the duplication of data.