Sql-server – Identity/Primary Key columns pattern for performance in SQL

performancesql server

I was discussing about database layout/design with a friend and I got some questions about performance and best practices. I'm mainly a software programmer, so I may lack "common sense" regarding databases, so if you can please explain as you would for a newbie.

I have a table that records the transactions of users and their credit card usages. It has a few millions of records per day. People in my country have a 11 digit number that is supposedly unique.

I've always designed tables with an auto-increment int column as identity.
In this case, the identity column is a composite of the user's social number + the date, which is a huge number column, and the fact that it's non-incremental seems like it's worse when doing a query with WHERE.
What's the best option in this case?

Another question is about primary keys, stores do need to look up the table and they usually send the social number or, most commonly, the user's name.
Would making the user name column as primary key enhance performance? And why is that?

Best Answer

On tables with a large number of rows being inserted having a ever-increasing clustered index generally improves write performance as it ensures that new records are added together "at the end" of the table. (Note that clustered indexes and primary keys are not the same thing, you can actually have a different primary key from your clustered index. Its the clustered index which matters here, not the primary key).

If you have another auto-incrementing column in your table then you could use this as your clustered index and keep your primary key the same, however your clustered index needs to be unique (so you can't use a DATETIME), which generally means an auto-incrementing column, which may as well be your primary key.

That said, you should run some performance / stress tests and try it for yourself. In the past I worked on a system which intentionally clustered on a GUID on a frequently updated table, the theory being that spreading out the writes actually reduced lock contention and improved performance (I wasn't part of that change so to be honest I was always kind of skeptical). Try it out and see which is faster with your data.


To answer your second question, looking up a user by its username should be very fast with proper indexing - its possible that clustering the the username could improve performance, however its going to be very marginal.