Sql-server – What are the pros and cons of using the customer codes as a primary key

database-designnormalizationprimary-keysql server

I am designing a database for one of my sales teams. Right now, the goal is to get a master list of customers and their mailing addresses, but the DB will eventually expand to track some other customer data. So, right now I am just working on the customers table, but want to make sure I follow good normalization and efficiency guidelines in case this thing grows beyond current expectations.

Right now, we have ~3,000 customers. So, we have customer codes we use to ID our customers. Each one can be a max of 20 characters, is alphanumeric, and will always be unique per customer (Foo, Inc. might have a cust code of FOOINC and might have 25 sub accounts, but there will only be one FOOINC master entity. If we ever did biz with another company called Foo Technologies Inc, we would create a new code like FOOTECH or something).

I am not a DBA, but have designed several DB's in the past and traditionally use a SQL identity field as the PK. In this instance, I've been thinking of using the customer code. What are the pros/cons of doing this? On the one hand, it seems logical to use a unique identifier as the PK if I have one, which I do. On the other, I know string PKs are slower than int PKs – this database will definitely grow, but it will never be millions of rows. The 3,000 customers are over 7 years of business, so it will be quite awhile before we even get to tens of thousands of rows.

I've researched this question and the debate usually ends with a "it depends on the data" – so please, teach me…in this situation, what would you consider when planning the table? What would you use for the PK? Any advantage to sticking an autoincrementing INT in there anyway? Any issues with indexing and inserting records?

FYI, the table layout just needs the following:

-CustomerCode(nvarchar(20))
-CustomerName(nvarchar(50))
-Address1(nvarchar(50))
-Address2(nvarchar(50)) - nullable
-Address3(nvarchar(50)) - nullable
-ZipCode (nvarchar(9))

Bonus question – worth sticking with 3NF and making a separate table holding city, state, with ZipCode as a PK and making a relationship to the customers table? Or don't worry about 3NF and keep the city/state in the customer table to avoid some joins?

Thanks for the help! Let me know if you need any other details.

Best Answer

Pros - It is the natural key, it makes sense and it will likely be searched on, I presume?

Cons - The default behavior (which is totally changeable) is for a primary key to be the clustered index. An alphanumeric doesn't make the best candidate because inserts can cause page splits because they aren't set on an ever increasing value like an identity column. The Int identity column will take less space (4bytes) compared to the character data (40+bytes for the unicode) . This makes your other indexes larger since the clustered key is part of them. If you ever change how you identify your customers and make customer codes, this all breaks - going with a surrogate insulates you from those type of changes.

In this situation, I tend to optimize for the insert performance and go with an identity column more often than not for the clustered key and primary key. I really like integer clustered indexes. (Now I know your question was not about clustered index, it was about primary key... You could still choose some other column to be the clustered index and make this your primary key, you could also put a unique constraint on this and treat it as a natural key but not make it your primary key).

I would at the very least index this with a unique constraint and treat it like a natural key. I just don't know if you really need to make it your primary key.

Kimberly Tripp is a trusted resource who has a lot to say about primary keys and (more so) clustered keys on her blog - https://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/

This is all just my opinion - YMMV.