SQL Server – Impact of Using VARCHAR(50) Primary Key in Database

performanceprimary-keysql servervarchar

Suppose a global, mid-sized company is considering a document control management solution that is backed by a SQL Server database and allows developers to create document form types using a form creation tool. Each form type created using the tool is represented in the database by a table with a primary key of datatype VARCHAR(50).

What impact would the requirement to use VARCHAR(50) primary keys have on system performance vs. a system using a more conventional design approach, e.g., this? Assume the core audience for the answer is mostly non-technical, i.e., C-Level Executives making a go or no-go decision on the solution.

Notes:

Best Answer

The document control management system needs to perform typical document control management system tasks...for many years (at least two decades).

The end of extended support for SQL Server 2014 Service Pack 3 is 7/9/2024. That's 5 years from now. So I think your first step should be to see if / when the vendor intends to get certified on a more modern version of SQL Server. This is more important if you are subject to compliance rules that require software to recieve vendor support and security updates.


As far as other things to share with executive stakeholders, it's hard to make general statements, because you certainly won't be comparing systems that have the exact same schema with the exception of the primary key data type.

If you are comparing this product to a similar one that uses surrogate (integer) keys, assuming the varchar(50) primary keys are also clustered indexes (the default for SQL Server), issues you might raise would be:

  • You will likely need to pay for increased disk space for the database
    • the varchar(50) key will take up more space on disk than the integer key
    • this space increase is multiplied by any nonlclustered indexes on these tables, since the clustering key is included in them automatically
    • it's common to say "disk is cheap," but some very smart folks have disagreed with that assessment: Disk Is Cheap! ORLY?
  • Because of the above, you will also spend more on disk space for backups
  • You will likely need to pay for more RAM
    • since these tables and indexes are bigger on disk, they will be bigger in RAM too
  • Some reports and screens might be slower to load
    • index scans and seeks will have to consume more pages, and would take longer than they equivalent scan of an integer-keyed-table
  • Some reports and screens might be faster to load
    • since the text of this serial number would be stored in any table that references it as a foreign key, you might be able to avoid joins sometimes

Notice all of the hedge words above like "might" and "likely" - like I said, I don't think you can make a lot of useful general statements about this situation.