SQL Server Design – Downsides of Using Single Integer Column as Primary Key

database-designidentitysql serversql server 2014

Within one Web application I am working on, all database operations are abstracted using some generic repositories defined over Entity Framework ORM.

However, in order to have a simple design for the generic repositories, all involved tables must define an unique integer (Int32 in C#, int in SQL). Until now, this has been always the PK of the table and also the IDENTITY.

Foreign keys are heavily used and they reference these integer columns. They are required for both consistency and for generating navigational properties by the ORM.

The application layer typically does the following operations:

  • initial data load from table (*) – SELECT * FROM table
  • UpdateUPDATE table SET Col1 = Val1 WHERE Id = IdVal
  • DeleteDELETE FROM table WHERE Id = IdVal
  • InsertINSERT INTO table (cols) VALUES (...)

Less frequent operations:

  • Bulk insertBULK INSERT ... into table followed (*) by all data load (to retrieve generated identifiers)
  • Bulk delete – this is a normal delete operation, but "bulky" from ORM's perspective: DELETE FROM table where OtherThanIdCol = SomeValue
  • Bulk update – this is a normal update operation, but "bulky" from ORM's perspective: UPDATE table SET SomeCol = SomeVal WHERE OtherThanIdCol = OtherValue

*all small tables are cached at application level and almost all SELECTs will not reach database. A typical pattern is initial load and lots of INSERTs, UPDATEs and DELETEs.

Based on current application usage, there is very small chance of ever reaching 100M records in any of the tables.

Question: From a DBA's perspective, are there significant problems I can run into by having this table design limitation?

[EDIT]

After reading the answers (thanks for the great feedback) and referenced articles, I feel like I have to add more details:

  1. Current application specifics – I did not mention about current web application, because I want to understand if the model can be reused for other applications as well. However, my particular case is an application that extracts lots of metadata from a DWH. Source data is quite messy (denormalized in a weird way, having some inconsistencies, no natural identifier in many cases etc.) and my app is generating clear separated entities. Also, many of the generated identifiers (IDENTITY) are displayed, so that the user can use them as business keys. This, besides a massive code refactoring, excludes usage of GUIDs.

  2. "they should not be the only way to uniquely identify a row" (Aaron Bertrand♦) – that is a very good advice. All my tables also define an UNIQUE CONSTRAINT to ensure that business duplicates are not allowed.

  3. Front-end app driven design vs. database driven design – design choice is caused by these factors

    1. Entity Framework limitations – multiple columns PKs are allowed, but their values cannot be updated

    2. Custom limitations – having a single integer key greatly simplifies data structures and non-SQL code. E.g.: all lists of values have an integer key and a displayed values. More important, it guarantees that any table marked for caching will be able to put into a Unique int key -> value map.

  4. Complex select queries – this will almost never happen because all small (< 20-30K records) tables data is cached at application level. This makes life a little harder when writing application code (harder to write LINQ), but the database is hit much nicer:

    1. List views – will generate no SELECT queries on load (everything is cached) or queries that look like this:

      SELECT allcolumns FROM BigTable WHERE filter1 IN (val1, val2) AND filter2 IN (val11, val12)
      

      All other required values are fetched through cache lookups (O(1)), so no complex queries will be generated.

    2. Edit views – will generate SELECT statements like this:

      SELECT allcolumns FROM BigTable WHERE PKId = value1
      

(all filters and values are ints)

Best Answer

Other than additional disk space (and in turn memory usage and I/O), there's not really any harm in adding an IDENTITY column even to tables that don't need one (an example of a table that doesn't need an IDENTITY column is a simple junction table, like mapping a user to his/her permissions).

I rail against blindly adding them to every single table in a blog post from 2010:

But surrogate keys do have valid use cases - just be careful not to assume that they guarantee uniqueness (which is sometimes why they get added - they should not be the only way to uniquely identify a row). If you need to use an ORM framework, and your ORM framework requires single-column integer keys even in cases when your real key is either not an integer, or not a single column, or neither, make sure that you define unique constraints/indexes for your real keys, too.