SQL Server Design – Downsides of Using Single Integer Column as Primary Key

Tags: database-design, identity, sql-server, sql-server-2014

Within one Web application I am working on, all database operations are abstracted using some generic repositories defined over Entity Framework ORM.

However, to keep the design of the generic repositories simple, every table involved must define a unique integer column (Int32 in C#, int in SQL). Until now, this column has always been both the PK of the table and its IDENTITY column.

Foreign keys are heavily used and they reference these integer columns. They are required for both consistency and for generating navigational properties by the ORM.

The application layer typically does the following operations:

Initial data load from table (*) – SELECT * FROM table
Update – UPDATE table SET Col1 = Val1 WHERE Id = IdVal
Delete – DELETE FROM table WHERE Id = IdVal
Insert – INSERT INTO table (cols) VALUES (...)

Less frequent operations:

Bulk insert – BULK INSERT ... into table, followed (*) by a full data load (to retrieve the generated identifiers)
Bulk delete – this is a normal delete operation, but "bulky" from the ORM's perspective: DELETE FROM table WHERE OtherThanIdCol = SomeValue
Bulk update – this is a normal update operation, but "bulky" from the ORM's perspective: UPDATE table SET SomeCol = SomeVal WHERE OtherThanIdCol = OtherValue

(*) All small tables are cached at the application level, so almost no SELECTs reach the database. A typical pattern is an initial load followed by lots of INSERTs, UPDATEs and DELETEs.

Based on current application usage, there is a very small chance of ever reaching 100M records in any of the tables.

Question: From a DBA's perspective, are there significant problems I can run into by having this table design limitation?

[EDIT]

After reading the answers (thanks for the great feedback) and referenced articles, I feel like I have to add more details:

Current application specifics – I did not mention the current web application, because I want to understand whether the model can be reused for other applications as well. However, my particular case is an application that extracts lots of metadata from a DWH. The source data is quite messy (denormalized in a weird way, with some inconsistencies, no natural identifier in many cases, etc.) and my app generates cleanly separated entities. Also, many of the generated identifiers (IDENTITY) are displayed, so that users can use them as business keys. This, along with the massive code refactoring it would require, rules out GUIDs.
"they should not be the only way to uniquely identify a row" (Aaron Bertrand) – that is very good advice. All my tables also define a UNIQUE constraint to ensure that business duplicates are not allowed.
Front-end app driven design vs. database driven design – the design choice is driven by these factors:
1. Entity Framework limitations – multi-column PKs are allowed, but their values cannot be updated
2. Custom limitations – having a single integer key greatly simplifies data structures and non-SQL code. E.g.: all lists of values have an integer key and a displayed value. More importantly, it guarantees that any table marked for caching can be put into a unique int key -> value map.
Complex select queries – these will almost never happen, because the data of all small tables (< 20-30K records) is cached at the application level. This makes life a little harder when writing application code (LINQ is harder to write), but the database is hit far less:
1. List views – will generate no SELECT queries on load (everything is cached) or queries that look like this:
```
SELECT allcolumns FROM BigTable WHERE filter1 IN (val1, val2) AND filter2 IN (val11, val12)
```
  All other required values are fetched through cache lookups (O(1)), so no complex queries will be generated.
2. Edit views – will generate SELECT statements like this:
```
SELECT allcolumns FROM BigTable WHERE PKId = value1
```

(all filters and values are ints)
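
The caching idea described above (any cached table fits a unique int key -> value map, so lookups are O(1)) could be sketched in Python roughly as follows. The `TableCache` class, its method names and the sample rows are hypothetical and not taken from the application described; the sketch only illustrates why a single integer key makes a generic cache straightforward.

```python
from typing import Dict, Generic, Iterable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TableCache(Generic[V]):
    """Hypothetical in-memory cache for one small table, keyed by its int Id."""

    def __init__(self, rows: Iterable[Tuple[int, V]]) -> None:
        # Initial load: one pass over the rows (the SELECT * FROM table step).
        self._rows: Dict[int, V] = {}
        for row_id, value in rows:
            if row_id in self._rows:
                raise ValueError(f"duplicate key {row_id}")
            self._rows[row_id] = value

    def get(self, row_id: int) -> Optional[V]:
        # Average O(1) dictionary lookup; no database round trip.
        return self._rows.get(row_id)

    def upsert(self, row_id: int, value: V) -> None:
        # Mirror an INSERT or an UPDATE ... WHERE Id = row_id in the cache.
        self._rows[row_id] = value

    def delete(self, row_id: int) -> None:
        # Mirror a DELETE ... WHERE Id = row_id in the cache.
        self._rows.pop(row_id, None)


# Sample data is made up for illustration.
statuses = TableCache([(1, "Open"), (2, "Closed")])
statuses.upsert(3, "Archived")
statuses.delete(2)
```

Because every table exposes the same kind of key, one generic class like this can serve all cached tables; a composite or non-integer key would need a per-table key type instead.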

Best Answer

Other than additional disk space (and in turn memory usage and I/O), there's not really any harm in adding an IDENTITY column even to tables that don't need one (an example of a table that doesn't need an IDENTITY column is a simple junction table, like mapping a user to his/her permissions).

I rail against blindly adding them to every single table in a blog post from 2010:

Bad habits to kick : putting an IDENTITY column on every table

But surrogate keys do have valid use cases - just be careful not to assume that they guarantee uniqueness (which is sometimes why they get added - they should not be the only way to uniquely identify a row). If you need to use an ORM framework, and your ORM framework requires single-column integer keys even in cases when your real key is either not an integer, or not a single column, or neither, make sure that you define unique constraints/indexes for your real keys, too.
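
A small sketch of that last point, using Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for SQL Server (an assumption made only to keep the example self-contained; the `UserPermission` tables and their columns are hypothetical). With only the surrogate `Id`, a duplicate mapping is accepted; with a unique constraint on the real key, it is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key only: nothing stops the same user/permission pair twice.
conn.execute(
    "CREATE TABLE UserPermissionLoose ("
    " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " UserId INTEGER NOT NULL,"
    " PermissionId INTEGER NOT NULL)"
)
conn.execute("INSERT INTO UserPermissionLoose (UserId, PermissionId) VALUES (1, 10)")
conn.execute("INSERT INTO UserPermissionLoose (UserId, PermissionId) VALUES (1, 10)")
loose_count = conn.execute("SELECT COUNT(*) FROM UserPermissionLoose").fetchone()[0]

# Surrogate key plus a unique constraint on the real (natural) key.
conn.execute(
    "CREATE TABLE UserPermission ("
    " Id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " UserId INTEGER NOT NULL,"
    " PermissionId INTEGER NOT NULL,"
    " CONSTRAINT UQ_UserPermission UNIQUE (UserId, PermissionId))"
)
conn.execute("INSERT INTO UserPermission (UserId, PermissionId) VALUES (1, 10)")
try:
    conn.execute("INSERT INTO UserPermission (UserId, PermissionId) VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The unique constraint on (UserId, PermissionId) blocks the duplicate.
    duplicate_rejected = True
strict_count = conn.execute("SELECT COUNT(*) FROM UserPermission").fetchone()[0]
```

The ORM can still address rows by `Id`, while the constraint keeps the business key honest.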

Related Solutions

Oracle Database Design – Composite Primary Key Column Order

Store the tenant_id first. When you do this you can enable index key compression.

See http://docs.oracle.com/cd/B28359_01/server.111/b28310/indexes003.htm#i1106790 for the syntax and http://docs.oracle.com/cd/B28359_01/server.111/b28318/schema.htm#i14618 for the concepts.

In your case, you can do it like this:

create unique index mytable_idx on mytable(tenant_id,id) compress 1;

alter table mytable add constraint mytable_pk primary key(tenant_id, id);

Sql-server – Identity column value falling behind randomly

Randomly, one of these tables' identity values will fall behind, stopping any inserts from happening and we have no idea why.

Inserts are probably stopping because of an attempt to reuse a value that already exists in the PRIMARY KEY, which triggers an error like:

Msg 2627, Level 14, State 1, Line 27
Violation of PRIMARY KEY constraint 'PK__ID__1234'. Cannot insert duplicate key in object 'dbo.MyTable'. The duplicate key value is (8).
The statement has been terminated.

Note that if your IDENTITY column does not have a UNIQUE index or constraint, it is possible to reseed repeatedly and have many identical ID values. You do not want to do that, of course.

I have not personally found an error that, in itself, would reseed the IDENTITY value. Of course, it is possible to reset the seed to a range where there will soon be a conflict, by reseeding to a value lower than the current identity value:

DBCC CHECKIDENT ('MyTable', RESEED, 7) WITH NO_INFOMSGS;

It could be that code somewhere in one of the processes actually does a RESEED on the table under some unusual circumstances.

(For example, this could be from a merge of two data sets, where the code reads the high value from one data set and after the import RESEEDs to the lower of the two high values that were merged.)
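
To make the failure mode concrete, here is a toy Python model of an identity counter that hands out the next value without looking at the keys already stored. The `IdentityTable` and `DuplicateKeyError` names are hypothetical, and the model is greatly simplified; it is not how SQL Server is implemented, only an illustration of why a reseed below the current maximum leads to duplicate-key errors.

```python
class DuplicateKeyError(Exception):
    """Stands in for a primary key violation such as Msg 2627."""


class IdentityTable:
    """Toy table with an identity counter and a unique primary key."""

    def __init__(self, seed: int = 1, increment: int = 1) -> None:
        self._increment = increment
        # Last identity value handed out; starts one step before the seed.
        self._current = seed - increment
        self.rows = {}

    def insert(self, value: str) -> int:
        # The counter advances first and is not rolled back on failure.
        self._current += self._increment
        new_id = self._current
        if new_id in self.rows:
            raise DuplicateKeyError(new_id)
        self.rows[new_id] = value
        return new_id

    def reseed(self, new_value: int) -> None:
        # Like a RESEED on a populated table: the next insert uses
        # new_value + increment, regardless of existing keys.
        self._current = new_value


table = IdentityTable()
for n in range(10):
    table.insert(f"row {n}")  # fills ids 1 through 10

table.reseed(7)  # lower than the highest existing id

failed_ids = []
new_id = None
while new_id is None:
    try:
        new_id = table.insert("after reseed")
    except DuplicateKeyError as exc:
        failed_ids.append(exc.args[0])
```

In this model the first failing value after the reseed is 8, matching the duplicate key value in the sample error message above, and inserts keep failing until the counter has moved past the highest existing key.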

You should also read Martin Smith's post at: https://stackoverflow.com/questions/14146148/identity-increment-is-jumping-in-sql-server-database
