Whats the use of an autoincrement primary column

database-designdatabase-recommendation

I would like to know when is it really needed to use an auto increment primary column, what is the value it adds?

my colleague suggests to have a separate auto increment column for a table as a primary key. It should be named ‘_id’. Primary key should not store any business data.

Another colleague is of different opinion that we can have a non-auto increment column to be a primary key as we plan not to have duplicates.

One approach without auto-increment primary column:

1) Create Tables in Database with the below fields

Table I
a.  SLOC_ID (Primary Key)
b.  Gerrit
c.  Parent Key
d.  Files Inserted
e.  Files Deleted
f.  Gerrit Last User (Non Service Accounts) update time 
g.  Current_Patch_Number 



Table II
a.  SLOC_ID (One to Many Relationship)
b.  File Path
c.  File Name

Another approach with auto-increment is below

PK,FK   Column Name
PK(autoincrement)   sloc_id
FK  change_list_id

PK,FK   Column Name
PK  sloc_details_id
FK  sloc_id
    linesmodified
    lines deleted
    filepath
    latestpatch
    modification_type

where can I find more details here?which of the best solution?

Best Answer

There are a few advantages to using an auto-increment column as a primary key.

  • Unlikely to need to change: If your primary key is business data that's meaningful to human beings, you may be asked to change that at some point. Examples I have seen would include usernames based on a combination of company branch number and person's initials (want to change them if the person moves to a different branch); and company region names (may need to change if company reorganizes regions). Updating a user's username ultimately touched about 20 columns across about a dozen tables, and almost always caused blocking for long enough periods that users noticed.
  • Less fragmentation on table: Assuming that your primary key is also the clustered index for your table, records will also be inserted at the end of the table. since you won't be inserting a new record in between existing records, you won't be splitting pages in the middle of your data, which leads to fragmentation.
  • Smaller indexes: In some DBs, at least, the clustered index value is used in any other indexes to locate a row in the table. If your primary key is a varchar value that's 8 characters long on average, that will require twice as much space to store as a 4-byte integer value (4 times as much, if nvarchar).

For me, personally, the first reason was the strongest one, and the one that made me start including an autoincrement primary key in basically every table. Human-meaningful values will always be at risk of being changed by order of the business; putting those into a position where making that sort of changes could take hours and cause millions of other records to get changed is not acceptable, in my opinion.

UPDATE: As joanolo points out there are certain codes that may be meaningful to human beings, but which are part of established standards, and should (at least in theory) not change because of that. Examples would include an ISO 3166 country code, an ISO 4217 currency code, an ISO 639 language code, or a UN ISIC activity code. He notes these are usually a reasonable size (char(4) or less), so they wouldn't cause overly large indexes, and they are generally static, and thus fragmentation wouldn't be a concern. So, it would make sense to use these codes as primary keys in tables based on this sort of information. It's an excellent point, and I agree.