Currently, I have a categories
table with 2 columns – category VARCHAR(50) NOT NULL PRIMARY KEY
and parent VARCHAR(50)
. The parent
column is a foreign key (FK) to the category
column.
This seems like the most obvious approach. However, alarm bells are ringing in my head because I am using a VARCHAR
column for a primary key, which might slow down operations when querying the table.
I could introduce a third column called cat_id INT AUTO_INCREMENT
and set that to be the PK, but it would introduce a new column which have no meaning.
Apart from which one would be faster, what other considerations should be taken?
N.B. I predict there will be at most 1000 categories or so so the number of rows are not very high. However, the categories
PK column will be a reference column for many foreign keys, in other tables.
Should I use (unique) usernames as PK's also?
Best Answer
VARCHAR
column as Primary Key is not a good choice as normally we create Cluster Index on same column.Cluster Index on VARCHAR columns is a bad choice because of expected high fragmentation rate. Every new inserted key value will try to find its place somewhere between existing keys and normally cause page split and high index fragmentation. As a result poor performance and extra index rebuild/reorganise cost.
Secondly while using
varchar
key column as Foreign Key will take extra space as compared to Pseudo keyauto-incremented
column.BUT
Clustered index on auto-incremented column may create "hot spot". Read this carefully Is 'Avoid creating a clustered index based on an incrementing key' a myth from SQL Server 2000 days?
Though hot spot could be a problem, when to many users are trying to insert values but still in your case i would like to go for auto-incremented column as compare to varchar.