Mysql – Should I replace the varchar primary key with an integer primary key

innodbMySQLprimary-keyvarchar

I have a table that has a varchar primary key that's associated with "Base Product Codes". They rarely change, and if they do, it's usually just being deleted or added. I figured it would be better to index part of the product code so I could perform searches on the data based on a partial product code.

As an example, "Base Product Codes" are in the following format:

ABC-AB12

The table stores the relationship between a product and accessory, and there can be multiple relationships for each product or accessory:

Id          |     Prod     |    Acc    | Val
---------------------------------------------
ABC-AB12_1  |    ABC-AB12  |   ABC1    |  1
ABC-AB12_2  |    ABC-AB12  |   DEF1    |  2
ABC-AB12_3  |    ABC-AB12  |   GHI1    |  A

I'm storing the id for each relationship as the product base code with an integer tacked onto the end of it.

I figured if I needed to to a search for all of the relationships for a particular product that it would be faster to use the primary key and search than it would be to search on the Prod column.

Is that an incorrect assumption? Should I change the Id to be an auto incrementing integer instead?

Best Answer

You should have an autoincrement PRIMARY KEY. Since I do not know the storage engine of the table, I will try answer based on both.

InnoDB

In general, a smaller primary key is always better than a bigger one. The PRIMARY KEY for an InnoDB table is stored in the Clustered Index (known within InnoDB as the gen_clust_index). Since an InnoDB Page is 16K, smaller keys will make more keys fit inside an index page.

What should be noted is the fact that for each entry in a Secondary Index, there is Primary Key. Thus, not only will a smaller PRIMARY KEY benefit the table, but all non-unique Indexes will corresponding shrink as well.

MyISAM

Similar principles apply to MyISAM in terms of key sizes and indexes. Additionally, there is an added bonus in your particular case that is not often discussed when it comes to MyISAM.

MyISAM allows you to have an auto_increment key per column value. What do I mean?

Look at the table in your question with additional rows:

Id          |     Prod     |    Acc    | Val
---------------------------------------------
ABC-AB12_1  |    ABC-AB12  |   ABC1    |  1
ABC-AB12_2  |    ABC-AB12  |   DEF1    |  2
ABC-AB12_3  |    ABC-AB12  |   GHI1    |  A
DEF-AB12_1  |    DEF-AB12  |   ABC1    |  1
DEF-AB12_2  |    DEF-AB12  |   DEF1    |  2
DEF-AB12_3  |    DEF-AB12  |   GHI1    |  A
GHI-AB12_1  |    GHI-AB12  |   ABC1    |  1
GHI-AB12_2  |    GHI-AB12  |   DEF1    |  2
GHI-AB12_3  |    GHI-AB12  |   GHI1    |  A

You could replace the Id with an auoincrement value and end up with this:

Id |     Prod     |    Acc    | Val
----------------------------------------------
1  |    ABC-AB12  |   ABC1    |  1
2  |    ABC-AB12  |   DEF1    |  2
3  |    ABC-AB12  |   GHI1    |  A
4  |    DEF-AB12  |   ABC1    |  1
5  |    DEF-AB12  |   DEF1    |  2
6  |    DEF-AB12  |   GHI1    |  A
7  |    GHI-AB12  |   ABC1    |  1
8  |    GHI-AB12  |   DEF1    |  2
9  |    GHI-AB12  |   GHI1    |  A

This you would do if the Id looks likr this:

PRIMARY KEY (Id)

OK, great. Now here is the added bonus: If you make the PRIMARY KEY look like this:

PRIMARY KEY (Prod,Id)

the data can be stored like this:

Id |     Prod     |    Acc    | Val
----------------------------------------------
1  |    ABC-AB12  |   ABC1    |  1
2  |    ABC-AB12  |   DEF1    |  2
3  |    ABC-AB12  |   GHI1    |  A
1  |    DEF-AB12  |   ABC1    |  1
2  |    DEF-AB12  |   DEF1    |  2
3  |    DEF-AB12  |   GHI1    |  A
1  |    GHI-AB12  |   ABC1    |  1
2  |    GHI-AB12  |   DEF1    |  2
3  |    GHI-AB12  |   GHI1    |  A

How is that possible? Only the MyISAM Storage Engine Has This Mechanism Built In !!!

I have discussed this before:

One more thing: Why have PRIMARY KEY (Prod,Id) as a PRIMARY KEY? This would allow you to sequence each Product ID. Thus, you can look for sequence 3 of one product and sequence 3 of another product.

EPILOGUE

Whichever way you decide to go, using a smaller autoincrement PRIMARY KEY (4 bytes) make more sense for performance and diskspace than a larger PRIMARY KEY (more than 4 bytes).

Give it a Try !!!