I've read a number of articles regarding efficiency of primary keys depending on the storage engine, and I am left confused.
Given a simple many-to-many table with two fields, storeId and zoneId, which one of the following designs is the most efficient with InnoDB and why?
- Using the two fields as a composite primary key:
CREATE TABLE store_zone(
storeId INT(10) UNSIGNED NOT NULL,
zoneId INT(10) UNSIGNED NOT NULL,
PRIMARY KEY(storeId, zoneId)
);
- Using a specific auto increment primary key:
CREATE TABLE store_zone(
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
storeId INT(10) UNSIGNED NOT NULL,
zoneId INT(10) UNSIGNED NOT NULL,
PRIMARY KEY(id),
UNIQUE KEY(storeId, zoneId)
);
Notes:
- I need a unique key on the (storeId, zoneId) pair anyway
- I have foreign keys to the store and zone tables, not shown here for readability, so in both cases there's an extra required index on zoneId as well
Best Answer
Analysis
Since we are talking about InnoDB, let's zero in on the clustered index (InnoDB names it gen_clust_index when it has to generate one itself because the table has no usable key). As the MySQL Documentation describes, it's a special index that keeps the PRIMARY KEY and associated row data accessible from the same InnoDB pages.
This being the case, you would really feel efficiency or deficiency in the following areas:
Disk Space
Having the single auto_increment column as the PRIMARY KEY keeps the overall size of the PRIMARY KEY smaller than that of having two columns. Why? The key stored in each BTREE entry would be twice as big (8 bytes instead of 4) if the PRIMARY KEY was two INTs instead of one. This becomes even more painful if you use foreign key constraints and secondary indexes, since every InnoDB secondary index entry also carries the PRIMARY KEY columns and must subsequently blow up in size as well.
In this instance, you would choose the second schema for better index usage.
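One way to check the disk-space argument on your own data is to build both variants side by side and compare what InnoDB reports for each index. This is a minimal sketch, not a benchmark: the table names store_zone_composite and store_zone_surrogate are hypothetical, and it assumes InnoDB persistent statistics are enabled and that your account can read the mysql schema.

```sql
-- Hypothetical side-by-side copies of the two schemas from the question.
CREATE TABLE store_zone_composite (
    storeId INT UNSIGNED NOT NULL,
    zoneId  INT UNSIGNED NOT NULL,
    PRIMARY KEY (storeId, zoneId)
) ENGINE=InnoDB;

CREATE TABLE store_zone_surrogate (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    storeId INT UNSIGNED NOT NULL,
    zoneId  INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY (storeId, zoneId)
) ENGINE=InnoDB;

-- Load the same representative rows into both tables before measuring;
-- a nearly empty table reports a minimum allocation and tells you nothing.

-- Refresh the persistent statistics so the figures are current.
ANALYZE TABLE store_zone_composite, store_zone_surrogate;

-- The row with stat_name = 'size' holds the number of pages in each index.
-- The PRIMARY entry includes the row data, because the clustered index
-- stores the rows themselves.
SELECT table_name,
       index_name,
       stat_value * @@innodb_page_size AS index_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name IN ('store_zone_composite', 'store_zone_surrogate')
  AND stat_name = 'size'
ORDER BY table_name, index_name;
```

The figures are estimates taken from sampled statistics, so treat small differences as noise. Add the extra index on zoneId and any foreign keys you actually use to both tables before drawing conclusions, since those indexes also carry the PRIMARY KEY columns.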
Insert Performance
Inserting row data into an InnoDB table that has two UNIQUE indexes (the PRIMARY KEY on id and the UNIQUE KEY on (storeId, zoneId)) calls for twice as much BTREE management and unique checks. Additional microseconds for each additional UNIQUE index would add up in CPU time when inserting millions of rows.
In this instance, you would choose the first schema for faster INSERTs.
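To see how much the extra UNIQUE index costs on your own server, you can load identical rows into both variants and compare the elapsed time the mysql client prints after each statement. This is a rough sketch under several assumptions: MySQL 8.0 or later (for recursive CTEs), the two schemas from the question created as empty tables under the hypothetical names store_zone_composite and store_zone_surrogate, and no foreign keys defined yet.

```sql
-- Allow the row generator to recurse past the default limit of 1000.
SET SESSION cte_max_recursion_depth = 1000000;

-- First schema: one BTREE (the composite PRIMARY KEY) to maintain.
INSERT INTO store_zone_composite (storeId, zoneId)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 500000
)
-- Each n maps to a distinct (storeId, zoneId) pair.
SELECT n DIV 1000 + 1, n MOD 1000 + 1 FROM seq;

-- Second schema: the PRIMARY KEY on id plus the UNIQUE KEY on
-- (storeId, zoneId), both maintained and checked for every row.
INSERT INTO store_zone_surrogate (storeId, zoneId)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 500000
)
SELECT n DIV 1000 + 1, n MOD 1000 + 1 FROM seq;
```

The generated rows arrive in ascending (storeId, zoneId) order, which is the friendliest case for both tables. Rows arriving in random order would stress the composite key and the UNIQUE KEY more, so repeat the test with a load pattern that resembles your real one.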
Conclusion
The disk space and insert performance issues pretty much force you to choose. Obviously, you choose which one to live with. If you use foreign key constraints that represent (storeId, zoneId), then you would pick the second schema and use the auto_increment as the reference in outside tables.
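As an illustration of that last point, here is what an outside table referencing the second schema might look like. The table store_zone_delivery and its columns are hypothetical, invented only for this sketch; the store_zone table is the second schema from the question.

```sql
-- Hypothetical child table: it points at one (storeId, zoneId) pair
-- through the single auto_increment column instead of repeating both fields.
CREATE TABLE store_zone_delivery (
    storeZoneId INT UNSIGNED NOT NULL,
    deliveryDay TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (storeZoneId, deliveryDay),
    FOREIGN KEY (storeZoneId) REFERENCES store_zone (id)
) ENGINE=InnoDB;
```

The child table carries one 4-byte reference per row, and the leading column of its own PRIMARY KEY doubles as the index InnoDB requires on the foreign key column.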