Mysql – Multi-Column PK or Auto-Increment PK

indexinnodbMySQL

In another question I mentioned that I have a table where, in order to ensure uniqueness, I need to use multiple columns as the PK. However, lookups are usually only done using the first of those columns.

The table will ultimately have as many as hundreds of millions of rows, so it was pointed out that an auto-increment PK might be better and my multi-column PK would be less performant. Inserts can be several per second.

E.g. please ignore questions about columns, this is just an example.

CREATE TABLE records(
  record_code VARCHAR(20) NOT NULL,
  record_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  user VARCHAR(30) NOT NULL,
  other_data VARCHAR(100),
  other_data VARCHAR(100),
  PRIMARY KEY (record_code, record_date, user)
) ENGINE=InnoDB;

Or:

CREATE TABLE records(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  record_code VARCHAR(20) NOT NULL,
  record_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  user VARCHAR(30) NOT NULL,
  other_data VARCHAR(100),
  other_data VARCHAR(100),
  PRIMARY KEY (id),
  KEY record_code (record_code)
) ENGINE=InnoDB;

It made sense to me that an AUTO_INCREMENT surrogate PK would save disk/memory space and allow faster inserts, but another commenter disagreed, so I am wondering if it's OK to use the multi-column PK, is it just as fast? Is there not much difference?

NOTE it is possible at some point that may need to start doing lookups by both the first AND second columns, so in the second example above the second index would be KEY record_code_date (record_code, record_date) so inserts are doing almost as much work as with just using the multi-column PK(?). Does this change the advisable approach?

Best Answer

It depends.

One must weigh the alternatives. Here is a partial list:

  • Rows are INSERTed in an ordered list (the BTree that contains the data, and is ordered by the PK) according to the PK. For a huge table and random PK (eg UUID), this means jumping around a lot, which leads to poor caching.

  • New rows with an AUTO_INCREMENT PK are inserted at the "end" of the table, thereby avoiding the above point.

  • Every secondary key has a copy of the PK. So, if the PK is bulky, you are taking more space for the secondary keys.

  • Ditto for other tables that need to JOIN to this table (with or without FOREIGN KEYs).

  • A PK is necessarily UNIQUE. Hence, DATETIME and TIMESTAMP make risky fields to put into a PK. (What if two things happened in the same second?)

  • Will you be fetching records based on the column(s) of a multi-column key that could be the PK? If so, you get to them faster because you don't have to go through a secondary key faster.

  • Consecutive records (according to all columns of the PK) are "clustered" next to each other. This means that fetching multiple rows will need fewer blocks from disk.

  • Etc, etc.