MySQL – Will using bigint vs mediumint have a performance impact?

database-design, MySQL, performance

I'm considering using mediumint or bigint for an ID column. I prefer bigint, because that way I'm sure the huge numbers the client sometimes uses are stored just fine. However, I could technically split the numbers up and use a mediumint instead. I don't care about storage, but I do care about speed/performance. Is mediumint a better choice than bigint here?

This is on an indexed, auto-increment ID column which will be used a lot for SELECT queries.
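For concreteness, the setup in question looks something like this (table and column names are hypothetical, only the type choice is the point):

```sql
-- Hypothetical table: an auto-increment surrogate ID,
-- heavily used in SELECT queries.
CREATE TABLE client_records (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- or MEDIUMINT UNSIGNED?
    payload VARCHAR(255) NOT NULL,
    PRIMARY KEY (id)
);
```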

Best Answer

Why does the user know the value of the ID column? This should be abstracted from them. They should be able to identify their data by some other natural key (even if this is only implemented as a unique constraint and you use the surrogate ID as a primary key for performance reasons). If you take away the ID column and you can no longer uniquely identify a row, there is likely an issue with the design.
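A minimal sketch of that pattern, with assumed table and column names: the surrogate ID stays internal as the primary key, while the natural key the client actually uses is enforced with a unique constraint.

```sql
-- Surrogate ID as primary key for performance; the natural key
-- (here, an order number the client knows) gets a unique constraint
-- so rows stay identifiable even without the ID column.
CREATE TABLE orders (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    order_number VARCHAR(32) NOT NULL,  -- natural key, visible to the client
    placed_at    DATETIME NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_orders_order_number (order_number)
);
```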

While you can save some space and improve performance in small ways by choosing a different data type, you need to balance performance against actual business requirements. If you're going to exceed the bounds of a smallint or mediumint, then you shouldn't use either. Don't complicate your design for the exception (e.g. "splitting up the numbers") in an attempt to optimize for the general rule. This complexity is unnecessary IMHO and will only lead to problems.
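For reference, MySQL's signed integer ranges, annotated on a throwaway table definition (an unsigned column shifts each range to start at 0 and roughly doubles the positive bound):

```sql
-- MySQL integer types: storage size and signed range.
CREATE TABLE int_ranges_demo (
    a TINYINT,    -- 1 byte:  -128 to 127
    b SMALLINT,   -- 2 bytes: -32768 to 32767
    c MEDIUMINT,  -- 3 bytes: -8388608 to 8388607
    d INT,        -- 4 bytes: -2147483648 to 2147483647
    e BIGINT      -- 8 bytes: -9223372036854775808 to 9223372036854775807
);
```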

In SQL Server we can take advantage of data compression, which essentially treats a bigint column in the following way:

  • if the number fits in a tinyint (0-255), it takes 1 byte
  • if the number fits in a smallint (-32K -> 32K), it takes 2 bytes
  • if the number fits in an int (-2Bn -> 2Bn), it takes 4 bytes
  • only if the number exceeds the bounds of an int does it take 8 bytes

(This is a simplification, and it describes the worst case, where each value is compressed independently. That only applies when the values on a page are unique and non-sequential, which wouldn't be the case for an auto-increment column. In addition, page compression uses other methods such as dictionary and prefix compression, which can reduce storage even further.)
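In SQL Server (not MySQL), enabling this looks roughly like the following, with a hypothetical table name; `sp_estimate_data_compression_savings` lets you preview the effect before committing:

```sql
-- SQL Server: estimate savings from row compression first
-- (table name dbo.orders is an assumption for illustration).
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'orders',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Then apply it by rebuilding the table with compression enabled.
ALTER TABLE dbo.orders REBUILD WITH (DATA_COMPRESSION = ROW);
```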

Of course compression trades CPU for I/O: you save raw storage and reduce I/O, but you pay in CPU. Even today most systems are I/O-bound, so this is usually a net benefit, but it shouldn't be considered if you're CPU-bound.

I don't believe there is similar functionality in MySQL, but I am not certain. If there is, it's worth looking into.

And one final note. You say "I do not care about storage, but I do care about speed/performance." If you care about speed/performance, then you should care about storage. The more pages it takes to store your data, the harder the engine is going to have to work to read/write those pages, perform seeks, etc. But again, there needs to be balance. You shouldn't take an extra four weeks over-engineering a solution that is going to save you 10 seconds a week.
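As a rough illustration of that storage/speed link, here's the key-width math on a 16 KB InnoDB page. This ignores page headers, record headers, and pointers entirely, so it's a ballpark comparison of raw key density, not an exact row count:

```sql
-- Ballpark only: raw key values per 16 KB page, ignoring all overhead.
SELECT
    FLOOR(16384 / 8) AS bigint_keys_per_page,    -- 2048
    FLOOR(16384 / 3) AS mediumint_keys_per_page; -- 5461
```

Roughly 2.7x more mediumint keys fit per page, which means fewer pages to read for the same index scan. Real ratios are smaller once per-record overhead is included, but the direction holds: narrower keys mean denser pages.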