MySQL schema design – how to decide between an `enum` vs. a table join

database-designMySQLrdbms

In designing a MySQL schema, should I avoid using an enum type altogether, as I have been reading really strong, but favored opinions on this subject, such as:

Or, is there a choice to be made specific to the use case in hand? If there is, what decision framework should I use, i.e. when to choose one of the two options?

For instance:

  • Currency: It's better to use a table join, since currency codes to be supported could change with time (since modification has been emphasized in many posts as anti-enum, but then what could not change with time?).
  • PaymentStatus: It's better to use enum type with values like, SUCCESS, FAILED, CANCELLED.
  • TierPlans: It's better to use a table join, as it would give the flexibility to add columns like monthlyPricing, annualPricing, discountRate, etc. in the future. (I am currently using enum instead, with values like FREE, PRO, MASTER; hadn't thought of pricing yet -_- ).

I am finding that if I stress enough, I could find a table join being a better choice always, mutation safe route. Is increasing number of tables or table joins a symptom of bad design? How should I approach
design in general, and design in such use cases specifically?


Note: I had asked this question on SO here. I am posting it here since I see that there is a closing request there, and I desperately need an answer to this confusing roadblock of mine. Understandably this might not be a good question to ask here too. I'd rather edit, if you may guide!

Best Answer

(Yes, there have been many heated discussions of ENUM vs TINYINT vs VARCHAR, both here and on stackoverflow.com. I have put my 2-cents in on some of them. I am addressing this Question to discuss nuances of the examples brought up.)

Currency is a tricky one. When, for example, a devaluation occurs, does the currency_code change?

If so, then is the question is about how often the list of values is likely to change.

If it does not change, then you may have a mess for the data -- Is "XYZ", the old xyz currency, or the 'new'?

With things like country_code (with well-established 2-letter codes) and currency_code (3 letter standard), etc, I recommend using things like

country_code  CHAR(2) CHARACTER SET ascii
currency_code CHAR(3) CHARACTER SET ascii

Rationale:

  • Only 2 or 3 bytes, which is not much bigger than a 1-byte ENUM or TINYINT.
  • Avoids the extra bulk and overhead of utf8mb4, which might be the default for the table.
  • You can still build a secondary table to, for example, spell out the full name of the country or currency.
  • You won't need a JOIN except when you want the extra info. (JOINing is most costly when you need to start from multiple Dimension tables in an attempt to look up rows in the Fact table. A JOIN is reasonably cheap going the other direction.)