It depends on your engine. Common wisdom is that reads are cheap: a few bytes here and there will not significantly affect the performance of a small to medium-sized database.
More importantly, it depends on the uses to which you will put the primary key. Integer serials have the advantage of being simple to use and implement. Depending on how the engine generates them, they can also be derived quickly, as most databases just store the serial number in a fixed location rather than deriving it on the fly with SELECT MAX(ID)+1 FROM foo.
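To make the two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in engine chosen for illustration; the table name foo comes from the query above, and the name column is made up. With AUTOINCREMENT, SQLite happens to keep the last issued value in its internal sqlite_sequence table, which is one example of the "fixed location" idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)

# Engine-generated serial: the database hands out the next value itself.
cur = conn.execute("INSERT INTO foo (name) VALUES (?)", ("first",))
engine_id = cur.lastrowid

# SQLite records the last issued value in its sqlite_sequence bookkeeping table.
(stored_seq,) = conn.execute(
    "SELECT seq FROM sqlite_sequence WHERE name = 'foo'"
).fetchone()

# Hand-derived serial: an extra query on every insert, and it is not safe
# under concurrent writers without additional locking.
(next_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM foo").fetchone()
conn.execute("INSERT INTO foo (id, name) VALUES (?, ?)", (next_id, "second"))

print(engine_id, stored_seq, next_id)
```

Other engines expose the same idea through sequences or identity columns, so treat the specifics here as SQLite's, not as a general rule.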
The question becomes: how does a 5 character key present a "meaningful value" to you and to the application? How is this value created, and does it take more or less time than finding an incrementing serial number? While some integer types save a trivial amount of space, the vast majority of systems will never notice the savings.
There are no performance implications, except that the engine can never generate the character scheme automatically, because your "keys" cannot be derived. For your specific domain, don't bother with artificial keys, and just use Chinese, Japanese and Thai as key names. While you cannot guarantee uniqueness over any possible application, in your scope it is much more reasonable to use them instead of horrible and forced 5-character abbreviations. There are no significant performance impacts until you get to the millions of tuples.
Alternatively, if you're just tracking by country of origin, and not specific regional cuisines (Cantonese, Sichuan, Sicilian, Umbrian, Calabrian, Yucatecan, Oaxacan, etc.), you could always just use ISO 3166 codes.
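As a rough illustration of using the cuisine name itself as the key, here is a sketch with Python's sqlite3 module. The table and column names (cuisine, recipe, title) and the sample recipes are assumptions made up for the example, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE cuisine (name TEXT PRIMARY KEY);
    CREATE TABLE recipe (
        recipe_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        cuisine   TEXT NOT NULL REFERENCES cuisine(name)
    );
""")
conn.executemany(
    "INSERT INTO cuisine (name) VALUES (?)",
    [("Chinese",), ("Japanese",), ("Thai",)],
)
conn.executemany(
    "INSERT INTO recipe (title, cuisine) VALUES (?, ?)",
    [("Pad Thai", "Thai"), ("Ramen", "Japanese")],
)

# The readable name is already in the recipe row, so no join is needed.
rows = conn.execute(
    "SELECT title, cuisine FROM recipe ORDER BY recipe_id"
).fetchall()

# The lookup table still guards against values that were never registered.
try:
    conn.execute(
        "INSERT INTO recipe (title, cuisine) VALUES (?, ?)", ("Mystery stew", "Tha")
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

The same shape works with ISO 3166 codes in place of the names if you only track country of origin.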
If I have 10,000 recipes, doesn't the difference between a 5-character and 20-character key start to add up?
Space is cheap. When you're talking about 10,000,000 recipes that you're doing OLAP operations on, then, maybe. With 10k recipes, you're looking at roughly 150 KB of extra space.
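The figure is just back-of-the-envelope arithmetic. The sketch below spells it out, assuming one byte per character and a single key column per recipe row, and ignoring index and per-row overhead (all of which are simplifications, not measurements).

```python
# Assumptions: single-byte characters, one key column per row,
# index and per-row storage overhead ignored.
SHORT_KEY_BYTES = 5
LONG_KEY_BYTES = 20


def extra_bytes(rows: int) -> int:
    """Extra storage needed when every row carries the longer key."""
    return rows * (LONG_KEY_BYTES - SHORT_KEY_BYTES)


small = extra_bytes(10_000)       # 10k recipes
large = extra_bytes(10_000_000)   # 10M recipes
print(f"{small / 1000:.0f} kB, {large / 1_000_000:.0f} MB")
```

At 10 million rows the difference is about 150 MB per copy of the column, which is where it might start to matter for OLAP-style scans.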
But again, it depends. If you have many millions of records, and are doing joins on them, then it makes sense to denormalize the lookup for something this trivial (into a materialized view). For all practical purposes, the relative join efficiency on a modern machine between a 5 character key and a variable length key is so similar as to be identical. Happily, we live in a world of plentiful CPU and plentiful disk. The real problems are too many joins and inefficient queries, rather than character-by-character comparison. With that said, always test.
Performance and tuning (P&T) questions at this level are so database-dependent that generalizations are extremely difficult. Build two sample models of the database, populate them with the estimated numbers of records, then see which one is faster. In my experience, character length doesn't make a huge difference compared with good indexes, good memory configurations, and other critical performance tuning elements.
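A test along those lines could be sketched like this, using an in-memory SQLite database from Python. Everything here is illustrative: the schema, the key values, the row count and the helper names build and time_join are all made up, and the numbers from SQLite will not carry over to another engine. The point is the method: same data, same query, only the key width changes.

```python
import sqlite3
import time


def build(conn, keys, rows):
    """Create a lookup table plus a referencing table and fill both."""
    conn.executescript("""
        CREATE TABLE cuisine (code TEXT PRIMARY KEY);
        CREATE TABLE recipe (
            id      INTEGER PRIMARY KEY,
            cuisine TEXT NOT NULL REFERENCES cuisine(code)
        );
        CREATE INDEX recipe_cuisine ON recipe (cuisine);
    """)
    conn.executemany("INSERT INTO cuisine (code) VALUES (?)", [(k,) for k in keys])
    conn.executemany(
        "INSERT INTO recipe (cuisine) VALUES (?)",
        ((keys[i % len(keys)],) for i in range(rows)),
    )
    conn.commit()


def time_join(keys, rows=10_000, repeats=5):
    """Return (joined row count, best wall-clock seconds) for one key scheme."""
    conn = sqlite3.connect(":memory:")
    build(conn, keys, rows)
    query = "SELECT COUNT(*) FROM recipe r JOIN cuisine c ON c.code = r.cuisine"
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        (count,) = conn.execute(query).fetchone()
        best = min(best, time.perf_counter() - start)
    conn.close()
    return count, best


# Hypothetical key values: 5-character codes versus names padded to 20 characters.
short_keys = ["CHINA", "JAPAN", "THAIL"]
long_keys = [f"{name:_<20}" for name in ("Chinese", "Japanese", "Thai")]

for label, keys in (("5-char", short_keys), ("20-char", long_keys)):
    count, seconds = time_join(keys)
    print(f"{label}: {count} rows joined, best of 5 = {seconds:.6f}s")
```

Scale rows up to your estimated volume and run it against your real engine before drawing conclusions.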
I'll answer obliquely...
The natural key is always the natural key and should be enforced with a unique constraint or index. This is the "primary key" that flows from your modelling phase.
The choice of an auto-number/identity surrogate key matters at the implementation phase because there are good and bad choices for your clustered index (example: SQL Server, Sybase, MySQL InnoDB, Oracle IOT).
That is, the primary key is orthogonal to your clustered index: don't confuse the two issues.
I'd suggest that using a contrived key adds no value over using an auto-number/identity column in this respect. You lose data from the natural key, it probably won't be unique, and it is just as opaque.
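A small sketch of "surrogate for implementation, natural key still enforced", again using SQLite from Python purely as an example engine. The cuisine table and its columns are hypothetical names. SQLite has no explicit clustered-index option of the kind the engines above offer, so this only shows the constraint side of the argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cuisine (
        cuisine_id INTEGER PRIMARY KEY,   -- surrogate: implementation detail
        name       TEXT NOT NULL UNIQUE   -- natural key: still enforced
    );
""")
conn.execute("INSERT INTO cuisine (name) VALUES (?)", ("Thai",))

# The unique constraint keeps duplicates of the natural key out,
# even though the surrogate would happily issue a fresh id.
try:
    conn.execute("INSERT INTO cuisine (name) VALUES (?)", ("Thai",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

(row_count,) = conn.execute("SELECT COUNT(*) FROM cuisine").fetchone()
print(duplicate_rejected, row_count)
```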
FWIW, I use surrogate keys and composite keys when I need to:
- Some natural keys are useful in their own right: ISO currency and country codes
- A table with no secondary (non-clustered) indexes and no child table doesn't benefit from a surrogate key
- If you have parent-child-grandchild, then I usually need to join parent-grandchild: with composite keys I can do so directly. Simpler JOINs, simpler indexes
Note: this assumes that every table requires a clustered index.
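To illustrate the parent-child-grandchild point from the list above, here is a sketch with a hypothetical cookbook / chapter / recipe hierarchy (the names and sample data are invented). Because the grandchild carries the parent's key as part of its own composite key, the parent-to-grandchild join does not need to pass through the child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE cookbook (
        cookbook_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL
    );
    CREATE TABLE chapter (
        cookbook_id INTEGER NOT NULL REFERENCES cookbook(cookbook_id),
        chapter_no  INTEGER NOT NULL,
        PRIMARY KEY (cookbook_id, chapter_no)
    );
    CREATE TABLE recipe (
        cookbook_id INTEGER NOT NULL,
        chapter_no  INTEGER NOT NULL,
        recipe_no   INTEGER NOT NULL,
        title       TEXT NOT NULL,
        PRIMARY KEY (cookbook_id, chapter_no, recipe_no),
        FOREIGN KEY (cookbook_id, chapter_no)
            REFERENCES chapter(cookbook_id, chapter_no)
    );
    INSERT INTO cookbook VALUES (1, 'Street Food');
    INSERT INTO chapter VALUES (1, 1), (1, 2);
    INSERT INTO recipe VALUES (1, 1, 1, 'Pad Thai'), (1, 2, 1, 'Ramen');
""")

# Parent joined straight to grandchild: the chapter table is never touched.
direct = conn.execute("""
    SELECT b.title, r.title
    FROM cookbook b
    JOIN recipe r ON r.cookbook_id = b.cookbook_id
    ORDER BY r.chapter_no, r.recipe_no
""").fetchall()
print(direct)
```

With surrogate keys on every table, the same query would need an extra join through chapter to get from recipe back to cookbook.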
Related on dba.se: SQL Server Primary key / clustered index design decision
Best Answer
I'm going to say no, not always, but most of the time, yes.
These are some circumstances in which you don't need a surrogate or artificial key:
- If you have a table with a unique business key that is fixed externally to your business and has zero chance of ever changing for any practical purpose, then using the business key directly can make things simpler. An example might be a list of state or province codes or a list of ANSI standard numbers, etc.
There are also some situations where the old-faithful monotonically increasing integer surrogate key is not ideal. You can have keys that are alphanumeric surrogates. These could include:
Why most of the time yes? The most fundamental answer to that question is that it is pure hell if you ever need to modify a primary key value on any table. Since almost anything a user can see or touch is conceivably subject to an update at some point, using a visible key value is inviting pure hell. Using a surrogate key will keep you from falling into this trap.
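Here is a small sketch of that trap, using SQLite from Python as an example engine. The tables (cuisine_nat, recipe_nat, cuisine_sur, recipe_sur) and the spelling change are invented for illustration. With the visible name as the primary key and no cascade rule declared, the rename is refused because child rows still reference the old value; with a surrogate key, the rename touches one row and the children are unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Visible value used directly as the primary key.
    CREATE TABLE cuisine_nat (name TEXT PRIMARY KEY);
    CREATE TABLE recipe_nat (
        title   TEXT NOT NULL,
        cuisine TEXT NOT NULL REFERENCES cuisine_nat(name)
    );
    -- Surrogate key; the visible name is only an attribute.
    CREATE TABLE cuisine_sur (
        cuisine_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE recipe_sur (
        title      TEXT NOT NULL,
        cuisine_id INTEGER NOT NULL REFERENCES cuisine_sur(cuisine_id)
    );
    INSERT INTO cuisine_nat VALUES ('Szechuan');
    INSERT INTO recipe_nat VALUES ('Mapo tofu', 'Szechuan');
    INSERT INTO cuisine_sur VALUES (1, 'Szechuan');
    INSERT INTO recipe_sur VALUES ('Mapo tofu', 1);
""")

# Renaming the visible key fails while child rows still point at the old value.
try:
    conn.execute("UPDATE cuisine_nat SET name = 'Sichuan' WHERE name = 'Szechuan'")
    natural_update_blocked = False
except sqlite3.IntegrityError:
    natural_update_blocked = True

# With a surrogate key the rename is a single-row update.
conn.execute("UPDATE cuisine_sur SET name = 'Sichuan' WHERE cuisine_id = 1")
renamed = conn.execute("""
    SELECT r.title, c.name
    FROM recipe_sur r
    JOIN cuisine_sur c ON c.cuisine_id = r.cuisine_id
""").fetchone()
print(natural_update_blocked, renamed)
```

Declaring ON UPDATE CASCADE on the foreign key would let the first update through, at the cost of rewriting every referencing row, and not every engine supports it.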
Having said that, remember that there is room for YAGNI in applying this concept. You don't need to go forcing code tables with IDENTITY keys into every nook and cranny of your schema, just in case someone decides that the symbol for male gender in your employee table needs to change from M to X or something silly.