It depends on your engine. Common wisdom is that reads are cheap, a few bytes here and there will not significantly impact the performance of a small to medium size database.
More importantly, it depends on the uses to which you will put the primary key. Integer serials have the advantage of being simple to use and implement. They also, depending on the specific implementation of the serial mechanism, have the advantage of being quickly derivable, as most databases just store the current serial value in a fixed location rather than computing it with `SELECT max(id) + 1 FROM foo` on the fly.
The question becomes: how does a 5-character key present a "meaningful value" to you and to the application? How is this value created, and does generating it take more or less time than fetching the next serial number? While a few bytes of space are saved over some integer types, the vast majority of systems will never notice the savings.
There are no performance implications, save that the character scheme can never be generated automatically by the engine, as your "keys" are not derivable. For your specific domain, don't bother with artificial keys; just use Chinese, Japanese, and Thai as the key values. While you cannot guarantee uniqueness over every possible application, within your scope it is much more reasonable to use them instead of horrible, forced 5-character abbreviations. There are no significant performance impacts until you get to millions of tuples.
Alternatively, if you're just tracking by country of origin, and not specific regional cuisines (Cantonese, Sichuan, Sicilian, Umbrian, Calabrian, Yucatecan, Oaxacan, etc.), you could always just use ISO 3166 codes.
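A minimal sketch of that idea, assuming a hypothetical `cuisine`/`recipe` schema (the table and column names are illustrative, not from the original post) with ISO 3166-1 alpha-2 codes as the natural key:

```sql
-- Hypothetical sketch: ISO 3166-1 alpha-2 country codes as a natural key.
CREATE TABLE cuisine (
    country_code char(2) PRIMARY KEY,  -- e.g. 'CN', 'JP', 'TH'
    name         text NOT NULL
);

CREATE TABLE recipe (
    recipe_id    serial PRIMARY KEY,
    title        text NOT NULL,
    country_code char(2) NOT NULL REFERENCES cuisine (country_code)
);
```

The two-character code is compact, standardized, and already meaningful to humans, which is exactly the property a forced 5-character abbreviation would lack.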
If I have 10,000 recipes, doesn't the difference between a 5-character and a 20-character key start to add up?
Space is cheap. When you're talking 10,000,000 recipes that you're doing OLAP operations on, then, maybe. With 10k recipes, you're looking at 150k of space.
But again, it depends. If you have many millions of records, and are doing joins on them, then it makes sense to denormalize a lookup this trivial (into a materialized view, for instance). For all practical purposes, the relative join efficiency on a modern machine between a 5-character key and a variable-length key is so similar as to be identical. Happily, we live in a world of plentiful CPU and plentiful disk. The real performance killers are too many joins and inefficient queries, not character-by-character comparison. With that said, always test.
Performance and tuning questions at this level are so database-dependent that generalizations are extremely difficult. Build two sample models of the database, populate them with the estimated numbers of records, then see which one is faster. In my experience, key length doesn't make a huge difference compared with good indexes, good memory configuration, and the other critical performance-tuning elements.
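One way to run that comparison, sketched here for PostgreSQL with hypothetical table names and `generate_series` as the data generator:

```sql
-- Hypothetical benchmark sketch: populate both candidate models
-- with the estimated row count, then compare real plans and timings.
CREATE TABLE recipe_int  (id serial  PRIMARY KEY, title text);
CREATE TABLE recipe_char (id char(5) PRIMARY KEY, title text);

INSERT INTO recipe_int (title)
SELECT 'recipe ' || g FROM generate_series(1, 10000) AS g;

INSERT INTO recipe_char (id, title)
SELECT lpad(g::text, 5, '0'), 'recipe ' || g
FROM generate_series(1, 10000) AS g;

ANALYZE recipe_int;
ANALYZE recipe_char;

-- Compare the plans for representative queries:
EXPLAIN ANALYZE SELECT * FROM recipe_int  WHERE id = 5000;
EXPLAIN ANALYZE SELECT * FROM recipe_char WHERE id = '05000';
```

The point is to measure on your own engine and data volumes rather than trust a rule of thumb.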
I would suggest that the "right direction" would be to go a different direction entirely.
A sufficiently random and difficult-to-guess distribution of keys, assuming that is even an achievable objective, is likely to be otherwise sub-optimal for space and performance. But that's the second problem.
The first problem is that exposing the internals of your database in such a fashion is intrinsically unsafe.
Instead, create a separate column for this "PIN" -- of whatever desired length and data type, with a unique key constraint on the column so no more than one record can possibly have the same PIN associated with it.
You can then populate this value using whatever algorithm you want, and you can subsequently change the algorithm, change the min/max lengths of the value, etc., without any other impact on the rest of your application or database as a result of such changes.
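A minimal sketch of that separation, with hypothetical table and column names (the PIN length and type are placeholders for whatever your algorithm produces):

```sql
-- Hypothetical sketch: the surrogate key stays internal; only the PIN
-- is ever exposed, and a unique constraint guarantees no duplicates.
CREATE TABLE account (
    account_id bigserial  PRIMARY KEY,          -- internal, never exposed
    pin        varchar(8) NOT NULL,             -- exposed identifier
    CONSTRAINT account_pin_key UNIQUE (pin)
);
```

Because the PIN is just a regular column, you can later change its generation algorithm or its length without touching foreign keys or anything else that depends on `account_id`.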
Best Answer
As far as PostgreSQL is concerned, your assumption is generally not true. There are special use cases where sequential numbers help somewhat, but not with index performance. In normal operation it does not make any difference whatsoever.

Operations on UUIDs are somewhat slower than on `int` or `bigint`, because they are 16 bytes long (as opposed to 4 / 8 bytes). Generally, having gaps in the numbering is the rule, not the exception.
For your special case, if you have a small number of instances and control over them, you could set aside separate ranges of a `bigint` or even just an `int`: sequences with a pre-set `START` value, for instance. Since you don't have mass inserts, there should be plenty of numbers.

For more instances, or if you don't have control over them, I would strongly advise using `UUID` instead of any home-grown solution. It's an established, well-tested technique and performs well.
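Both options can be sketched in PostgreSQL DDL; the sequence names, range boundaries, and table are hypothetical illustrations, not values from the answer:

```sql
-- Hypothetical sketch: disjoint bigint ranges, one sequence per instance,
-- so two instances can never generate the same id.
CREATE SEQUENCE recipe_id_node1
    START 1          MINVALUE 1          MAXVALUE 999999999;
CREATE SEQUENCE recipe_id_node2
    START 1000000000 MINVALUE 1000000000 MAXVALUE 1999999999;

-- Or sidestep coordination entirely with UUIDs
-- (gen_random_uuid() is built in since PostgreSQL 13; earlier
-- versions need the pgcrypto extension):
CREATE TABLE recipe (
    recipe_id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    title     text NOT NULL
);
```

The sequence-range approach keeps the compact 8-byte key but requires you to provision ranges up front; UUIDs trade 8 extra bytes per key for zero coordination between instances.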