It depends on your engine. Common wisdom is that reads are cheap; a few bytes here and there will not significantly impact the performance of a small to medium size database.
More importantly, it depends on the uses to which you will put the primary key. Integer serials have the advantage of being simple to use and implement. Depending on the specific implementation, they also have the advantage of being quickly derivable, as most databases store the next serial number in a fixed location rather than computing it on the fly with SELECT MAX(ID) + 1 FROM foo.
The question becomes: how does a 5-character key present a "meaningful value" to you and to the application? How is this value created, and does it take more or less time than generating an incrementing serial number? While a trivial amount of space is saved relative to some integer types, the vast majority of systems will ignore this space savings.
There are no performance implications, save that a character scheme can never be generated automatically by the engine, as your "keys" are not derivable. For your specific domain, don't bother with artificial keys; just use Chinese, Japanese, and Thai as the key values. While you cannot guarantee uniqueness across every possible application, within your scope it is much more reasonable to use them instead of horrible, forced 5-character abbreviations. There is no significant performance impact until you get into the millions of tuples.
Alternatively, if you're just tracking by country of origin, and not specific regional cuisines (Cantonese, Sichuan, Sicilian, Umbrian, Calabrian, Yucatecan, Oaxacan, etc.), you could always just use ISO 3166 codes.
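As a minimal sketch of that approach (table and column names here are illustrative, not from the original schema), the cuisine lookup could be keyed directly by the ISO 3166-1 alpha-2 code:

```sql
-- Natural key: the ISO 3166-1 alpha-2 country code, e.g. 'CN', 'JP', 'TH'
CREATE TABLE cuisine (
    country_code CHAR(2) PRIMARY KEY,
    cuisine_name VARCHAR(100) NOT NULL
);
```

A fixed two-character code gives you most of the compactness of an integer key while staying human-readable in ad-hoc queries.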
If I have 10,000 recipes doesn't the difference between a 5-character and 20-character key start to add up?
Space is cheap. When you're talking about 10,000,000 recipes that you're running OLAP operations on, then, maybe. With 10k recipes, the 15 extra bytes per key work out to roughly 150 KB of space.
But again, it depends. If you have many millions of records and are doing joins on them, then it makes sense to denormalize a lookup this trivial (into a materialized view). For all practical purposes, the relative join efficiency on a modern machine between a 5-character key and a variable-length key is so similar as to be identical. Happily, we live in a world of plentiful CPU and plentiful disk. The real performance killers are too many joins and inefficient queries, not character-by-character comparison. With that said, always test.
Performance-and-tuning questions at this level are so database-dependent that generalizations are extremely difficult. Build two sample models of the database, populate them with the estimated numbers of records, then see which one is faster. In my experience, key length doesn't make a huge difference compared with good indexes, good memory configuration, and the other critical performance-tuning elements.
I think the easiest solution for you would be to create a stored procedure that takes @GameName and @ConsoleName as parameters.
CREATE PROCEDURE InsertGame @GameName VARCHAR(50), @ConsoleName VARCHAR(50)
AS
BEGIN
    DECLARE @ConsoleID INT

    -- Look up the console; @ConsoleID stays NULL if it doesn't exist yet
    SELECT @ConsoleID = ConsoleID FROM Console WHERE ConsoleName = @ConsoleName

    IF @ConsoleID IS NULL
    BEGIN
        INSERT Console (ConsoleName) VALUES (@ConsoleName)
        SELECT @ConsoleID = SCOPE_IDENTITY()
    END

    INSERT Game (Name, ConsoleID) VALUES (@GameName, @ConsoleID)
END
The stored procedure first looks up the ID for the console name you provided. If you passed a @ConsoleName that doesn't already exist in the Console table, it INSERTs it there and captures the new ID. It then INSERTs a record into the Game table with @GameName and the @ConsoleID.
Then you'd call that stored procedure for each Game you were planning to enter:
EXEC InsertGame 'Grand Theft Auto 5', 'PS4'
EXEC InsertGame 'Grand Theft Auto 5', 'PC'
EXEC InsertGame 'Final Fantasy XV', 'PS4'
EXEC InsertGame 'Overwatch', 'PC'
EXEC InsertGame 'Civilization 5', 'PC'
And so on until you've inserted all of your games.
Please note that the exact syntax I've used is for the database platform I am most familiar with, but the underlying logic should be usable in any database platform.
Also, as I write this I am struck by the idea that what you really want is a many-to-many relationship between game titles and platforms, since a platform can have many games and a game can be available on multiple platforms. That would take a little more work, but it stores the data more efficiently and lets you easily determine which platform(s) a given game is available for without doing a text match.
So, at a high level, you'd need another table. The existing Game table would no longer have a ConsoleID column. Instead, there would be a new junction table, likely called Console_Game, containing only ConsoleID and GameID as its keys. The stored procedure would then need to check for the existence of both ConsoleName and GameName, inserting into both tables as necessary, and finally write a record to Console_Game to show that the given game is available on the given console.
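As a minimal sketch of that junction table (column names follow the ones used above; the REFERENCES targets assume Console and Game each have an integer identity key named ConsoleID and GameID respectively):

```sql
-- Junction table: one row per (console, game) pairing.
-- The composite primary key prevents recording the same pairing twice.
CREATE TABLE Console_Game (
    ConsoleID INT NOT NULL REFERENCES Console (ConsoleID),
    GameID    INT NOT NULL REFERENCES Game (GameID),
    PRIMARY KEY (ConsoleID, GameID)
)
```

Listing the platforms for a given game then becomes a simple join through Console_Game rather than a text match.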
This will be more work upfront, but it is a more scalable solution than what you currently have. If this is for a job and not a hobby you should probably go that route.
Best Answer
UUIDs are useful when you have clients independently generating unique identifiers. id INT UNSIGNED AUTO_INCREMENT is smaller, faster, 'ordered', etc. Use UUIDs only if you don't have a viable alternative. More discussion: http://mysql.rjweb.org/doc.php/uuid
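As a rough sketch of the trade-off (table names here are illustrative; UUID_TO_BIN requires MySQL 8):

```sql
-- 4-byte, monotonically increasing key: compact and index-friendly,
-- since new rows always land at the end of the primary-key index
CREATE TABLE t_serial (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY
);

-- 16-byte UUID key: safe for independent clients generating IDs, but
-- values arrive in random order, scattering writes across the PK index
CREATE TABLE t_uuid (
    id BINARY(16) PRIMARY KEY  -- populate with UUID_TO_BIN(UUID())
);
```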
In my opinion, sha256 is overkill for a 'digest'.
500K rows INSERTed per day? That's about 6 per second; not a problem. When you get to 100/sec, we should talk further.