Mysql – Use surrogate or natural PK if a row is uniquely identified by two or more columns

database-designMySQLprimary-key

I am having a hard time choosing between natural and surrogate PK for my database table.

The database is for a ranking system for an MOBA game. Here is more or less what I would do if I were to go the natural route.

Player table

 - PlayerID Int AI NN UQ
 - PlayerName Varchar PK
 - ServerName Varchar PK
 - Registered date, level, hero etc misc columns

Rank table

 - PlayerName FK references player
 - ServerName FK references player
 - RankID Int AI NN UQ
 - PlayerRank Int NN UQ

The thing is, each row in the player table is uniquely identified by the pair of PlayerName and ServerName. I thought using a surrogate key in this case is not really appropriate but I would like to hear suggestions on this.

Best Answer

That is part of one of computing's big philosophical debates. Sometimes it gets as heated amongst information theorists as the vi/emacs thing does amongst Linux developers...

I always avoid using a value that might change as a key or part of a key, in this case player name or server name unless your system has an explicit rule that those properties can not change and must be unique. While you can in some databases use ON UPDATE CASCADE to remove problems due to the value of a PK needing to be updated, this feels unclean to me. If the player name must be unique then you do not really need a separate player ID as playername+servername is a candidate key in the player table too.

BTW: Your player table feels incorrect to me.I would keep servers as a separate entity so the same player record can be linked to multiple servers, so one player who plays in many places does not have to have different records for each server. Something like:

    Player             Ranking               Server        
    --------------     -----------------     --------------
    PlayerID (PK)  <-- PlayerID (FK, PK)     ServerName    
    PlayerName         ServerID (FK, PK) --> ServerID (PK) 
    {more columns}     PlayerRank            {more columns}

or

    Player             Ranking           Server        
    --------------     -------------     --------------
    PlayerID (PK)  <-- PlayerID (FK)     ServerName    
    PlayerName         ServerID (FK) --> ServerID (PK) 
    {more columns}     PlayerRank        {more columns}
                       RankID (PK)

(probably the former as PlayerID+ServerID is a candidate key and not having the extra ID saves space in both not having the extra data and needing an extra index)

One further point: PlayerRank can not be a unique index on its own as you will have a "rank 1" player on each server that has at least one player, so the unique index will need to be over ServerID+PlayerRank.