Splitting up tables with similar structures since they contain unrelated data? (huge numbers)

database-designperformance

Is my reasoning correct in splitting up tables with similar structures since they contain unrelated data?

I'll explain the situation with an example :

Suppose there is a server which hosts an online game's database. The game has multiple gameworlds which operate in parallel. The schema for each gameworld of the game is same, let's say there are 15 relational tables in the database design of the game. Also assume that the server decides not to delete the data ofold servers and instead, archives it.

Approach 1 :

Now since the gameworlds are hosted on one server, It might seem logical to put all the game data into one database containing 15 tables. The table structure of an example table might look like this :

CREATE TABLE `table` (
    playerid ...,
    score ....,
    gameworldid ...,
    PRIMARY KEY(gameworldid,playerid)
) ENGINE = InnoDB

So, everytime I need to show top 10 players with most score, I will have to do something like SELECT * FROM table WHERE gameworldid = {id} ORDER By score DESC LIMIT 10

Approach 2 :

Since the data for one gameworld doesn't relate to data of the other gameworld in any way (except for the structure), we may make different databases for different gameworlds, or different set of tables for each gameworld. The table structure will look something like this :

CREATE TABLE `gameworldid_table` (
    playerid ...,
    score ....,
    PRIMARY KEY(playerid)
) ENGINE = InnoDB

In this case, top 10 players with most scorequery will be SELECT * FROM table ORDER BY score DESC LIMIT 10

The question is, which approach is better in this situation? The storage engine being used is InnoDB, and in all , 6 billion and up rows can be expected to be there in the biggest merged table, and 500 and up gameworlds. (Up since the old data isn't deleted, so when a gameworld restarts, a new gameworld is created.

In my opinion, the advantages of Approach 2 over Approach 1 are :

  • Less space consumed by primary key and indices, making them more efficient
  • Faster sorting, joining and other stuff as compared to merged table in approach one which contains lots of extra data that might come in way of these operations.
  • Faster inserts as compared to ones in larger tables
  • Data won't keep just piling up forever in the same table

And the advantages of Approach 1 over Approach 2 ?

  • Better manageability : 15 tables sound better as compared to 500 * 15 tables, if you ever have to do some manual fixing (rare).
  • Sounds less insane
  • Too many tables may just make database slower ( don't know for a fact, can't find it on the internet)

I guess my opinions on advantages and disadvantages may be biased, but I really want to know what all problems can come my way if I choose one Approach over the other. Also, is there any other possible solution for this solution?

Best Answer

Your issues are probably less about performance, assuming reasonable equipment and a good indexing structure, and more about other impacts that you face.

Approach 1 means some additional programming complexity to deal with filtering rows in each table to restrict access to a single gameworld. It also means that if you need to restore a gameworld, you would have to (a) restore all gameworlds (since they are in the same database) or (b) restore the one database to a recovery database and then script the appropriate gameworld data from Recovery to your Gameworlds database.

Approach 2 means that you need to be able to manage schema changes across many databases. This can be done by scripting as needed. It is also easier to choose to have different schemas for some gameworlds, but that adds management overhead. However, if a a single gameworld crashes, you can restore the latest good backup(s) and get it going again.

Some years ago some Microsoft folks wrote up their take on Multi-Tenanted databases. See: http://msdn.microsoft.com/en-us/library/aa479086.aspx

Having more tables should not, in itself, cause performance problems within the size range you describe.

Choosing your approach should take into account leveraging your strengths so that you have a system you (and your project members) can readily manage.