MySQL – How to Decide if Table Segregation is Beneficial

I have an input data feed that is significant in size. At present, all entries go into a single table.

I want to find out how to decide whether splitting the table into smaller tables would be beneficial or just an overhead. I need some insight on how to do this analysis, and if I run a test on a small data set, how can I scale the results?

The thing with segregation is that query time will surely reduce, but at the same time there will be an increase in data input time, and memory allocation will increase as well, so I don't know whether it's worth it or not.

Best Answer

Some of your premises are not necessarily true and could therefore lead to a false conclusion.

The thing with segregation is that query time will surely reduce

Properly sized and configured servers with the right indexes can still serve queries from TB-scale tables in the millisecond range. If you do point queries (single-row lookups by primary key or by a fully covering secondary key), you will find that a significant part of the time is spent parsing the query string and authenticating/authorising the user, not fetching the actual data.

memory allocation will increase as well

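By having more tables you're certainly not going to increase the memory allocation for MySQL in any meaningful way. The number of tables has no significant influence on memory usage; the large consumers, such as the InnoDB buffer pool, are sized by configuration rather than by table count. Memory allocation in general is out of scope here, but this may be worth a look: MySQL memory calculator.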

there will be an increase in data input time

Having more tables could only decrease the insert time, not increase it.

Insert (and update) time can be improved by having more, smaller tables because the B+Trees become smaller. Depending on which storage engine you use, it also matters whether you insert in primary key order or in random order. With InnoDB, inserting in random order carries a performance penalty from constant B+Tree rebalancing. In that case smaller tables could help.

However, it does involve more logic on the application side, and I don't know how much overhead that would be for your developers. Table partitioning could be a middle ground here.
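To give a feel for the application-side logic mentioned above, here is a minimal sketch that assumes you split by month. The `events` prefix and the `events_YYYY_MM` naming scheme are made up for illustration; nothing in your setup requires them.

```python
from datetime import date
from typing import List

# Hypothetical base name for the split tables (an assumption, not from the question).
TABLE_PREFIX = "events"


def table_for(day: date, prefix: str = TABLE_PREFIX) -> str:
    """Return the name of the monthly table that holds rows dated `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def tables_for_range(start: date, end: date, prefix: str = TABLE_PREFIX) -> List[str]:
    """List every monthly table a query covering start..end (inclusive) must touch."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

Every insert has to go through something like `table_for`, and every range query has to fan out over the list from `tables_for_range` and merge the results itself. That is the overhead your developers take on; native partitioning keeps that routing inside MySQL.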

Do you delete from these tables frequently?

If you do, purging whole tables (or partitions) is much easier and can be done without affecting your production queries or replication (if you have it), whereas DELETE FROM ... WHERE ... has to look up the rows, involves some locking (the level depends on the storage engine), etc.

In this case splitting is certainly beneficial.
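As a rough sketch of what such a purge could look like, assuming hypothetical monthly tables named `events_YYYY_MM` (the prefix and naming are my assumption, not something from your setup), the job only has to pick the expired tables and drop them:

```python
import re
from datetime import date
from typing import List


def expired_tables(existing: List[str], cutoff: date, prefix: str = "events") -> List[str]:
    """Return the monthly tables whose month lies entirely before the cutoff month."""
    # Only names of the exact form <prefix>_YYYY_MM are considered.
    pattern = re.compile(rf"{re.escape(prefix)}_(\d{{4}})_(\d{{2}})")
    expired = []
    for name in existing:
        match = pattern.fullmatch(name)
        if match and (int(match.group(1)), int(match.group(2))) < (cutoff.year, cutoff.month):
            expired.append(name)
    return sorted(expired)


def purge_statements(existing: List[str], cutoff: date, prefix: str = "events") -> List[str]:
    """Build one DROP TABLE statement per expired monthly table."""
    return [f"DROP TABLE IF EXISTS `{name}`;" for name in expired_tables(existing, cutoff, prefix)]
```

The list of existing tables would come from something like SHOW TABLES. With a single large table, the equivalent purge is a DELETE with a date condition, which has to find and lock the matching rows.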

Do you want to archive the data?

MyISAM tables can simply be copied over to another MySQL server and they will just work. This is a very common and easy way to archive old data to a different server.

Splitting into tables by date works quite well in this case too.

Do you change your table structure frequently?

ALTERing many smaller tables is more manageable than ALTERing one huge table.


These are the most common situations for splitting I can think of right now.

The question is: are you experiencing any issues right now? Is there an explicit need to fix something? Can it be fixed by other means of optimization, such as configuration, indexes, or query tuning?

Testing

You can test your query time by replaying queries from a tcpdump capture or from the slow query log.

You can also enable binary logs and replay inserts/updates from there and see how fast they finish.
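To compare a replay against the single table with a replay against the split layout, it helps to reduce each run to a few numbers. The sketch below assumes the slow query log is in its usual file format, where each entry carries a `# Query_time: ...` header line; the function names are my own.

```python
import math
import re
import statistics
from typing import Dict, List

# Matches the per-entry header line, e.g. "# Query_time: 0.000352  Lock_time: ..."
QUERY_TIME = re.compile(r"^# Query_time:\s+([0-9.]+)", re.MULTILINE)


def query_times(slow_log_text: str) -> List[float]:
    """Extract every Query_time value (in seconds) from slow log text."""
    return [float(value) for value in QUERY_TIME.findall(slow_log_text)]


def summarize(times: List[float]) -> Dict[str, float]:
    """Reduce a list of query times to count, median, 95th percentile and max."""
    if not times:
        raise ValueError("no query times found")
    ordered = sorted(times)
    # Nearest-rank 95th percentile.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "count": len(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

Run it on the log from each replay and compare the median and p95 side by side. Setting long_query_time to 0 on the test server makes the log capture every query, not only the slow ones.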

This is going to be a balance between many things, including the operability of the system, so find what works best for your use case.

I hope this helped.