Mysql – Aggregating high volumes of data from many MySQL servers

data synchronizationMySQLscalability

I have 4 servers of MySQL, each receives around 4 million records a day.

I need to aggregate all the inserted records into another main server after doing some mapping processing to the aggregated records. so this new server should receive around 16 million records per day.

I also need to make this operation as real time as possible, so I was thinking of a script running every minute or so that do the sync. or perhapse some other sync tool or mechanism you can suggest.

So, may question is:

  • Is MySQL the right choice for a 16-20 million records/day ? or I should try something else (MongoDB, Elasticsearch, .. etc)
  • What tool/language to give the best performance in such task?

Thank you.

Best Answer

Yes, MySQL will be able to do 20 million inserts per day, no problem. My calculator says 20 million per day translates to roughly 231 inserts per second. See e.g. this blog post from Percona's Database Performance Blog from 2010 which talks about more than 36 thousand writes per second. With the optimisations that have gone into MySQL since then as well as the hardware improvements, this number will obviously be even higher today.

And from a skills perspective, since you're already using MySQL, it might make sense to use that for this aggregation database as well.

Note that you can increase write throughput by combining multiple inserts together in a single transaction.

However, you may be able to use multi-source replication:

MySQL Multi-Source Replication enables a replication slave to receive transactions from multiple sources simultaneously. Multi-source replication can be used to back up multiple servers to a single server, to merge table shards, and consolidate data from multiple servers to a single server. Multi-source replication does not implement any conflict detection or resolution when applying the transactions, and those tasks are left to the application if required.

So if you're able to use this, then you won't have to write any code to do the actual replication.