MySQL – Need expert suggestion for choosing the best DB scalability approach

Tags: mariadb, mysql, scalability

I am looking for expert opinions on choosing the best approach to DB scalability based on the requirements below.

  • Application

    App install tracking platform (ad serving).
    It is used to track clicks, impressions, and app installs.

  • Framework

    Spring Boot, Java 8 with Tomcat 9

  • HAProxy

    Application Load Balancing.
    At present there are 3 nodes connected to it.

  • Database

    MySQL

We are expecting 1,000 clicks per second (it could go up to 10,000-100,000 clicks per second), so please let me know the best solution to achieve this.

As of now I am a little confused about which of these to go with:

  1. MySQL Master-Master Replication
  2. MySQL Master-Slave Replication
  3. MySQL Cluster

Please assume we have 85% reads and 15% writes. The reads are for checking a campaign's daily, monthly, or total conversion cap before redirecting a click to the landing page.

Also please suggest whether MySQL is the best fit for this type of requirement. If not, which database would be?

Also please highlight what the plan would be for:

  1. Backup strategy
  2. Adding tables or rows in the future
  3. Handling network failures

Best Answer

Will you be INSERTing one row per click? Or just incrementing a counter?
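
For illustration, here is a minimal sketch of the two alternatives; the table and column names (clicks, click_counts, campaign_id, etc.) are made up for this example:

    -- Option A: one row per click (a fact row per event); ip stored as VARBINARY(16)
    INSERT INTO clicks (campaign_id, clicked_at, ip)
    VALUES (42, NOW(), INET6_ATON('203.0.113.7'));

    -- Option B: only increment a per-campaign, per-hour counter
    -- (assumes PRIMARY KEY (campaign_id, hr) on click_counts)
    INSERT INTO click_counts (campaign_id, hr, clicks)
    VALUES (42, DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
    ON DUPLICATE KEY UPDATE clicks = clicks + 1;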

INSERTing or UPDATEing as many as 100K rows in one second requires some special planning. First, plan on gathering a bunch of rows, then batching the insert or update; do not try to insert/update one click at a time. (More on this in a minute.)
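
As a sketch of what batching means in practice (same hypothetical clicks table as above; the application, or the staging step described below, gathers the rows first):

    -- One round trip inserting many clicks at once,
    -- instead of hundreds of single-row INSERTs:
    INSERT INTO clicks (campaign_id, clicked_at, ip) VALUES
        (42, '2024-01-01 10:00:00', INET6_ATON('203.0.113.7')),
        (42, '2024-01-01 10:00:00', INET6_ATON('203.0.113.8')),
        (17, '2024-01-01 10:00:01', INET6_ATON('198.51.100.9'));
        -- ... typically 100-1000 rows per statement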

Build and maintain "summary tables" so that the queries will not have such a significant impact on the system. (More to come.)

I would suggest a Galera Cluster (MariaDB 10 / PXC / MySQL+Galera) to give you some scaling, HA, etc. Since you have a large read requirement, you might need multiple slaves hanging off each cluster node.

Backups could be handled by taking a slave offline. (There are other options; I don't see it as much of a problem.)

You have not explained the 85% reads enough for me to finish this discussion; I will focus on the writes and some of the reads.

Use SSDs of sufficient size. Have sufficient network capacity.

Have enough client machines -- possibly more than MySQL servers; the number depends on several things. The design should easily grow by adding clients, so I am not too worried about how many clients until the system is being prototyped.

The best high-speed ingestion technique I know about is described here. It lets multiple clients insert into a staging ("temp") table, then ping-pongs to another staging table while the batched INSERTs (or UPDATEs) into the real table are performed. If all you are doing is counting hits, this design takes load off the main table because some of the tallying is done in the staging table. Also, you would probably use INSERT ... SELECT ... ON DUPLICATE KEY UPDATE to move the data from the staging table to the real table. (Caveat: The blog was written with a single Master in mind; I need to do some research to tailor it to Galera, especially if this is happening on all 3 nodes. Initial thoughts suggest that only minor tweaks are needed.)
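
To make the flip-and-move concrete, here is a rough sketch under my assumptions (the blog may differ in its details; staging, staging_old, and click_counts are hypothetical names, and the swap uses an atomic RENAME TABLE):

    -- Clients do fast, multi-row INSERTs into `staging`.
    -- Periodically, swap in an empty staging table atomically:
    CREATE TABLE staging_new LIKE staging;
    RENAME TABLE staging TO staging_old, staging_new TO staging;

    -- Tally the swapped-out batch into the real (counter) table:
    INSERT INTO click_counts (campaign_id, hr, clicks)
        SELECT campaign_id,
               DATE_FORMAT(clicked_at, '%Y-%m-%d %H:00:00'),
               COUNT(*)
        FROM staging_old
        GROUP BY campaign_id, DATE_FORMAT(clicked_at, '%Y-%m-%d %H:00:00')
    ON DUPLICATE KEY UPDATE clicks = clicks + VALUES(clicks);

    -- staging_old is then used for normalization and summary updates
    -- (sketched below) and finally dropped.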

Normalization can, and should, be done from the tmp table. (The blog discusses that.)
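
If you do keep one row per click, a rough sketch of normalizing from the tmp (staging_old) table before copying into the real table might look like this (devices and device_name are hypothetical; the normalization table needs a UNIQUE key on device_name):

    -- 1. Add any device strings not seen before:
    INSERT IGNORE INTO devices (device_name)
        SELECT DISTINCT device_name FROM staging_old;

    -- 2. Swap the string for its id while copying into the real table:
    INSERT INTO clicks (campaign_id, clicked_at, device_id)
        SELECT s.campaign_id, s.clicked_at, d.device_id
        FROM staging_old AS s
        JOIN devices AS d USING (device_name);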

With summary table(s), the 85% estimate for reads may drop significantly. The idea is to create and maintain tables of subtotals over an hour or a day; the queries then hit the summary tables and run a lot faster than they would against the real table. (Sometimes 10x faster.) The summary table(s) may be denormalized for extra convenience/speed. The summary table updates are likely to be IODKU (INSERT ... ON DUPLICATE KEY UPDATE) from the staging table.
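
A minimal sketch of what a daily summary table and a capping-style read might look like (daily_summary and its columns are hypothetical, not taken from the blog):

    CREATE TABLE daily_summary (
        campaign_id INT UNSIGNED NOT NULL,
        dy          DATE NOT NULL,
        clicks      INT UNSIGNED NOT NULL DEFAULT 0,
        installs    INT UNSIGNED NOT NULL DEFAULT 0,
        PRIMARY KEY (campaign_id, dy)
    );

    -- Maintained via IODKU from each staging batch:
    INSERT INTO daily_summary (campaign_id, dy, clicks)
        SELECT campaign_id, DATE(clicked_at), COUNT(*)
        FROM staging_old
        GROUP BY campaign_id, DATE(clicked_at)
    ON DUPLICATE KEY UPDATE clicks = clicks + VALUES(clicks);

    -- The capping checks (most of the 85% reads) then hit this small table:
    SELECT SUM(clicks)
    FROM daily_summary
    WHERE campaign_id = 42
      AND dy >= '2024-01-01';   -- e.g. month-to-date total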

I need to see more details about the schema and the tentative inserts/updates and selects. This entire discussion will probably expand beyond what can/should be handled in a free forum.

Reference: Summary table blog