MySQL – MySQL clustering

MySQL

I have to manage a large-scale MySQL database server that handles about 20-30 million requests per hour.
I have read about MySQL clustering, but there are some points that I don't understand.

First, I referenced this document:
https://www.digitalocean.com/community/tutorials/how-to-create-a-multi-node-mysql-cluster-on-ubuntu-16-04

In that setup there is one MySQL management node with two data nodes.
The management node is not itself distributed; what happens if the manager fails?

There are other options, such as table sharding or Kubernetes-based solutions,
but I don't know which is the best choice.

Is there a MySQL topology that can handle a large transaction volume reliably?

UPDATE:
My hardware for the cluster is three dedicated servers, each one:
HP G10
dual 14-core CPUs, 35 MB cache
512 GB hard drive
6 × 400 GB SSD in RAID 10

Best Answer

(Beginnings of an answer; more may come as more details become available.)

This answer assumes 40K vehicles, each sending location and status information 2K times per day.

Ingestion of data

I strongly advise that the PRIMARY KEY for the main table start with vehicle_id. That id should be a 3-byte MEDIUMINT UNSIGNED, thereby allowing up to 16M vehicles. (SMALLINT's limit of 64K is cutting it too close.)
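As a minimal sketch of what that might look like (the table and column names here are illustrative, not from the question):

    CREATE TABLE vehicle_readings (
        vehicle_id  MEDIUMINT UNSIGNED NOT NULL,  -- 3 bytes; up to ~16M vehicles
        ts          DATETIME NOT NULL,            -- time of the reading
        lat         DECIMAL(8,6) NOT NULL,        -- latitude (micro-degree precision)
        lng         DECIMAL(9,6) NOT NULL,        -- longitude
        status      TINYINT UNSIGNED NOT NULL,
        PRIMARY KEY (vehicle_id, ts)              -- clusters each vehicle's rows together
    ) ENGINE=InnoDB;

Starting the PK with vehicle_id keeps each vehicle's rows physically adjacent, so each vehicle appends to its own small "hotspot" rather than scattering writes across the table.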

Decide what output you need, then work backward to specify the minimum amount of data to store. Meanwhile, capture the raw incoming messages from the vehicles in a plain text file, just in case you need to look back for more details. (This may also address some legal requirements.)

2000/day averages 1 or 2 signals per minute per vehicle. At high speed, a vehicle can travel more than a mile (or km) between signals.

Don't forget to worry about vehicles driving through tunnels (gaps in the signal).

May I ask what country you are doing this in? Some countries have lax requirements and some are quite strict with regard to such "sensitive" data.

To allow for very efficient deletion of old data, PARTITION BY RANGE(TO_DAYS(..)) and use DROP PARTITION instead of DELETE. Details here: http://mysql.rjweb.org/doc.php/partitionmaint . With that, there is essentially no downtime to purge 'old' data.
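A sketch of that idea on the illustrative table above (partition names and boundaries are assumptions; the linked article covers the actual maintenance scripts):

    ALTER TABLE vehicle_readings
        PARTITION BY RANGE (TO_DAYS(ts)) (
            PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
            PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
            PARTITION pfuture   VALUES LESS THAN MAXVALUE
        );

    -- Purging a day becomes a near-instant metadata operation,
    -- not a row-by-row DELETE:
    ALTER TABLE vehicle_readings DROP PARTITION p20240101;

This works because ts is part of the PRIMARY KEY; MySQL requires every unique key to include the column used in the partitioning expression.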

How much RAM do you have? For ingestion, my advice above needs about 1GB of the buffer pool to keep the 30-40K "hotspots" in RAM. What other tasks are regularly performed? Especially the main SELECTs.
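For reference, the relevant setting is innodb_buffer_pool_size; checking it, and resizing it online (possible since MySQL 5.7.5), might look like this, with the size below purely a placeholder:

    -- Current buffer pool size, in GB:
    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

    -- Resize online (placeholder value; size it to the workload and RAM):
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;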

Is the data coming as HTTP requests? Is it 30M/hour (about 8K/sec) or 80M/day (about 1K/sec)? Your numbers seem inconsistent; please check. Either way, it would need a lot of web servers, and there are still some issues to solve to get the ingestion speed up.

Is there any "normalization" to do? For example, does each message identify the vehicle with a 17-character VIN? That would need to be mapped to the 3-byte MEDIUMINT in order to cut back significantly on disk space. This is best done by collecting the incoming rows in a "staging" table, then normalizing en masse; see http://mysql.rjweb.org/doc.php/staging_table (Hence my question about the possible delay before the data is available for queries.)
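A rough sketch of that flow, continuing the illustrative schema (the staging and lookup tables here are assumptions; the linked article describes the technique and its refinements):

    -- Raw rows land here first, still keyed by the 17-char VIN:
    CREATE TABLE staging (
        vin     CHAR(17) NOT NULL,
        ts      DATETIME NOT NULL,
        lat     DECIMAL(8,6) NOT NULL,
        lng     DECIMAL(9,6) NOT NULL,
        status  TINYINT UNSIGNED NOT NULL
    ) ENGINE=InnoDB;

    -- Lookup table mapping each VIN to a compact 3-byte id:
    CREATE TABLE vehicles (
        vehicle_id  MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
        vin         CHAR(17) NOT NULL,
        PRIMARY KEY (vehicle_id),
        UNIQUE KEY (vin)
    ) ENGINE=InnoDB;

    -- Normalize en masse: register any new VINs, then move the batch over.
    -- (INSERT IGNORE can burn auto_increment ids on duplicates; the linked
    --  article shows a cleaner two-step variant.)
    INSERT IGNORE INTO vehicles (vin) SELECT DISTINCT vin FROM staging;
    INSERT INTO vehicle_readings (vehicle_id, ts, lat, lng, status)
        SELECT v.vehicle_id, s.ts, s.lat, s.lng, s.status
        FROM staging AS s JOIN vehicles AS v USING (vin);
    TRUNCATE TABLE staging;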

Other issues

TBD