MySQL Data Transfer – Continuously Move Data from Server A to Server B While Deleting Data in Server A

Tags: clickhouse, MySQL, nosql, PHP

I'm developing an ad server that is expected to handle billions of ad impressions/clicks per day.

The most difficult challenge I am facing is moving data from one server to another.

Basically the flow is like this :

  1. Multiple front-facing load balancers distribute traffic (HTTP load balancing) to several servers called traffic handler nodes.

  2. These traffic handler nodes' job is to store the click logs in a MySQL table (data like geo, device, offer id, user id, etc.) and then redirect the traffic to the offer landing page.

  3. Every minute a cron job runs on each traffic node, which transfers the click logs to the reporting server (the server where all report generation is done) in batches of 10,000 rows per minute, and then deletes the data after confirming that it was successfully received by the reporting server. The reporting server uses the ClickHouse database engine (a sketch of this batch/confirm/delete cycle follows the list).
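
For clarity, here is a minimal sketch of the per-minute cycle described above, using a hypothetical click_log table and column names; the handoff to ClickHouse and the confirmation from the reporting server happen in the cron script, outside MySQL:

    -- On a traffic node: snapshot one batch of up to 10,000 rows
    -- (click_log and its columns are assumed names, not the real schema).
    SELECT id, geo, device, offer_id, user_id, created_at
      FROM click_log
     ORDER BY id
     LIMIT 10000;

    -- The cron script inserts the batch into ClickHouse and remembers the
    -- highest id it shipped (say 123456). Only after the reporting server
    -- confirms receipt does it delete exactly that range:
    DELETE FROM click_log
     WHERE id <= 123456;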

I need to replace the MySQL database engine on the traffic nodes, as I'm facing a lot of issues with MySQL: between the heavy inserts and the heavy deletes it's getting slow. Plus, the data is being transferred via a cron job, so there is a 2-minute average delay.

I can't use ClickHouse on these servers, as Yandex ClickHouse does not support updates and deletes yet, and the click logs are supposed to be updated many times (to record how many events happened on the visit).
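
To illustrate the kind of write that rules ClickHouse out on the traffic nodes, the workload includes in-place updates roughly along these lines (table and column names are again hypothetical):

    -- Each additional event on a visit bumps a counter on the existing
    -- click row, which requires UPDATE support on the traffic-node database.
    UPDATE click_log
       SET event_count = event_count + 1
     WHERE click_id = 42;  -- illustrative id of the visit being updated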

I'm looking at Kafka, but again I'm not sure how to achieve one-way data transfer and then deletion of the data.

Maybe my whole approach is wrong. I would be very grateful if any expert could guide me in the right direction.

Best Answer

I don't think that your approach is wrong, but I don't have much information. Your question is clear, but this space is limited compared to the complexity of your system.

You definitely shouldn't consider using ClickHouse for OLTP. Not only because DELETE and UPDATE are not (yet) supported, but also because this database is designed to provide good performance for analytics. It lacks more or less everything that is needed to optimize an OLTP workload.

Is Kafka a good idea? Maybe. But you won't have transactions, for example.

I suggest trying to optimize your MySQL environment first. Some very generic points - sorry if they sound obvious to you, but of course I can't know your skill level:

  • Tune your configuration for speed; as far as I understand, you can give up some consistency (a sketch of this and the next few points follows the list).
  • Use READ UNCOMMITTED wherever there is no risk of conflicts. The benefits of this isolation level are greatly underestimated.
  • Instead of running many DELETEs, consider partitioning tables by time and periodically dropping the oldest partition. This is much faster.
  • Consider switching to a storage engine optimized for write-intensive workloads, namely MyRocks or TokuDB.
  • Consider using multiple slaves for scaling reads.
  • Consider using multiple masters for scaling writes.
  • These servers can even be distributed geographically, if needed.
  • Don't consider Galera or InnoDB Cluster, as writes don't really scale.
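
A minimal sketch of the first few points, assuming the same hypothetical click_log table as above (partition names, dates, and settings are illustrative; note that MySQL requires the partitioning column to be part of every unique key, including the primary key, and MyRocks has to be installed before switching engines):

    -- Trade some durability for write speed (a common speed-oriented
    -- setting, not a blanket recommendation):
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;
    SET GLOBAL sync_binlog = 0;

    -- Relaxed isolation for sessions that can tolerate dirty reads:
    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Partition the click log by day so old data can be dropped rather
    -- than deleted row by row:
    ALTER TABLE click_log
      PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p20180101 VALUES LESS THAN (TO_DAYS('2018-01-02')),
        PARTITION p20180102 VALUES LESS THAN (TO_DAYS('2018-01-03')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
      );

    -- Dropping a partition is close to a metadata operation, far cheaper
    -- than a DELETE over millions of rows:
    ALTER TABLE click_log DROP PARTITION p20180101;

    -- Switch a table to a write-optimized engine:
    ALTER TABLE click_log ENGINE = ROCKSDB;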

Hope these hints are useful.