MySQL – Performance impact: MySQL replication vs. table-synchronisation script

MySQL

I am currently considering implementing a MySQL table replication solution for a client. This client has 2 servers in 2 different locations:

  • Server 1 runs an application with MySQL databases (running Ubuntu 14.04 LTS).
  • Server 2 runs a PHP web server front end to serve its customers with a website and customer panel (also Ubuntu, same version).

The databases on server 1 total around 2.5 GB in size.
The servers are in different data centres but within 5 ms latency of one another, both with gigabit uplinks and generous data plans.

Currently the web server (2) relies completely on the availability of the application server (1). Though this is essential, it does mean that the web server is effectively useless during maintenance or unscheduled downtime. The client does not wish to synchronize the full databases to both locations, but would allow the synchronization of a single table of account login data (just the one table, no other tables from the same database) to at least keep the bare functionality of the website intact.

Now the 2 options I am currently considering are:
A. Setting up a MySQL master->slave replication setup for this table
B. Using a "smart/efficient" table sync script such as pt-table-sync (https://www.percona.com/doc/percona-toolkit/2.1/pt-table-sync.html); see the sketch after this list.
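A rough illustration of what option B could look like (a sketch only, not tested against this setup): it assumes pt-table-sync from the Percona Toolkit is installed on the web server, that credentials come from a MySQL option file, and that the host names, the database name appdb, and the table name accounts are placeholders.

    # /etc/cron.d/account-sync (hypothetical file) - hourly one-way sync of the
    # account table from the application server (1) to the web server (2).
    0 * * * * root pt-table-sync --execute --verbose h=app-server,D=appdb,t=accounts h=web-server,D=appdb,t=accounts

pt-table-sync compares the two copies and applies only the differences, which would suit the expected change rate of fewer than 1,000 rows a day.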

A further consideration is that the binary log of the application MySQL server grows by around 8.5 GB a day, and I was unable to find out the exact performance impact this has on replication. The table in question is 88 MB and has around 250,000 rows. The daily changes to the account table I wish to synchronize are very minor (I expect fewer than 1,000 changed rows a day).

I am currently leaning towards option B because:

  • It would avoid future issues with MySQL version conflicts between the servers (I do not manage both of them for the client).
  • The amount of data change on the specific table I wish to synchronize is rather light.
  • The methods MySQL offers for synchronizing only specific tables seemed to fall short, as the replication technology is geared towards full replication (hence my fear, based on MySQL forum/article comments, that large binary logs would have a negative impact).
  • "To the second" data accuracy is not a concern for the client; a sync every hour would already suffice (his words).

My question, then, is whether option A or B (or another) would be preferred in this scenario and, more importantly, why.

Please note: I aim to keep the client's core setup intact; neither the application server nor the web server can switch to another database engine.

Best Answer

First of all, about Variant A:

  1. The size of the bin-log will not affect performance if you use a filter to replicate only selected tables, and even if you replicate everything it is normally not a problem either. For comparison, we replicate 20 GB (and more) of binary logs between data centres under a real-time load (telecom). A minimal setup sketch follows this list.
  2. Keep the source and target server versions the same - this is good practice even if you choose Variant B, and it avoids many other troubles.
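A minimal sketch of what Variant A involves, assuming classic asynchronous replication of the MySQL 5.5/5.6 era (matching Ubuntu 14.04); the host name, replication user, password, and log file/position are placeholders, and the master almost certainly already has log_bin enabled given the 8.5 GB/day of binary logs:

    # my.cnf on Server 1 (master)
    [mysqld]
    server-id = 1
    log_bin   = mysql-bin

    # my.cnf on Server 2 (slave)
    [mysqld]
    server-id = 2

    -- On Server 2, point the slave at the master and start it:
    CHANGE MASTER TO
      MASTER_HOST='app-server',
      MASTER_USER='repl',
      MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='mysql-bin.000001',
      MASTER_LOG_POS=4;
    START SLAVE;

The per-table filter itself is shown further down, next to the replicate-do-table discussion.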

About Variant B: I personally see tools like pt-table-sync as excellent but occasional tools in your toolbox, not as a permanent solution. I cannot point to concrete drawbacks; it is just a personal view.

Variant C: tools like SymmetricDS.

Variant D: I am now actively working with CDC (change data capture) variants, where you parse the binary logs and ship the changes to subscribers over a message queue.

I have tested at least 2 working solutions under load (there are more by now):

  1. StreamSets
  2. Maxwell + Talend (see the sketch after this list)
  3. MariaDB MaxScale is also very interesting for several reasons, but I have not had the time to make it work yet.
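For a sense of what a Maxwell-based CDC pipeline looks like - a sketch only: it assumes row-based binary logging on the source and an existing Kafka broker, and all host names and credentials are placeholders:

    # The source server must write row-based binlogs for Maxwell to parse:
    # [mysqld]
    # server-id     = 1
    # log_bin       = mysql-bin
    # binlog_format = ROW

    # Maxwell tails the binlog and publishes each row change as JSON to Kafka;
    # Talend (or any other consumer) then applies the changes to the target.
    bin/maxwell --user='maxwell' --password='XXXXXX' --host='app-server' \
        --producer=kafka --kafka.bootstrap.servers='kafka-host:9092'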

I personally like this approach because:

  1. I have a lot of integration projects.
  2. The source and target systems are not always the same; the most recent in my case were MySQL to Redshift and to DB2.
  3. It also allows "event-based" integration - firing a sub-job when specific data changes.
  4. There can be more than 1 subscriber to the same producer.

Edited to add an answer to the question in the comments:

The problem, like a medal, has 2 (or more) sides:

  1. The size of the binary logs - this is an objective factor, but with a good network it is not an issue. To make shipping faster you can enable compression of the replication stream (see the sketch after this list).
  2. The time the slave needs to apply all changes from the master. Even if you enable the multi-threaded slave it is still slower than the master, and with the default single thread a very high volume of changes on the master could become a problem.
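Roughly what those two knobs look like on the slave (slave_compressed_protocol has been available throughout MySQL 5.x; slave_parallel_workers needs MySQL 5.6 or later, so check the installed version first):

    # my.cnf on the slave
    [mysqld]
    # compress the master -> slave binlog stream
    slave_compressed_protocol = 1
    # apply changes with several worker threads instead of a single one
    slave_parallel_workers    = 4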

But as I wrote, you can enable a filter on the slave side to reduce the number of monitored objects: with replicate-do-table the slave will ignore all other changes and apply only those for the listed table (replicate-do-db is the wider, per-database version of this filter). You can repeat the setting for several tables, or use wildcards - a sketch follows.
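A sketch of that slave-side filter in my.cnf; the database name appdb and the table names are placeholders:

    # my.cnf on the slave - replicate only the account/login table
    [mysqld]
    replicate-do-table        = appdb.accounts
    # wildcard form: every table whose name starts with "account"
    # replicate-wild-do-table = appdb.account%
    # wider filter: everything in one database
    # replicate-do-db         = appdb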

(Filter logic diagram: see the replication filtering rules in the MySQL documentation for the full evaluation flow.)

Any solution has both benefits and drawbacks. I would prefer replication because it applies updates "on the fly", not only when an external program is run.