it seems the binlog doesn't include the new inserts
I'm not sure whether you're saying the binlog actually doesn't include them, and you have confirmed this with mysqlbinlog
, or that it "seems" like it doesn't, because they don't replicate.
PXC needs log_slave_updates
turned on at the node serving as master to the asynchronous slave, otherwise, not everything will be written to the master's binary log. This is very different than an ordinary MySQL server as master, where log_slave_updates
will do nothing at all (unless the master is actually a slave to another master).
If that's not it, remove replicate_do_db
and binlog_do_db
and all of their related options from your configuration and then remove them from your brain. They should never be added unless you know exactly how they work, in your sleep. The simplest and by far most reliable replication configuration is, and will always be, replicate everything, which is the default.
Forget about binlog_format
on the slave. It makes absolutely no difference unless the slave, itself, has other, subtended slaves... and if the master is using ROW
format, the slave will still log in ROW
format if you do indeed have it configured with subtended slaves. Also, the slave's binlogs (not to be confused with the relay logs) will not log statements received from an upstream master unless log_slave_updates
is enabled on the slave.
The same thing goes for innodb_flush_log_at_trx_commit
. It does not impact actual replication. It's a setting the determines a tradeoff between ACID compliance and performance.
Best Answer
In the latest Percona XtraDB cluster version it is possible to make WAN segments so that there is more efficient communication between nodes. If you didn't do that, all nodes would try to replicate to all nodes, which is highly inefficient for Galera. With Galera 3/PXC 5.6, you can establish those so that only one node at a time will replicate to the other segments. This was specifically thought for geographical redundancy.
Before 5.6:
After 5.6 (Galera 3):
In any case, think that the round-trip time will limit the minimum latency for each transaction, which means that using geographical replication will make the cluster much slower.
An alternative option that you can setup is combining Galera and standard replication, like this:
Standard MySQL replication is way more efficient, as it does not wait for the "slaves" to confirm the changes (it does not use group communication). The disadvantage is clear: being asynchronous it does not guarantee that absolutely all changes applied to the primary cluster has been transmitted to the slave node or nodes. If the replications is well managed that should only mean less than 1 second of transactions, but this is certainly a no-go for banking applications or other applications where loss of information is impossible even if the whole datacenter fails. Whoever, that is something that most web applications and services can handle. On the negative aspect, standard replication has some disadvantages, specially if it is not well managed, like potential data drifts or extra latency due to it being single-threaded.