The binlog doesn't contain different structures when the storage engine is XtraDB vs InnoDB.
In the sense of interoperability, XtraDB is not an "extension" of InnoDB -- it's a fully-compatible replacement, that handles some internal operations differently but is still very much "InnoDB" wherever it is exposed outside the server core.
For what it's worth, I've taken "datadir" -- including ibdata1 and the ibd files from a stopped MySQL server, put them on a MariaDB server (which also uses XtraDB instead if InnoDB) and had zero compatibility issues.
The only caveat that comes to mind is the same as any MySQL async replication setup: the slave must be running the same or newer version of MySQL as its master (which, by the way, can be any one of thr PXC machines).
The reason for this "slave version the same or newer" rule is that the binlog format is forward-extensible only: when new capabilities are added to the binlog format, a newer master cam break an older slave, since the slave may not always understand what it is reading from the binlog, and replication will hit a hard stop (not a soft fail). A newer slave should never fail to understand what an older master writes to its binlog, since, officially, replication is supported across major releases (e.g., 5.1 to 5.5) but in practice, 5.1 to 5.6 also seems completely compatible.
The parallel minor version or newer, of MySQL Server, MariaDB, or -- seemingly the most foolproof -- Percona Server... should work fine as an async slave.
If you start out the slave with Identical data, and you replicate every table and schema, missing transactions should never happen. Don't even try to filter what replicates and try to replicate a subserlt of your schemata/tables unless and until you really understand the issues this can encounter. Replicating everything is the default behavior.
Yes... it is very much worth it to set up async slaves for reporting. The cluster members should already have the config they need for this to work (othet than the user account the slave will use to connect).
In the latest Percona XtraDB cluster version it is possible to make WAN segments so that there is more efficient communication between nodes. If you didn't do that, all nodes would try to replicate to all nodes, which is highly inefficient for Galera. With Galera 3/PXC 5.6, you can establish those so that only one node at a time will replicate to the other segments. This was specifically thought for geographical redundancy.
Before 5.6:
---------------
/ / \ \
A --- B --- D --- E (A, B, C Local nodes; D,E Remote nodes; all talk to all)
\ / / /
C ------------
After 5.6 (Galera 3):
[ A --- B ] --- [ D --- E ] (no redundant communication between segments)
[ \ / ]
[ C ]
In any case, think that the round-trip time will limit the minimum latency for each transaction, which means that using geographical replication will make the cluster much slower.
An alternative option that you can setup is combining Galera and standard replication, like this:
A --- B ~~~ D --- E (--- : Galera replication; ~~~: MySQL replication )
\ /
C
Standard MySQL replication is way more efficient, as it does not wait for the "slaves" to confirm the changes (it does not use group communication). The disadvantage is clear: being asynchronous it does not guarantee that absolutely all changes applied to the primary cluster has been transmitted to the slave node or nodes. If the replications is well managed that should only mean less than 1 second of transactions, but this is certainly a no-go for banking applications or other applications where loss of information is impossible even if the whole datacenter fails. Whoever, that is something that most web applications and services can handle. On the negative aspect, standard replication has some disadvantages, specially if it is not well managed, like potential data drifts or extra latency due to it being single-threaded.
Best Answer
I'm not sure whether you're saying the binlog actually doesn't include them, and you have confirmed this with
mysqlbinlog
, or that it "seems" like it doesn't, because they don't replicate.PXC needs
log_slave_updates
turned on at the node serving as master to the asynchronous slave, otherwise, not everything will be written to the master's binary log. This is very different than an ordinary MySQL server as master, wherelog_slave_updates
will do nothing at all (unless the master is actually a slave to another master).If that's not it, remove
replicate_do_db
andbinlog_do_db
and all of their related options from your configuration and then remove them from your brain. They should never be added unless you know exactly how they work, in your sleep. The simplest and by far most reliable replication configuration is, and will always be, replicate everything, which is the default.Forget about
binlog_format
on the slave. It makes absolutely no difference unless the slave, itself, has other, subtended slaves... and if the master is usingROW
format, the slave will still log inROW
format if you do indeed have it configured with subtended slaves. Also, the slave's binlogs (not to be confused with the relay logs) will not log statements received from an upstream master unlesslog_slave_updates
is enabled on the slave.The same thing goes for
innodb_flush_log_at_trx_commit
. It does not impact actual replication. It's a setting the determines a tradeoff between ACID compliance and performance.