We're using MySQL. We have a master that eventually will have 2 slaves. There are triggers in the db that execute when data changes in certain tables. I am wondering whether to disable the triggers in the slaves. It seems to me that I should. It also seems like things like timestamps should be disabled in the slaves because otherwise the data will be different on the slaves than the masters.
I'm not a DBA, I'm a developer. We don't have a DBA at my company so the ops admin and I are figuring it out as we go along. We already have the master and one slave set up and replicating and we are getting duplicate entry error on the slave.
We were thinking of not halting replication for duplicate replication errors, as in this post: http://www.ducea.com/2008/02/13/mysql-skip-duplicate-replication-errors/. We don't know whether this is a good idea. I do feel that this masks the problem, but neither of us knows how to fix the problem.
If we should disable the triggers and/or set the slave not to insert timestamps, how do we go about doing that? Or does establishing a db as a slave automagically do these things anyway?
Best Answer
Do not disable the triggers on your slaves, and do not configure your slaves to skip errors.
When replication is set up correctly with consistent data sets and identical configurations, including tables and their data, views, triggers, stored functions, and stored procedures, it just works. If it isn't "just working," then it was already not consistent in one of these ways. The duplicate key errors are a red flag that your data is not identical, and you don't want to suppress this.
If you aren't already, you should be using
binlog_format
=MIXED
unless you have a specific reason not to. The default value prior to MySQL 5.6 wasSTATEMENT
but this was largely for legacy reasons.Using mixed logging allows the optimizer on the master to choose the best way to log each change to the master's data -- either by:
If a statement is determined to be "unsafe" for statement-based logging (meaning that it could potentially cause the data to diverge from its initial identical state), it will be automatically logged as row-based, if you're using
MIXED
mode. Examples of this would be statements involving calls to non-deterministic functions likeUUID()
which would return different values on master and slave. Row-based logging handles this without a problem, since it passes the literal values across.You might think
NOW()
might also be such a function, and that a similar limitation might apply to automatic timestamp columns, if the server clocks are not in sync, but that's not the case -- because with each binary (replication) log entry, the master logs what its current timestamp was at the time the query was executed -- allowing the slave to have a "pseudo-system clock" to actually fudge-in the master's clock's value at the time that statement was executed when needed. This log entry, incidentally, is how the slave calculates "seconds behind master," by comparing its system clock to the logged timestamps. (The exception to this is when the slave has executed everything in its relay logs, it always shows "0" seconds behind, because it doesn't know whether the former logs were delayed by clock skew or by an actual replication lag -- so if it has run out of things to execute, it is no longer technically "behind").If a query is logged using row-based logging, then any data changed by triggers are also logged as row-based log entries, and the triggers on the slave will automatically not fire.
On the other hand, if the same query is logged using statement-based logging, then replication depends on the identical triggers being present on the slave, because the master will not log what the trigger did -- it only logs the query issued, and depends on the slave triggers firing exactly as they did on the master, assuming you want the data on all servers to be consistent.
http://dev.mysql.com/doc/refman/5.5/en/replication-features-triggers.html
The documentation often speaks of replication in what sounds like terms of either/or (statement or row) but for each query executed
MIXED
logging will choose the logging method on a per-query basis. You may also find statements to the effect that row-based logging is "required" in some conditions but this only means that it must be available (i.e., viaMIXED
); it does not mean that the server must be configured forROW
. DDL, of course, is always logged as statements, regardless of the binlog format.