MySQL Row-Based Replication with Slave Triggers

MySQLreplicationtrigger

I'm replicating a prod MySQL database to an analytics one. I need to drop several tables with sensitive data from the analytics and also redact several columns of several remaining tables, replacing NULLable fields with NULLs. How can I set up triggers on the slave to achieve this?

Best Answer

You can't.

Triggers do not fire on the slave in row-based replication, even if they are defined, because the rows in the other tables affected by the triggers on the master are replicated as row events, and if the triggers were allowed to fire on the slave, that would not work.

With row-based replication, triggers executed on the master do not execute on the slave. Instead, the row changes on the master resulting from trigger execution are replicated and applied on the slave.

http://dev.mysql.com/doc/refman/5.6/en/replication-features-triggers.html

If you have entire tables that need to be eliminated on the slave, you can alter those tables to use the BLACKHOLE storage engine, which happily accepts the insert queries from replication and then discards the data, returning no data when you SELECT from them... though this requires some attention to detail down the road since an ALTER TABLE on the master that specifies a storage engine will alter the table on the slave as well.

If you are in a position to rearrange column order on tables, MySQL supports the slave having fewer columns than the master in row-based mode, as long as the columns they do have are all identical by definition and ordinal position in the table, as I discussed here. If moving the sensitive columns to the right side of the table is practical, you can then ALTER TABLE ... DROP COLUMN ... on the slave.

But this also requires attention down the road.

Probably the simplest solution is to use permissions on views that you essentially create for the purpose of enforcing the access rights to the underlying tables. Views, by default, are executed with the permissions of the user who defined them, not the user who invokes them... so a view can allow a user to access columns in a table that aren't otherwise accessible to that user, if the table and columns are accessible to the DEFINER user.

...views can have a SQL SECURITY characteristic with a value of DEFINER or INVOKER to specify whether the object executes in definer or invoker context. If the SQL SECURITY characteristic is omitted, the default is definer context.

A stored program or view that executes in definer security context executes with the privileges of the account named by its DEFINER attribute.

http://dev.mysql.com/doc/refman/5.6/en/stored-programs-security.html

If you create a view with ALGORITHM=MERGE then queries written against that view will be handled by the server by rewriting the query internally based on the view definition, so there's no significant overhead. As long as you are simply selecting all or some of the columns from each table, and not using functions in the views that are incompatible with MERGE then this seems like the optimum solution, since the analytics user would have no access at all to the underlying tables, and not necessarily even be aware of their presence on the server if the permissions are set correctly. Technically speaking, as long as there are no conflicts in naming, you could even create these views directly on the slave, since MySQL replication has no problem with spurious structures on the slave as long as the replication SQL thread never has any reason to notice them.


If none of these are suitable alternatives, is one more alternative approach that can be used if part of the problem is that you can't even allow the sensitive data to be sent to or stored on the other server, because of its physical location or other security concerns... but it requires a third MySQL server in the middle (between existing master and slave) and is somewhat fragile because every ALTER TABLE operation occurring on the master will cause the intermediate machine's replication SQL thread to pause until you manually reconfigure it to understand the new table definition. On a stable schema, it works pretty much flawlessly (I've had one in production for over a year) but it's still a delicate setup, so I'll spare you the details on this option for the moment.