I've been assigned a task where I need to replicate a web application to create a test environment, and part of the project is to create a test database with sample production data to help with testing. I'm no database expert, and I haven't been able to find any solid strategies online for rotating data. All I know for sure is that I would need to spin up another MySQL instance on our DB server and point the test application at it, but how do I go about getting actual data out of our production DB without harming production data? Are there any well-known strategies I can deploy?
MySQL – Strategies for rotating backend production MySQL data into a test environment
Related Solutions
Monitor everything that you can as frequently as you can. I highly recommend Graphite w/ statsd as a central location to collect all your metrics. It provides a very simple plaintext protocol that makes it trivial to log nearly any metric data and a UI that makes it incredibly easy to compare one metric against another. On my systems, I collect a ton of information and most of it has proved invaluable at some point or another. Here are a few of them:
I wrote a daemon called mysampler that sends the output of SHOW GLOBAL STATUS to Graphite (or CSV, if you want) at regular intervals. We log this at 5s intervals, but there are times that I wish we had it set to 1s intervals. You start to see some very interesting patterns at that level of granularity. It's aware of which stats are counters and which are absolute values (Questions is a counter, Threads_running is an absolute value) and will output the deltas for the counters.
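To illustrate the idea, here is a rough shell sketch, not mysampler itself: the Graphite host, port, and 5s interval are assumptions, the nc flags depend on your netcat flavor, and unlike mysampler it sends raw values rather than deltas for counters.
while true; do
  ts=$(date +%s)
  mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Questions','Threads_running')" \
    | awk -v ts="$ts" '{print "mysql.status." $1, $2, ts}' \
    | nc -q1 graphite.example.com 2003   # Graphite plaintext protocol: "path value timestamp"
  sleep 5
done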
ab-tblstats2g runs from cron every night and sends table size statistics to Graphite so we can track table growth. I plan to extend it to include the maximum primary key value and the number of rows (from table statistics) in the near future. It also works with MSSQL Server.
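Something in the same spirit can be done directly against information_schema (a sketch only; the Graphite host and metric naming are assumptions, and the sizes are the estimates InnoDB reports, not exact values):
ts=$(date +%s)
mysql -N -e "SELECT CONCAT('mysql.tables.', table_schema, '.', table_name, '.bytes'),
                    data_length + index_length
             FROM information_schema.TABLES
             WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')" \
  | awk -v ts="$ts" '{print $1, $2, ts}' | nc -q1 graphite.example.com 2003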
mysql_logger logs the output of SHOW FULL PROCESSLIST to syslog at a configurable interval. It makes it trivial to find out exactly what was running concurrently when something goes weird (table locks, long-running queries, etc.). We dump that data into Splunk for easy searching, but I still sometimes just grep through the syslog files.
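The core of that idea fits in a one-liner you could run from cron (the one-minute interval and the syslog tag here are arbitrary illustrative choices, not the actual mysql_logger implementation):
* * * * *  mysql -e 'SHOW FULL PROCESSLIST\G' | logger -t mysql_processlist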
pt-stalk from the Percona Toolkit is great for "what just happened?" scenarios. It watches for a server status variable to exceed a certain value (Threads_connected > 25 by default, but Threads_running is usually a more valuable metric, in my experience) and, when triggered, collects a bunch of data about MySQL and the system, which can be reviewed with pt-sift or by just reading through the generated files. It will even capture tcpdump, gdb, oprofile, and strace traces.
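For example, to trigger collection on Threads_running instead of the default (option names per the Percona Toolkit documentation; the threshold and cycle count are just illustrative values):
pt-stalk --variable Threads_running --threshold 50 --cycles 3 --daemonize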
That's basically what we monitor, which differs from alerting. For alerting, I suggest you alert on a very small number of metrics. You can cover 90% of cases by simply choosing a workload-representative query and setting a threshold on how long it takes to return. If it exceeds that threshold, alert...there is a problem. Otherwise, you're OK. No need to check "is the process running," or anything like that. Other things to look for are entries in the MySQL error log, approaching the connection limit, and how well replication is functioning (slave lag, slave running, tables in sync). Hit ratios are completely useless for alerting purposes - all that matters is that queries are returning within some period of time.
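A rough sketch of that alerting approach, where the query, threshold, table name, and notification command are placeholders to adapt, not anything standard:
#!/bin/bash
THRESHOLD=2                               # seconds the representative query is allowed to take
START=$(date +%s.%N)
mysql -N -e "SELECT COUNT(*) FROM orders WHERE created_at > NOW() - INTERVAL 1 HOUR" > /dev/null
ELAPSED=$(echo "$(date +%s.%N) - $START" | bc)
if (( $(echo "$ELAPSED > $THRESHOLD" | bc -l) )); then
    echo "Representative query took ${ELAPSED}s" | mail -s "MySQL latency alert" oncall@example.com
fi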
For further reading, the white paper Preventing MySQL Emergencies by the Percona folks is a good read that goes into great detail on what to monitor and alert on. Percona has also released a set of Nagios Plugins (which should work with Zabbix, I believe) that you can use.
The only DB solution I can think of for this would be Percona XtraDB Cluster (PXC).
It has two methods for bringing a cluster node in sync with the others:
- SST (State Snapshot Transfer) is the full copy of data from one node to another. It's used when a new node joins the cluster and has to transfer data from an existing node. There are three methods of SST available in Percona XtraDB Cluster: mysqldump, rsync and xtrabackup (Percona XtraBackup with support for XtraDB Cluster will be released soon; currently you need to use our source code repository). The downside of mysqldump and rsync is that your cluster becomes READ-ONLY while data is being copied from one node to another (SST applies the FLUSH TABLES WITH READ LOCK command). Xtrabackup SST does not require a READ LOCK for the entire syncing process, only for syncing the .frm files (the same as with a regular backup). A config example for choosing the method follows this list.
- IST (Incremental State Transfer): functionality which, instead of a whole state snapshot, lets a node catch up with the group by receiving only the missing writesets, but only if those writesets are still in the donor's writeset cache.
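The SST method is selected with a single setting in my.cnf on the joining node; xtrabackup is shown here because of the READ LOCK caveat above, and mysqldump and rsync are the other valid values:
[mysqld]
wsrep_sst_method = xtrabackup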
I addressed this before:
- Oct 25, 2012: mysql read slave to fetch entire database on start
- Oct 03, 2012: Does this make my XtraDB Cluster hang?
CAVEAT
This would be the ideal solution if the entire database is InnoDB. If you have read-only MyISAM tables, that would be OK provided you copy the MyISAM tables to every node manually. SST will do a full copy of MyISAM tables, but the multi-master replication feature of PXC will not replicate MyISAM data.
As for PXC, I rigorously tested it in Amazon EC2 and it works like a dream. PXC works as advertised.
UPDATE 2013-02-06 16:40 EDT
drogart made the following comment:
I don't think having one node of your production cluster be a dev/testing environment is a good idea. That would mix dev write traffic back into the prod db. It would also create a situation where devs could impact the prod db cluster by spinning up a new node.
He has a very valid point. In light of this, be very careful and sparing about what developers can do with this setup.
Nevertheless, should you ever need a full copy of production in a dev server, do the following off-peak:
- STEP01) Setup new PXC server
- STEP02) Configure the wsrep options (see the example config after this list)
- STEP03) Start MySQL (SST should kick off)
- STEP04) Shutdown MySQL
- STEP05) Comment out wsrep options
- STEP06) Restart MySQL (Full of Prod Data with no connectivity to the Real Cluster)
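For STEP02, the wsrep section of my.cnf would look roughly like this; the library path, cluster address, and node names below are assumptions to adapt to your environment, and these are the same options you comment out in STEP05:
[mysqld]
wsrep_provider        = /usr/lib64/libgalera_smm.so
wsrep_cluster_name    = prod_cluster
wsrep_cluster_address = gcomm://prod-node1,prod-node2,prod-node3
wsrep_node_name       = dev-copy
wsrep_sst_method      = xtrabackup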
Once this dev server is set up, do the following to update it:
- STEP01) Shutdown MySQL
- STEP02) Uncomment wsrep options
- STEP03) rm -f /var/lib/mysql/galera.cache
- STEP04) rm -f /var/lib/mysql/grastate.dat
- STEP05) Start MySQL (SST should kick off)
- STEP06) Comment out wsrep options
- STEP07) Restart MySQL (Full of Prod Data with no connectivity to the Real Cluster)
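Condensed into shell form, the refresh cycle looks something like this (the init script name and datadir path are assumptions; editing my.cnf remains the manual comment/uncomment step described above):
service mysql stop
# uncomment the wsrep options in my.cnf
rm -f /var/lib/mysql/galera.cache /var/lib/mysql/grastate.dat
service mysql start        # SST pulls a fresh copy of production
# once the node is synced: comment the wsrep options back out
service mysql restart      # dev copy now detached from the real cluster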
Best Answer
Plan A: Have the Production system be a Master; have another machine (or two) be a Slave. At any moment, you could use the Slave (in a couple of different ways) to provide a snapshot of Production at some point in time. The impact on the Master (Production) is zero (after a somewhat painful setup). Master-Slave replication is also the beginning of any HA (High Availability) solution.
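For example, once replication is running, a consistent copy can be taken from the Slave without touching Production. This is a sketch assuming all-InnoDB tables; the test host name is a placeholder:
# on the Slave
mysqldump --single-transaction --all-databases > prod_snapshot.sql
# load it into the test instance
mysql -h test-db.example.com < prod_snapshot.sql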
Plan B: Use LVM (Logical Volume Management, an OS tool) as a way of taking a snapshot of the Production machine. This involves taking down the server briefly (about a minute, regardless of the amount of data). Then the disk can be copied elsewhere for whatever use -- such as building a dev box.
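A rough outline of the LVM approach, matching the "brief downtime" description above; the volume group, snapshot size, mount point, and dev host are all assumptions:
service mysql stop                                            # brief production outage
lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql_data
service mysql start                                           # production back up; snapshot creation takes seconds
mount -o ro /dev/vg0/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ devbox:/var/lib/mysql/              # copy the frozen datadir to the dev box
umount /mnt/mysql_snap && lvremove -f /dev/vg0/mysql_snap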
Neither of those does what I think of as a "rotate", but perhaps they solve your problem?
I suggest you search for either or both of those strategies, then come back for further questions, if needed.