How to Recover a Single MySQL Database on a Busy Master-Slave System

Tags: innodb, myisam, mysql, mysqldump, replication

Looking for a strategy or tool to deal with recovering a single database to a point-in-time in a busy replicated system.

I have 12 databases running on 2 MySQL 5.0.77 servers in a master-slave replicated configuration. A full dump of the read-only slave is taken daily, and incremental SQL dumps are also available; these backups are stored off-site and the replication status is monitored.

Edit: Tables are a mixture of InnoDB and MyISAM, hence engine-specific solutions are not available.

So, given a complete failure of the master server, I can break replication and promote the slave server. I also have the option of rebuilding a new server from the off-site FULL backup and then applying the differentials taken hourly from the slave.

However, I am concerned about how to deal with a partial failure, or the failure of a single database. I can think of 2 scenarios that are quite likely:

  1. Database 7 (for example) becomes corrupted but continues to serve some requests until someone notices that it is broken, or alerts appear in the log files…
  2. Some query like DROP DATABASE, DROP TABLE, or an "UPDATE … WHERE …" type statement borks a single database, or some subset thereof.

At the moment I have a bunch of FULL dumps as FULL-$DATE-all-databases.sql.gz files, and differentials that can be applied to the FULL dumps as DIFF-$DATE-all-databases.sql.gz files.

Restoring database 7 to some point in time would currently require grepping through the FULL and DIFF files and manually applying that SQL.
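For illustration, something like this rough sketch would carve one database's section out of a full dump (assuming the dump contains the USE `database7`; markers that mysqldump writes with --all-databases; database7 is just the example name):

zcat FULL-$DATE-all-databases.sql.gz \
  | awk '/^USE `/ { keep = ($0 == "USE `database7`;") } keep' \
  > database7-FULL.sql

That works, but it is fiddly and easy to get wrong under pressure.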

How should I proceed so that I can recover a single database on the master to the state of one of the previous DIFF dumps?

Do I need to back up to individual database files, i.e.

mysqldump --databases "database1" | gzip > database1.sql.gz
mysqldump --databases "database2" | gzip > database2.sql.gz
mysqldump --databases "database3" | gzip > database3.sql.gz

rather than…

mysqldump --master-data --lock-all-databases --all-databases | gzip > all-databases.sql.gz

If I go for separate mysqldump files, what happens to the master binary log data, and should I even be setting --master-data for the master server recovery dumps?

Best Answer

If all your databases use InnoDB only, I have some good news.

You should be able to dump all the databases in parallel from a slave.

In fact, you can force all the databases into the same point-in-time.

The first thing to remember about the Slave is that it is not required to have binary logging enabled if it is not a Master for other Slaves.

You cannot use the --master-data option for parallel dumps because each dump would have a different position written at line 22 of each dump file. It is better to record the Master's last log file and position that the Slave executed, using SHOW SLAVE STATUS\G. That way, all the databases have the same point-in-time position.

You can collect the list of databases and script a parallel dump of all of them.

DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}

mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos"   | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt

mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}

for DB in `cat ${DBLIST}` 
do 
    mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz & 
done 
wait 

mysql -h... -u... -p... -e"START SLAVE;"

If there are simply too many databases, dump them 10 or 20 at a time as follows:

DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}

mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos"   | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt

mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}

COMMIT_LIMIT=20
COMMIT_COUNT=0    
for DB in `cat ${DBLIST}` 
do 
    mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz & 
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
    then
        COMMIT_COUNT=0
        wait
    fi
done 
wait 
if [ ${COMMIT_COUNT} -gt 0 ]
then
    wait
fi

mysql -h... -u... -p... -e"START SLAVE;"

If you need to be able to recover a single table, you can dump tables in parallel, 20 at a time, in size order.

Try this:

TBLIST=/tmp/ListOfTablesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}

mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos"   | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt

mysql -h... -u... -p... -AN -e"SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql','performance_schema') ORDER BY data_length" > ${TBLIST}

COMMIT_LIMIT=20
COMMIT_COUNT=0    
for DBTB in `cat ${TBLIST}` 
do
    DB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $1}'`
    TB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $2}'`
    DUMPFILE=${DB}-TBL-${TB}.sql.gz
    mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} ${TB} | gzip >  ${DUMPFILE} & 
    (( COMMIT_COUNT++ ))
    if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
    then
        COMMIT_COUNT=0
        wait
    fi
done 
wait 
if [ ${COMMIT_COUNT} -gt 0 ]
then
    wait
fi

mysql -h... -u... -p... -e"START SLAVE;"

Now that you have scripts to dump databases or individual tables, you can load that data at your discretion. If you need to get SQL that was executed from the binary logs on the master, you can use mysqlbinlog, give it a starting position or datetime, and output the SQL to other text files. You just have to perform due diligence to find the range of data you need from whatever timestamps the binary logs have. Just remember that every binary log's timestamp in the OS represents the last time it was written.
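As a sketch, here is one way to replay only a single database's changes from the Master's binary logs, starting at the coordinates recorded in Master_Log_FilePos.txt and stopping just before the bad statement; the binlog file name, stop time, and database7 are placeholders, and --database filtering assumes your sessions USE the database they write to:

# values copied from Master_Log_FilePos.txt written by the backup script (placeholders)
LOGFIL=mysql-bin.000123
LOGPOS=107

mysqlbinlog --database=database7 \
    --start-position=${LOGPOS} \
    --stop-datetime="2011-06-10 14:59:00" \
    /var/lib/mysql/${LOGFIL} > replay-database7.sql

# review replay-database7.sql, then apply it to the restored copy of database7
mysql -h... -u... -p... database7 < replay-database7.sql

If the bad statement landed in a later binary log, pass the subsequent mysql-bin.* files to mysqlbinlog as well.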