If all your database use InnoDB only, I have some good news.
You should be to dump all the database in parallel from a slave.
In fact, you can force all the databases into the same point-in-time.
First thing to remember about a the Slave is that it is not not required to have binary logging enabled if it is not a Master for other Slaves.
You cannot use --master-data
option for parallel dumps because each dump will have a different position written at line 22 of each dump file. It is better to record the Master's last log file and position the Slave executed using SHOW SLAVE STATUS\G
. That way, all the databases have the same point-in-time position.
You can collect all databases and script the parallel dump of all the database.
DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}
for DB in `cat ${DBLIST}`
do
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
done
wait
mysql -h... -u... -p... -e"START SLAVE;"
If there are simply too many databases, dump them 10 or 20 at a time as follows:
DBLIST=/tmp/ListOfDatabasesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('information_schema','mysql','performance_schema')" > ${DBLIST}
COMMIT_LIMIT=20
COMMIT_COUNT=0
for DB in `cat ${DBLIST}`
do
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} | gzip > ${DB}.sql.gz &
(( COMMIT_COUNT++ ))
if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
then
COMMIT_COUNT=0
wait
fi
done
wait
if [ ${COMMIT_COUNT} -gt 0 ]
then
wait
fi
mysql -h... -u... -p... -e"START SLAVE;"
If you need to recover a single table, you can parallel dump tables 20 at a time in size order.
Try this:
TBLIST=/tmp/ListOfTablesToParallelDump.txt
SSS=/tmp/ShowSlaveStatusDisplay.txt
BACKUP_BASE=/backups
BACKUP_DATE=`date +"%Y%m%d_%H%M%S"`
BACKUP_HOME=${BACKUP_BASE}/${BACKUP_DATE}
mkdir ${BACKUP_HOME}
cd ${BACKUP_HOME}
mysql -h... -u... -p... -e"STOP SLAVE;"
mysql -h... -u... -p... -e"SHOW SLAVE STATUS\G" > ${SSS}
LOGFIL=`cat ${SSS} | grep "Relay_Master_Log_File" | awk '{print $2}'`
LOGPOS=`cat ${SSS} | grep "Exec_Master_Log_Pos" | awk '{print $2}'`
echo "Master was at ${LOGFIL} Position ${LOGPOS} for this Backup" > Master_Log_FilePos.txt
mysql -h... -u... -p... -AN -e"SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema NOT IN ('information_schema','mysql','performance_schema') ORDER BY data_length" > ${DBLIST}
COMMIT_LIMIT=20
COMMIT_COUNT=0
for DBTB in `cat ${TBLIST}`
do
DB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $1}'`
TB=`echo "${DBTB}" | sed 's/\./ /g' | awk '{print $2}'`
DUMPFILE=$DB-{DB}-TBL-${TB}.sql.gz
mysqldump -h... -u... -p... --hex-blob --routines --triggers ${DB} ${TB} | gzip > ${DUMPFILE} &
(( COMMIT_COUNT++ ))
if [ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]
then
COMMIT_COUNT=0
wait
fi
done
wait
if [ ${COMMIT_COUNT} -gt 0 ]
then
wait
fi
mysql -h... -u... -p... -e"START SLAVE;"
Now that you have scripts to dump databases or individual tables, you can load that data at your discretion. If you need to get SQL executed from the binary logs on the master, you can use mysqlbinlog
and give it the position ot datetime and output the SQL to other text files. You just have to perform due diligence to find the amount of data you need from whatever timestamps the bnary logs have. Just remember that every binary log's timestamp in the OS represents that last time it was written.
If you have three or more Slaves, I have a nice suggestion: Use one of the Slaves as a Volunteer to be cloned.
Here is a Topology
MAS --+--> SLV1
|
+--> SLV2
|
+--> SLV0
Say you want to Spawn SLV3
. You could use SLV0
as a Clone
- STEP01) Install MySQL Binaries on
SLV3
(Same version as all other Slaves)
- STEP02) On SLV0,
STOP SLAVE;
- STEP03) On SLV0,
service mysql stop
- STEP04)
scp -r SLV0:/etc/my.cnf SLV3:/etc/my.cnf
- STEP05)
scp -r SLV0:/var/lib/mysql SLV3:/var/lib/mysql
- STEP06) On SLV0,
service mysql start
- STEP07) On SLV3,
chown -R mysql:mysql /var/lib/mysql
- STEP08) On SLV3, change
server_id
in /etc/my.cnf
to be unique from all other Slaves
- STEP09) On SLV3,
service mysql start
That's it.
This is the same paradigm followed by Percona XtraDB Cluster (PXC). When it comes to PXC, introducing a New Slave in PXC is as simple as adding the MasterIP to my.cnf and starting MySQL. All of the above steps are executed internally by PXC using Quorom Selection to choose which Slave becomes the Donor (a.k.a. Volunteer to be Cloned) as well as one of three methods for copying data (xtrabackup, rsync, mysqldump) This copying method is known as SST
(State Snapshot Transfer).
If all of the Application-Level Data are stored in InnoDB only, you should look into using PXC. If you have a mix of InnoDB/MyISAM or all MyISAM, the above 9 steps are to be scripted by you.
Best Answer
It's very unlikely that running master and slave on the same machine would be beneficial, because the two instances would not actually be sharing resources in any positive or helpful way.
Running two instances with the same data means you have to divide up the available resources in some combination of implicit (cpu, disk, memory including OS cache) and explicit (buffer pool/key cache configuration) ways...
Significantly, the fact that one is a replica of the other means that database writes are contending for double resources at almost the same time: for each write done by the master, a comparable write has to be done by the slave, which means -- approximately -- twice the disk utilization, plus the fact that every binlog entry is written at least twice, first to the master binlog, then to the slave's relay log, then to the slave's binlog if enabled.
The fact that each instance now has less memory available means that less data can be cached in memory, meaning that you will further increase the need for disk access, as less cached data means more disk I/O, almost by definition.
There might be specific scenarios where splitting uo the workload like this could serve a useful purpose, but situations where such a setup would degrade performance seem to significantly outnumber any scenarios where you'd see an improvement... and any such cases would likely be very workload-specific.