There's a stack of system related questions here. We don't know what the hardware looks like for your master or slave. Maybe you're lacking RAM on the slave, or any number of other differences.
The master is probably receiving bursts of updates in parallel from many client connections, and for whatever reason, the slave can't keep up since all replication is done in serial.
If you have a lot of InnoDB activity on the master, you can probably disable InnoDB on the slave and gain some speed that way. That option and a few others are given here:
16.4.4.7: How can I use replication to improve performance of my system?
http://dev.mysql.com/doc/refman/5.0/en/replication-faq.html#qandaitem-16-4-4-1-7
Or there could be configuration tweaks in my.cnf
that could help you. I'd start by looking at your memory usage on the slave, and looking up buffer and memory related options for innodb and/or myisam depending on which of those engines you're using (or others).
You can also look through your binlogs with the mysqlbinlog
tool to see what database and tables have the most activity. Then start working with your application developers and start chipping away at the problem.
Also check your mysqld error log on the slave. There might be clues in there.
Since you have Master/Slave, you may want to implement the use of two things
- Circular Replication
- Database Virtual IP (aka DBVIP)
Circular Replication
Circular Replication is nothing more than first setting up Master/Slave then performing the same steps using the Slave as the Master's Master and the Master as the Slave's Slave. It just entails
- explicitly user a different server_id on each DB Server
- enabling binary logging on both DB servers
- making sure the replication user is defined on both DB servers
Database Virtual IP (aka DBVIP)
There are products you can download and install to setup a DBVIP. One such product I use is ucarp
. Another product is Linux Heartbeat. I normally do not use such things with MySQL Circular Replication or Master/Slave. Why?
Since those products can perform automatic failover, you do not want to do that in the event a Slave is some number of seconds behind in replication lag.
You should perform manual failovers.
Here is a poor man's approach to implementing DBVIP management.
Suppose you have this setup
- DB Server1 has IP 10.1.2.30
- DB Server2 has IP 10.1.2.40
- You want to use DBVIP 10.1.2.50
Create the Script called /usr/local/sbin/MyAppDBVIP like this
echo echo 10.1.2.50 > /usr/local/sbin/MyAppDBVIP
Create the Script called /usr/local/sbin/dbvip-up
DBVIP=`/usr/local/sbin/MyAppDBVIP`
ip addr add ${DBVIP}/24 dev eth1
Create the Script called /usr/local/sbin/dbvip-down
DBVIP=`/usr/local/sbin/MyAppDBVIP`
ip addr del ${DBVIP}/24 dev eth1
Make sure all scripts are executable
chmod +x /usr/local/sbin/MyAppDBVIP
chmod +x /usr/local/sbin/dbvip-up
chmod +x /usr/local/sbin/dbvip-down
Make sure these script exist on both DB Servers
Simply run dbvip on whichever server you choose. .
So the failover process and protocol are the following:
- Run
dbvip-down
on the DB Server that has the DBVIP. If you cannot
- Run
dbvip-up
on the DB Server that you want to have the DBVIP
- Just remember you should not run
dbvip-up
on both machines
- After running
dbvip-up
, restart apache, JBoss, or any other app server contacting MySQL via the old Master
Best Answer
There are two problems here, which must be solved independently.
Creating a slave
With that size, 100GB, mysqldump is usually to slow too be done efficiently. Try using a binary backup. You have several options: @paul is telling you one, but it has the inconvenient that the master will be locked for the duration of the copy process. Additionally, rsync can be very efficient if you have multiple small files, but it may not be for larger file sizes changed randomly throughout it (if you have a large ibdata1).
My recommendations would be snapshoting (if you are using virtual machines or a filesystem that allow it: ZFS, any other on top of LVM, etc.) or Percona Xtrabackup/Oracle Enterprise Backup. These options will make the backup process almost as fast as copying files from the filesystem without almost no lock at all. Some of them also allow for parallel copy, if your bandwidth allows it.
If none of these work for you, try using a logic parallel backup/restore utility like mydumper.
Replication with increasing lag
You must discover why that is happening first, (profile your queries on both servers) but these are some of the most common causes:
binlog_format = ROW
. It can augment your bandwidth usage but reduce the slave's load.innodb_flush_log_at_trx_commit
, for example).