MySQL second slave not syncing while first slave works fine

MySQL, replication

I have a master (m) and slave (s1) setup running MySQL 5.1.45.

When I add a second slave (s2), it lags behind and never catches up.

Even after I synced s2 with the whole system offline (Seconds_Behind_Master = 0), s2 fell out of sync again after a few hours.

Strangely, s1 is always in sync.

Any ideas?

SHOW SLAVE STATUS\G (on s2)
*************************** 1. row ***************************
           Slave_IO_State: Waiting for master to send event
              Master_Host: xxx.xxx.xxx.xxx
              Master_User: xxxx_xxxx5
              Master_Port: 3306
            Connect_Retry: 60
          Master_Log_File: mysql-bin.013165
      Read_Master_Log_Pos: 208002803
           Relay_Log_File: xxxxxxxxxx-relay-bin.000100
            Relay_Log_Pos: 1052731555
    Relay_Master_Log_File: mysql-bin.013124
         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes
          Replicate_Do_DB: xxxxxxxxx
      Replicate_Ignore_DB:
       Replicate_Do_Table:
   Replicate_Ignore_Table:
  Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
               Last_Errno: 0
               Last_Error:
             Skip_Counter: 0
      Exec_Master_Log_Pos: 1052731410
          Relay_Log_Space: 44233859505
          Until_Condition: None
           Until_Log_File:
            Until_Log_Pos: 0
       Master_SSL_Allowed: No
       Master_SSL_CA_File:
       Master_SSL_CA_Path:
          Master_SSL_Cert:
        Master_SSL_Cipher:
           Master_SSL_Key:
    Seconds_Behind_Master: 69594
Master_SSL_Verify_Server_Cert: No
            Last_IO_Errno: 0
            Last_IO_Error:
           Last_SQL_Errno: 0
           Last_SQL_Error:

iperf results between servers:

M -> s2
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec    502 MBytes    420 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.05 GBytes    902 Mbits/sec

M -> s1
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec    637 MBytes    534 Mbits/sec
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec    925 MBytes    775 Mbits/sec

vmstat for s2

 vmstat
 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ 
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0    268 126568 199100 22692944    0    0   100   836    8   81  1  0 96  3

vmstat 2 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  0    268 1150144 197128 21670808    0    0   100   835    9   81  1  0 96  3  0
0  0    268 1144464 197160 21674940    0    0   644  3096 1328 1602  0  0 97  2  0
0  2    268 1140680 197176 21679624    0    0   846  5362 1002 1567  0  0 98  2  0
0  1    268 1135332 197192 21685040    0    0   960  3348  850 1193  0  0 98  1  0
0  0    268 1130776 197204 21688752    0    0   576  2894  978 1232  0  0 98  2  0
0  0    268 1127060 197264 21693556    0    0   586  5202 1075 1505  0  0 97  3  0
0  0    268 1122184 197272 21698412    0    0   896  1160  614  727  0  0 98  1  0
0  0    268 1118532 197300 21702780    0    0   586  5070 1279 1708  0  0 93  6  0
0  0    268 1114000 197324 21705820    0    0   402  1522  947  942  0  0 95  4  0
0  0    268 1109708 197336 21710188    0    0   704  9150 1224 2109  0  0 97  2  0

top output on s2

top - 14:44:25 up 16:36,  1 user,  load average: 1.62, 1.47, 1.42
Tasks: 140 total,   1 running, 139 sleeping,   0 stopped,   0 zombie
Cpu0  :  2.9%us,  1.1%sy,  0.0%ni, 73.8%id, 21.8%wa,  0.0%hi,  0.4%si,  0.0%st
Cpu1  :  0.8%us,  0.3%sy,  0.0%ni, 95.5%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.6%us,  0.3%sy,  0.0%ni, 97.7%id,  1.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.5%us,  0.2%sy,  0.0%ni, 98.9%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24744184k total, 24005508k used,   738676k free,   199136k buffers
Swap:  1050616k total,      268k used,  1050348k free, 22078920k cached

Any ideas?

Is there any chance that the MySQL version is the culprit, in conjunction with the nearly five-fold increase in traffic to the master?

If that is the case, why does s1 stay in sync while s2 does not?

Does anyone know whether 5.6.x solves similar problems?

Best Answer

The answer to this is very straightforward: the two slaves most likely share the same server_id. I wrote about this 2 years ago ("Screwed up replication by sharing server ids"). In that post, I quoted Baron Schwartz's blog post "Pop quiz: how can one slave break another slave".
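
One way to check this before changing anything is to compare the ID each server reports. This is a minimal sketch, assuming you can open a mysql client session on m, s1 and s2:

```sql
-- Run on the master and on each slave.
-- Every server in the replication topology needs a different value.
SHOW VARIABLES LIKE 'server_id';
```

If s1 and s2 print the same value, that matches the diagnosis above.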

The quick-and-dirty solution? Change the second slave's server_id. For example, if the master's server_id is 1000 and the first slave's server_id is 1001, go to the second slave and run the following:

mysql> SET GLOBAL server_id = 1002;

This will fix it right then and there.
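
To confirm the change, a cautious follow-up on the second slave might look like the sketch below. Restarting the replication threads is my assumption here, so that the I/O thread reconnects to the master under the new ID; the statements themselves are standard MySQL.

```sql
-- On the second slave (s2), after SET GLOBAL server_id
SHOW VARIABLES LIKE 'server_id';   -- should now show the new value (1002 in the example)

-- Assumption: restart the replication threads so they reconnect using the new ID
STOP SLAVE;
START SLAVE;

-- Re-run this over the next few minutes and watch Seconds_Behind_Master
-- and Exec_Master_Log_Pos to see whether s2 is catching up
SHOW SLAVE STATUS\G
```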

Then, on the second slave, set the same server_id in my.cnf so the change survives a restart:

[mysqld]
server_id = 1002

Give it a try!