Your assumptions are incorrect.
The slave does not bind to 3306 (or whatever port it is listening to) when making its outbound connection to the master, and the port the slave is "listening to" for inbound client connections is otherwise irrelevant in replication.
The slave establishes a TCP connection from a high port to the master's port (e.g. 3306), authenticates itself, optionally does a little bit of session setup (for example, telling the master its desired heartbeat configuration), and then requests a binlog stream from the master, beginning at a specific file and byte position.
The master responds by beginning to stream binlog events beginning at that position (assuming it's valid) and continuing to autonomously send new binlog events to the slave over that same TCP connection as those events are generated on the master. The master never makes a connection back toward the slave unless both machines are slaves to each other in a circular replication configuration where each of them is the other's slave.
The single TCP connection, originated by the slave, is the pipeline for transferring binlog events and the master will continue streaming events as long as it thinks this TCP connection is established. If the connection is severed for any reason, the slave is responsible for reconnecting to the master, with the same process as before, and requesting a new start point for the binlog stream, based on the last valid event it received before it lost connectivity.
Ever server has a server_id
global variable. If a master accepts a connection from a slave that identifies itself with the same server_id as one that is already connected, the master will forcibly close those already-existing connections, preventing the possibility of the master wasting resources streaming traffic into a black hole because of network issues that make the master think the former connection (which should have been from the same slave) is still established.
So your single access list entry would be "TCP, from slave IP, high random port -- to master IP, port 3306" (assuming the master is listening on 3306).
Additional details:
The replication io_thread
is the thread in MySQL that creates the outbound TCP connection to the master, receives the replication events over that connection, and writes them to the "relay log" (file) on the slave. It doesn't "process" the replication events, it stores them for execution by the sql_thread
.
The sql_thread
on the slave then reads the relay log (file) written by the io_thread
and executes the events found there. It appears like any other client thread in SHOW PROCESSLIST
but it is not created via a connection to port 3306 on the slave, because it's a thread created inside the MySQL process when you START SLAVE
.
There is no reason a thread that's running inside the MySQL process would need to make a TCP connection to the same process -- it's already running inside the MySQL server and has access to the server internals to execute the replication events received from the master and stored to the relay log.
I think you what you are describing is a replication topology called a Star Topology
I wrote about that before
In that event, the Central Server must serve as a Slave in MySQL Replication paradigm until all changes in the binary logs of the Master (outside server) are pulled and executed in the Central Server.
If you pull the plug, replication breaks. It will lose its place.
MySQL does have a setting to control this in the CHANGE MASTER TO command. By default, the retry interval for MySQL 5.6 is 86400 seconds. That's 24 hours. Before MySQL 5.6, it was 60 seconds. To change the retry interval simply run,
STOP SLAVE;
CHANGE MASTER TO MASTER_CONNECT_RETRY = 60;
START SLAVE:
CAVEAT
Even if you pull the plug, it is possible for replication to lose its place and not reconnect because of incomplete transmission of binlog events. When that happens, you must reset the replication coordinates manually.
Here is how to fix replication manually in this regard:
Step 1: STOP SLAVE;
Step 2: SHOW SLAVE STATUS\G
Suppose the display looks like this
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.48.20.253
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000255
Read_Master_Log_Pos: 858190247
Relay_Log_File: relay-bin.066069
Relay_Log_Pos: 873918
Relay_Master_Log_File: mysql-bin.000254
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 858190247
Relay_Log_Space: 873772
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Step 4: Record Relay_Master_Log_File and Exec_Master_Log_Pos
In this case, 'mysql-bin.000254'
and 858190247
Step 5: Use Replication Coordinates from Step 4
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000254',MASTER_LOG_POS=858190247;
Step 6: Start Replication
START SLAVE;
Give it a Try !!!
Best Answer
The basic problem stems from the Slave's IO Thread. Sometimes, it has the nasty habit of playing dumb and not properly checking or pinging the master.
MySQL 5.5's semisynchronous replication allows you to have the master time itself out and degrade back to asynchronous replication. Slaves running semisynchronous replication should have a more sensitive IO Thread now.
Concerning your particular situation, you are using MySQL 5.0.24 ??? That's a very old version. There are two bug reports (Bug1 and Bug2) discussing this.
It just dawned on me that you asked this question earlier and DTest answered it. Giving credit where credit is due, the bug reports came from his answer.
It would be nice to to be informed when the Public IP is being disabled from Slave access. That way, you could run
STOP SLAVE;
on the slave in advance.You also asked is there a way to come to know the master status from the slave, when this type of behavior occurs.
Here is a typical
SHOW SLAVE STATUS\G
Please note the following:
Master_Log_File
whose latest SQL was last copied to relay logsRelay_Master_Log_File
whose SQL was last executed from the relay logsIf none of these value are moving, that could mean the all SQL is processed or that the slave is currently processing an SQL statement in the SQL thread that came from position
Exec_Master_Log_Pos
of the Master LogRelay_Master_Log_File
.Now, look at
Relay_Log_Space
. This number represents the sum total of all filesizes for all relay logs. IfSlave_IO_Running
is Yes andRelay_Log_Space
is not changing, then go check the master by runningSHOW MASTER STATUS;
. It should like something like this:If the Master's (File,Position) of
SHOW MASTER STATUS;
is beyond the Slave's (Master_Log_File,Read_Master_Log_Pos) ofSHOW SLAVE STATUS\G
, then the IO Thread on the Slave is dead for intents and purposes, even if Slave_IO_Running is Yes.