The basic problem stems from the Slave's IO Thread. Sometimes it has the nasty habit of playing dumb: it fails to notice that its connection to the Master has died, so it neither reconnects nor pings the Master properly.
MySQL 5.5's semisynchronous replication lets you have the Master time itself out and degrade back to asynchronous replication. Slaves running semisynchronous replication should have a more sensitive IO Thread as well.
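If you can move to 5.5 or later, here is a minimal sketch of turning it on (the plugin file names assume a Linux build, and the 10-second timeout is just an example):

On the Master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000; -- milliseconds to wait before degrading to async

On each Slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD; -- the IO Thread must reconnect to register as semisynchronous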
Concerning your particular situation: you are using MySQL 5.0.24? That's a very old version. There are two bug reports (Bug1 and Bug2) discussing this.
It just dawned on me that you asked this question earlier and DTest answered it. Giving credit where credit is due, the bug reports came from his answer.
It would be nice to be informed when the Public IP is about to be disabled from Slave access. That way, you could run STOP SLAVE; on the slave in advance.
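A minimal sketch of that routine (how you get the advance notice is up to you):

STOP SLAVE;          -- on the slave, before the Public IP is pulled
-- ... maintenance window ...
START SLAVE;         -- once the Master is reachable again
SHOW SLAVE STATUS\G  -- confirm Slave_IO_Running and Slave_SQL_Running both say Yes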
You also asked whether there is a way to know the Master's status from the Slave when this type of behavior occurs.
Here is a typical SHOW SLAVE STATUS\G
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.64.80.136
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000019 <<------
          Read_Master_Log_Pos: 277892198 <<------
               Relay_Log_File: relay-bin.000058
                Relay_Log_Pos: 37535484
        Relay_Master_Log_File: mysql-bin.000019 <<------
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 277892198 <<------
              Relay_Log_Space: 277892637 <<------
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)
Please note the following:
- Master_Log_File : the Master's binary log whose latest statements were most recently copied into the relay logs
- Read_Master_Log_Pos : the position within Master_Log_File up to which statements have been copied into the relay logs
- Relay_Master_Log_File : the Master's binary log whose statements were most recently executed from the relay logs
- Exec_Master_Log_Pos : the position within Relay_Master_Log_File up to which statements have been executed from the relay logs

If none of these values are moving, that could mean either that all SQL has been processed, or that the SQL thread is currently stuck executing a statement that came from position Exec_Master_Log_Pos of the Master log Relay_Master_Log_File.
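As a quick worked check against the output above: since Master_Log_File and Relay_Master_Log_File are the same file, the SQL thread's backlog in bytes is simply Read_Master_Log_Pos - Exec_Master_Log_Pos = 277892198 - 277892198 = 0, i.e. this slave has executed everything it has downloaded so far.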
Now, look at Relay_Log_Space. This number represents the sum total of the file sizes of all the relay logs. If Slave_IO_Running is Yes and Relay_Log_Space is not changing, go check the Master by running SHOW MASTER STATUS. It should look something like this:
mysql> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000019 | 682563021 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.00 sec)
If the Master's (File,Position) from SHOW MASTER STATUS is beyond the Slave's (Master_Log_File,Read_Master_Log_Pos) from SHOW SLAVE STATUS\G, then the IO Thread on the Slave is dead for all intents and purposes, even if Slave_IO_Running says Yes.
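When it gets into that state, a common first kick (not a guaranteed fix, just standard practice) is to bounce only the IO Thread on the slave:

STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SHOW SLAVE STATUS\G  -- Read_Master_Log_Pos should start advancing again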
There is no solution for fan-in with traditional MySQL replication: a slave can replicate from only one master.
One option is Tungsten Replicator; there are a couple of posts describing how it supports fan-in replication.
Another option is Galera replication, which is integrated into Percona XtraDB Cluster and MariaDB Galera Cluster. In Galera, every cluster node applies replicated changes from all other cluster nodes. That satisfies your goal of fan-in -- sort of -- but it means that every node holds all the data.
For what it's worth, MariaDB 10 made its General Availability announcement on March 31, 2014.
I don't recommend the Federated storage engine. It should really be deprecated, because it's slow, buggy, and fragile. My feeling is that MySQL is considering scrapping it. See http://www.tocker.ca/2013/10/17/how-do-you-use-the-federated-storage-engine.html
Best Answer
Many things may be involved.
- First, which of these are you talking about? (And what values do you have?)
- How do you "load balance" the Slaves? Assuming that the connections are normally quick, "round robin" is better than any sophisticated method.
- You should consider adding another Slave. Keep in mind that if you lose a Slave (crash, maintenance, hiccup, etc.), you will be back down to 3, which seems to have the problem. Having 4 up-to-date Slaves may make the problem much rarer.
- The reason for throttling max_connections is to throw the problem back to the client, forcing it to pick another Slave. (I assume you recover from connection errors and try again?)
- Please describe how the data comes in. It sounds like sensor data that is pushed from remote sensors? Is there anything that would cause it to "sync" -- that is, everyone sends at the top of the minute? If so, is there a way to stagger the pushes?
- Do the remote sensors do the following (see SHOW CREATE TABLE)? If not, please describe it.
- Is there any "connection pooling"? You mentioned "maxscale"; are you using that?
- What is the average number of connections over an hour (or day or whatever)? As a first cut, I would set max_connections to twice that (see the sketch after this list).
- Please provide (1) RAM size, (2) SHOW VARIABLES, and (3) SHOW GLOBAL STATUS after mysql has been running for a day. This may give more insight into the problems.
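As a sketch of the max_connections rule of thumb above (the 200-connection figure is purely hypothetical; the counters are real MySQL status variables):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark since the server started
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- connections right now
SET GLOBAL max_connections = 400;                -- first cut: twice a typical load of ~200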