MariaDB thread count suddenly jumps to around 10K

database-design, mariadb, replication

We have a big database, about 750 GB. This server is also our replication master, with 4 slaves: 3 of them stay in sync with no delay, and one runs with a 2-hour delay.

OS: Ubuntu 16.04
DB: MariaDB 10.2.9
App: PHP 7

Sometimes the MySQL thread count increases suddenly (to about 5K) for no apparent reason. By chance I found that stopping replication on one of the first three slaves with STOP SLAVE brings the thread count back down to normal (around 500), after which I run START SLAVE a minute or more later.

Every time the MySQL thread count goes above 4000 within a second, I have to run STOP SLAVE; START SLAVE; on every slave, and that is not a good solution!
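For clarity, this is the emergency procedure I run on each slave (the one-minute pause is simply what I found works):

    STOP SLAVE;
    -- wait a minute or more, then:
    START SLAVE;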

*** Even worse, sometimes STOP SLAVE; START SLAVE; does not work, the thread count climbs past 10K, and I have to restart the master MariaDB instance (I mean systemctl stop mysql && systemctl start mysql), and of course at that point my application is not reachable by users!

My questions are:

1. What is the reason for these incidents?
2. What is a temporary workaround?
3. Would something like MaxScale help in this scenario?
3.1. If yes, how would MaxScale help?
4. What is the best solution for my problem, and how can I trace the issue?

Best Answer

Many things may be involved.

First, which of these are you talking about? (And what values do you have?)

"Threads_cached": "5",      -- 1000 might be good for you
"Threads_connected": "10",
"Threads_created": "36404", -- divide by Uptime
"Threads_running": "4",     -- 20 is a 'large' value

"Max_used_connections": "238",
"max_connections": "400",   -- we may need to tune this to avoid the problem

How do you "load balance" across the Slaves? Assuming the connections are normally quick, "round robin" is better than any more sophisticated method.

You should consider adding another Slave. Keep in mind that if you lose a Slave (crash, maintenance, hiccup, etc.), you will be back down to 3, which seems to be where the problem shows up. Having 4 up-to-date slaves may make the problem much rarer.

The reason for throttling max_connections is to throw the problem back to the client, forcing it to pick another Slave. (I assume you recover from connection errors and try again?)
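A minimal sketch of that throttle (the value is illustrative; derive yours from the measurements discussed below):

    -- Cap connections at runtime; excess clients get "Too many connections"
    -- quickly instead of piling up as running threads:
    SET GLOBAL max_connections = 400;
    -- Also put max_connections = 400 under [mysqld] in my.cnf so it survives a restart.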

Please describe how the data comes in. It sounds like sensor data pushed from remote sensors? Is there anything that would cause the pushes to "sync" -- that is, everyone sends at the top of the minute? If so, is there a way to stagger the pushes?

Do the remote sensors do the following?

  1. connect to mysql
  2. insert one row (please provide details and SHOW CREATE TABLE)
  3. disconnect

If not, please describe the actual flow.
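For illustration only -- the table and column names below are hypothetical placeholders for whatever your SHOW CREATE TABLE actually says -- step 2 would be something like:

    -- Hypothetical single-row insert per connection (names are placeholders):
    INSERT INTO sensor_readings (sensor_id, reading, recorded_at)
    VALUES (42, 3.14, NOW());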

Is there any "connection pooling"? You mentioned MaxScale; are you already using it?

What is the average number of connections over an hour (or day or whatever)? As a first cut, I would set max_connections to twice that.
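Assuming you do not already graph this, a simple sample to average over time (run it periodically, e.g. from cron):

    -- Live connection count; average many samples to get the baseline:
    SHOW GLOBAL STATUS LIKE 'Threads_connected';
    -- High-water mark since startup, as a sanity check:
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';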

Please provide (1) RAM size, (2) SHOW VARIABLES, (3) SHOW GLOBAL STATUS after mysql has been running for a day. This may give more insight into the problems.
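For completeness, the SQL side of that (the Uptime check just confirms a day of runtime):

    SHOW VARIABLES;
    SHOW GLOBAL STATUS;
    -- Confirm the server has been up ~24 hours (value is in seconds):
    SHOW GLOBAL STATUS LIKE 'Uptime';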