MariaDB thread count suddenly jumps to around 10K

database-design, mariadb, replication

We have a big database, about 750 GB. This server is also our replication master, with 4 slaves: 3 of them stay in sync with no delay, and one runs with a 2-hour delay.

OS: Ubuntu 16.04
DB: MariaDB 10.2.9
App: PHP 7

Sometimes the MySQL thread count increases suddenly (to about 5K) for no apparent reason. By chance I found that stopping replication on one of the first three slaves with STOP SLAVE brings the thread count back down to normal (around 500), after which I run START SLAVE a minute or more later.

Every time the MySQL thread count goes above 4000 within a second, I have to run STOP SLAVE; START SLAVE; on every slave, and that is not a good solution!
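For clarity, this is the emergency procedure I run on each slave (the one-minute pause is simply what I found works):

    STOP SLAVE;
    -- wait a minute or more, then:
    START SLAVE;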

*** Even worse, sometimes STOP SLAVE; START SLAVE; does not work, the thread count climbs past 10K, and I have to restart the master MariaDB instance (I mean systemctl stop mysql && systemctl start mysql), and of course at that point my application is not reachable by users!

My questions are:

1. What is the reason for these incidents?
2. What is a temporary workaround?
3. Would something like MaxScale help in this scenario?
3.1. If yes, how would MaxScale help?
4. What is the best solution for my problem, and how can I trace the issue?

Best Answer

Many things may be involved.

First, which of these are you talking about? (And what values do you have?)

"Threads_cached": "5",      -- 1000 might be good for you
"Threads_connected": "10",
"Threads_created": "36404", -- divide by Uptime
"Threads_running": "4",     -- 20 is a 'large' value

"Max_used_connections": "238",
"max_connections": "400",   -- we may need to tune this to avoid the problem

How do you "load balance" across the Slaves? Assuming the connections are normally quick, "round robin" is better than any more sophisticated method.

You should consider adding another Slave. Keep in mind that if you lose a Slave (crash, maintenance, hiccup, etc.), you will be back down to 3, which seems to be where the problem shows up. Having 4 up-to-date slaves may make the problem much rarer.

The reason for throttling max_connections is to throw the problem back to the client, forcing it to pick another Slave. (I assume you recover from connection errors and try again?)
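A minimal sketch of that throttle (the value is illustrative; derive yours from the measurements discussed below):

    -- Cap connections at runtime; excess clients get "Too many connections"
    -- quickly instead of piling up as running threads:
    SET GLOBAL max_connections = 400;
    -- Also put max_connections = 400 under [mysqld] in my.cnf so it survives a restart.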

Please describe how the data comes in. It sounds like sensor data pushed from remote sensors? Is there anything that would cause the pushes to "sync" -- that is, everyone sends at the top of the minute? If so, is there a way to stagger the pushes?

Do the remote sensors do the following?

  1. connect to mysql
  2. insert one row (please provide details and SHOW CREATE TABLE)
  3. disconnect

If not, please describe the actual flow.
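For illustration only -- the table and column names below are hypothetical placeholders for whatever your SHOW CREATE TABLE actually says -- step 2 would be something like:

    -- Hypothetical single-row insert per connection (names are placeholders):
    INSERT INTO sensor_readings (sensor_id, reading, recorded_at)
    VALUES (42, 3.14, NOW());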

Is there any "connection pooling"? You mentioned MaxScale; are you already using it?

What is the average number of connections over an hour (or day or whatever)? As a first cut, I would set max_connections to twice that.
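Assuming you do not already graph this, a simple sample to average over time (run it periodically, e.g. from cron):

    -- Live connection count; average many samples to get the baseline:
    SHOW GLOBAL STATUS LIKE 'Threads_connected';
    -- High-water mark since startup, as a sanity check:
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';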

Please provide (1) RAM size, (2) SHOW VARIABLES, (3) SHOW GLOBAL STATUS after mysql has been running for a day. This may give more insight into the problems.
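For completeness, the SQL side of that (the Uptime check just confirms a day of runtime):

    SHOW VARIABLES;
    SHOW GLOBAL STATUS;
    -- Confirm the server has been up ~24 hours (value is in seconds):
    SHOW GLOBAL STATUS LIKE 'Uptime';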