MySQL stalls and strange threads/connections behavior is observed

Tags: connection-pooling, connections, max-connections, mysql

We have a MySQL instance which most of the time works just fine. From time to time, however, we observe strange behavior. What happens is that suddenly MySQL connections reach the limit (currently set to 1000) and MySQL pretty much stalls. The following is observed when we check the graphs for this server:

[Grafana graphs of MySQL connections, threads cached, and threads created around the incident]

The image is from a Grafana instance running a MySQL exporter. What troubles me is why threads from the cache are not reused. The other question is why the number of threads in the cache suddenly drops to zero while, at the same time, no new threads are created (according to the second graph, the blue line stays at 0).

Ultimately, my goal is to understand why 1000 connections are suddenly in use, and whether this is a problem with MySQL or with the app. This is what the MySQL docs say regarding threads:

How many threads the server should cache for reuse. When a client
disconnects, the client's threads are put in the cache if there are
fewer than thread_cache_size threads there. Requests for threads are
satisfied by reusing threads taken from the cache if possible, and
only when the cache is empty is a new thread created.
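For completeness, these are the kinds of counters I am looking at; a quick check run directly on the server (the exact queries the exporter uses may differ):

    -- Current thread cache size
    SHOW GLOBAL VARIABLES LIKE 'thread_cache_size';

    -- Thread and connection counters exported to Grafana:
    -- Threads_created growing quickly relative to Connections means
    -- threads are not being reused from the cache.
    SHOW GLOBAL STATUS WHERE Variable_name IN
        ('Threads_cached', 'Threads_connected', 'Threads_created',
         'Threads_running', 'Connections');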

Any assistance or guidance is highly appreciated!

Best Answer

Thundering herd. Or, so many people in a grocery store that no one can move their cart.

When Threads_running is 349, each of them is waiting and waiting and waiting for a tiny slice of CPU and a tiny amount of I/O and ... The effect is that they all take 'forever' to finish. Meanwhile, others trying to get in are let in because you have such a high max_connections, etc.

But, that does not explain what pushed the herd over the cliff. It could be something as trivial as a big SELECT that happens to block what the herd normally does.
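If the event happens again, a snapshot of the running threads at that moment would show what the herd is stuck on. A minimal sketch (the 120-character truncation is only to keep the output readable):

    -- What is everybody doing right now?
    SHOW FULL PROCESSLIST;

    -- Same information, but filterable and sortable:
    SELECT id, user, time, state, LEFT(info, 120) AS query
    FROM information_schema.processlist
    WHERE command <> 'Sleep'
    ORDER BY time DESC;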

Do you have the slowlog turned on, with a low value of long_query_time? If so, you may already have the answer in the slowlog. Since you did not panic and reboot, that slow SELECT (if it really exists) will still be in the slowlog. Based on the graphs, I would guess it runs for something like 80-110 seconds.
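If it is not on yet, the slowlog can be enabled at runtime; the threshold below is only an example, pick whatever "slow" means for your workload:

    -- Is the slow query log on, and where does it go?
    SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
    SHOW GLOBAL VARIABLES LIKE 'long_query_time';

    -- Turn it on with a low threshold (example value; this lasts only until
    -- the next restart unless it is also set in my.cnf).
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;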

As for the dip in Threads_cached, that gets a bit complex. First, let me explain the non-dip parts. Before the 'event', about 200 connections were made; each lasted long enough for there to be a 'steady' 200 connected.

After the event, there were so many waiting to get in that it pushed the number up to about 350.

Having thread_cache_size = 500 wastes memory; suggest you lower it to 250.
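That change can be made at runtime (and should also go into my.cnf to survive a restart); the statement below simply applies the value suggested above:

    -- Shrink the thread cache to the suggested size.
    SET GLOBAL thread_cache_size = 250;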

As for the dip, I will do some guessing. So many connections were trying so hard to get things done that new connections were denied access. (This may be deliberate on the part of MySQL, or it may be a side effect of everyone clamoring for various mutexes.)

It is also unwise to have max_connections = 1000; it just invites scenarios like this one. Instead, have the clients 'take the heat' when something hiccups -- by being denied access to MySQL. Furthermore, this makes it easier to say "Oops, we are having a problem; please wait, and please do not press Send again, nor Refresh!" Or simply give a "500".
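Before picking a new limit, check the actual high-water mark. The number in the sketch below is a placeholder, not a recommendation; size it to what the application tier genuinely needs plus some headroom:

    -- Highest number of simultaneous connections seen since the last restart.
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';

    -- Placeholder value; choose your own based on the result above.
    SET GLOBAL max_connections = 300;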

Bottom line: I think the slowlog can point to the cause of the event.