MySQL RDS Instance eating up memory and swapping

amazon-rds, innodb, memory, MySQL, mysql-5.5

I have a memory issue with an RDS instance of the db.m2.4xlarge class.

Below are the specifications of the RDS instance:

Instance Class: db.m2.4xlarge ( 68.4 GB of RAM )

Engine: mysql (5.5.37)

IOPS: 8000

Storage: 805GB ( 128 GB of data )

Multi AZ: Yes

Automated Backups: Enabled (5 Days)

The MySQL instance is tuned without changing the InnoDB buffer values.

Total buffers : 23.9G global + 3.0M per thread (1000 max threads)

Maximum possible memory usage: 26.8G (42% of installed RAM)

Data in InnoDB tables: 98G
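
For reference, a rough way to confirm the effective settings behind those numbers (these are standard MySQL variable names; the actual values come from the RDS parameter group):

SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('innodb_buffer_pool_size', 'key_buffer_size', 'max_connections',
     'sort_buffer_size', 'join_buffer_size', 'read_buffer_size',
     'read_rnd_buffer_size', 'tmp_table_size', 'max_heap_table_size');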

Earlier we used a higher, more optimized buffer value, but the frequency of reboots made us reduce it back to the default buffer size set by RDS.

Our application runs on 4 ColdFusion servers and 4 ASP.NET servers, all of which communicate with this MySQL RDS instance. We use connection pooling for the applications. All DML statements are done through stored procedures, which are executed very frequently.

My issue is that the RDS instance, if left alone, slowly eats up the memory and starts to swap. When it swaps, our applications start to fail one by one, and we are left with no option but to reboot the instance. Currently we reboot the server whenever free memory goes below 5 GB.

Another approach we take is to restart the ColdFusion servers, restart IIS, and then issue a FLUSH TABLES query. Sometimes this yields results, and sometimes there is no change. Theoretically, I believe the memory should be released even without the FLUSH TABLES query, but that doesn't happen. When memory is released, it is usually in the range of 1–7 GB.
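
For completeness, the MySQL part of that cleanup is roughly the sketch below (the ColdFusion/IIS restarts happen on the application servers; checking Open_tables before and after is just a way to see whether the flush actually closed anything):

SHOW GLOBAL STATUS LIKE 'Open_tables';
FLUSH TABLES;
SHOW GLOBAL STATUS LIKE 'Open_tables';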

The interactive timeout set for MySQL is 300 seconds, the ColdFusion connection lifetime is 300 seconds, and the timeout set in ASP.NET is 270 seconds.
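
The MySQL-side timeouts can be confirmed with the following (wait_timeout is the value that applies to the pooled, non-interactive application connections):

SHOW GLOBAL VARIABLES WHERE Variable_name IN ('interactive_timeout', 'wait_timeout');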

I am unable to track down what is using the memory and holding on to it. Ideally the memory should be freed when MySQL needs it, but that is not happening.

I need some guidance on how to track down the memory hog, so that the RDS instance can keep functioning without needing a reboot.

We already have MONyog monitoring the RDS instance. We have also enabled Global Status History (GoSH) in RDS, but it hasn't been very useful for me.
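
As a simple alternative that does not depend on GoSH, the status counters can be sampled directly and diffed over time; a sketch of the kind of query I mean:

SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN ('Created_tmp_tables', 'Created_tmp_disk_tables',
                        'Open_tables', 'Threads_connected', 'Threads_cached');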

I have already seen a similar issue mentioned here: When should I think about upgrading our RDS MySQL instance based on memory usage?
We have already done everything recommended there as part of DB tuning.

Below are the graphs.

The last 3 weeks:

[Graph: RDS freeable memory over the last 3 weeks]

The graph starts with a reboot, which happened after the memory went down to 0 and the instance started swapping. After that, there have been 2 reboots.

The current week:

[Graph: RDS freeable memory for the current week]

There were 2 memory releases on August 19th, 2014. The one that freed 6 GB came from restarting services and flushing tables, as explained above.

The past 24 hours:

[Graph: RDS freeable memory over the past 24 hours]

EDIT

Adding further details as needed.

The processlist is as below:

mysql> SHOW PROCESSLIST;
+--------+-----------------+--------------------------------------+-------+---------+-------+-----------------------------+------------------+
| Id     | User            | Host                                 | db    | Command | Time  | State                       | Info             |
+--------+-----------------+--------------------------------------+-------+---------+-------+-----------------------------+------------------+
| 1      | event_scheduler | localhost                            | NULL  | Daemon  | 30    | Waiting for next activation | NULL             |
| 332    | user            | ip3:32355                            | NULL  | Sleep   | 1     |                             | NULL             |
| 836    | rdsadmin        | localhost:40463                      | mysql | Sleep   | 6     |                             | NULL             |
| 345919 | user            | server2:49173                        | NULL  | Sleep   | 0     |                             | NULL             |
| 354132 | user            | server2:49641                        | NULL  | Sleep   | 82    |                             | NULL             |
| 386366 | user            | ip1.us-west-2.compute.internal:64097 | db    | Sleep   | 1     |                             | NULL             |
| 389625 | user            | ip2.us-west-2.compute.internal:59819 | db    | Sleep   | 0     |                             | NULL             |
| 390109 | user            | ip1.us-west-2.compute.internal:64879 | db    | Sleep   | 1     |                             | NULL             |
| 391366 | user            | ip2.us-west-2.compute.internal:60319 | db    | Sleep   | 1     |                             | NULL             |
| 392045 | user            | ip3:39593                            | NULL  | Sleep   | 20193 |                             | NULL             |
| 393625 | user            | ip3:26708                            | db    | Sleep   | 3902  |                             | NULL             |
| 393626 | user            | ip3:37544                            | NULL  | Sleep   | 5677  |                             | NULL             |
| 393933 | user            | ip2.us-west-2.compute.internal:60912 | db    | Sleep   | 0     |                             | NULL             |
| 394462 | user            | ip1.us-west-2.compute.internal:49971 | db    | Sleep   | 3     |                             | NULL             |
| 394802 | user            | ip7:1142                             | db    | Sleep   | 88    |                             | NULL             |
| 395410 | user            | ip2.us-west-2.compute.internal:64725 | db    | Sleep   | 4     |                             | NULL             |
| 396217 | user            | ip2.us-west-2.compute.internal:64891 | db    | Sleep   | 12    |                             | NULL             |
| 396581 | user            | ip7:1423                             | db    | Sleep   | 22    |                             | NULL             |
| 396731 | user            | ip7:1429                             | db    | Sleep   | 21    |                             | NULL             |
| 396954 | user            | ip1.us-west-2.compute.internal:50472 | db    | Sleep   | 7     |                             | NULL             |
| 398509 | user            | ip1.us-west-2.compute.internal:50595 | db    | Sleep   | 179   |                             | NULL             |
| 399337 | user            | ip3.us-west-2.compute.internal:49539 | db    | Sleep   | 219   |                             | NULL             |
| 399338 | user            | ip3.us-west-2.compute.internal:49540 | db    | Sleep   | 219   |                             | NULL             |
| 399360 | user            | ip4.us-west-2.compute.internal:62560 | db    | Sleep   | 0     |                             | NULL             |
| 399363 | user            | ip4.us-west-2.compute.internal:62568 | db    | Sleep   | 0     |                             | NULL             |
| 399580 | user            | ip5.us-west-2.compute.internal:50055 | db    | Sleep   | 1     |                             | NULL             |
| 399581 | user            | ip6.us-west-2.compute.internal:58023 | db    | Sleep   | 1     |                             | NULL             |
| 399607 | user            | ip6.us-west-2.compute.internal:58034 | db    | Sleep   | 1     |                             | NULL             |
| 399608 | user            | ip6.us-west-2.compute.internal:58036 | db    | Sleep   | 1     |                             | NULL             |
| 399609 | user            | ip6.us-west-2.compute.internal:58037 | db    | Sleep   | 1     |                             | NULL             |
| 399669 | user            | ip4.us-west-2.compute.internal:62615 | db    | Sleep   | 0     |                             | NULL             |
| 399682 | user            | ip3:9507                             | NULL  | Query   | 0     | NULL                        | SHOW PROCESSLIST |
| 399696 | user            | server1:53241                        | db    | Sleep   | 1     |                             | NULL             |
| 399704 | user            | server1:53242                        | db    | Sleep   | 0     |                             | NULL             |
| 399705 | user            | ip4.us-west-2.compute.internal:62618 | db    | Sleep   | 0     |                             | NULL             |
| 399706 | user            | ip4.us-west-2.compute.internal:62619 | db    | Sleep   | 0     |                             | NULL             |
| 399707 | user            | ip4.us-west-2.compute.internal:62620 | db    | Sleep   | 0     |                             | NULL             |
+--------+-----------------+--------------------------------------+-------+---------+-------+-----------------------------+------------------+
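
Since almost every thread above is just sleeping, a grouped view of the same data is easier to read; a sketch using INFORMATION_SCHEMA (the column names are standard in 5.5):

SELECT USER, SUBSTRING_INDEX(HOST, ':', 1) AS host, COMMAND, COUNT(*) AS threads
FROM INFORMATION_SCHEMA.PROCESSLIST
GROUP BY USER, SUBSTRING_INDEX(HOST, ':', 1), COMMAND
ORDER BY threads DESC;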

SHOW ENGINE INNODB STATUS

I was not able to format the data properly, so I have put it on Pastebin:
http://pastebin.com/JETQ6KG3

Thank you.

The server was again rebooted in the morning to free up memory.

Best Answer

Memory is not instrumented in MySQL until version 5.7 (currently in development), so this does make your question a bit of a guessing game.

I can see from the InnoDB status output that it doesn't appear to be InnoDB consuming the memory (assuming you collected this while the problem was occurring):

Total memory allocated 26217103360; in additional pool allocated 0
Dictionary memory allocated 1212141
Buffer pool size   1563519

Total memory: 24.41 GiB (InnoDB needs about 5-10% more than your buffer pool for internal structures)
Buffer pool memory: ~23.85 GiB
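
Those figures come straight from the status output above; a quick sanity check of the conversion, assuming the default 16 KiB InnoDB page size:

SELECT 26217103360 / POW(1024, 3)     AS total_allocated_gib,  -- roughly 24.4
       1563519 * 16384 / POW(1024, 3) AS buffer_pool_gib;      -- roughly 23.9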

I'm going to have to guess that the memory is being consumed at the MySQL layer (not InnoDB). Two common causes are internal temporary tables and very large sort/join buffers.
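
A rough way to test that guess with what 5.5 already exposes (no 5.7 memory instrumentation required) is to watch how quickly the temporary-table counters grow and how large the per-thread buffers are, since every one of your 1000 allowed connections can allocate its own sort/join buffers:

SHOW GLOBAL STATUS LIKE 'Created_tmp%';
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('sort_buffer_size', 'join_buffer_size', 'tmp_table_size',
     'max_heap_table_size', 'max_connections');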