Mysql – Unexpected failover of RDS-MySQL server

amazon-rds, failover, MySQL

We have a 16GB RDS MySQL Multi-AZ instance. Recently, it has unexpectedly failed over several times, causing the dependent application to be offline for over 15 minutes.

AWS Support blamed it on high swap usage. However, the instance has failed over even when swap usage was as low as 10 MB. We have started optimizing the queries.

Is there anything else we can do to avoid such mishaps?

===================================================================

EDIT:

We have made one observation: the CPU utilization of the instance remains fairly low, so we presumed that the issue might be with disk access. Write throughput is several times higher than read throughput. On top of that, Sort_merge_passes has a value of 79425 (which is crazy, right?). Going through the slow query log, we found multiple queries whose ORDER BY clause does not use an index. It will take time to find workarounds for all of these queries. Can you suggest an immediate fix for the failover issue? Should we increase the global value of sort_buffer_size? The current value is 2MB.

===========================================================

EDIT 2:

A screenshot from RDS Performance Insights showed the rate of change of Sort_merge_passes (per second).

During periods of load, the value varies from 0.5 to 1 (we have observed values higher than 1 as well). A single query can cause multiple sort merge passes; 8 is the highest we have observed.

=======================================================================

EDIT 3:

We also found that the global status counter Created_tmp_disk_tables is increasing at a rate of approximately 2-3 per second. Badly structured queries, I guess.
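A per-second figure like this comes from sampling the counter twice and dividing the difference by the elapsed time. A minimal sketch, assuming two SHOW GLOBAL STATUS snapshots have already been loaded into dicts keyed by variable name (fetching them is not shown); the snapshot values are hypothetical:

```python
def counter_rate(before: dict[str, int], after: dict[str, int],
                 name: str, elapsed_seconds: float) -> float:
    """Per-second rate of a cumulative GLOBAL STATUS counter between two snapshots."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = after[name] - before[name]
    if delta < 0:
        # Status counters start again from zero when the server restarts,
        # so a negative delta means the two snapshots are not comparable.
        raise ValueError(f"{name} went backwards; was the server restarted?")
    return delta / elapsed_seconds


# Hypothetical snapshots taken 60 seconds apart.
first = {"Created_tmp_disk_tables": 1_000_000}
second = {"Created_tmp_disk_tables": 1_000_150}
print(counter_rate(first, second, "Created_tmp_disk_tables", 60))  # 2.5
```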

Best Answer

Swapping is a bad sign: it means RAM is being over-used.
An excessive number of simultaneous connections may lead to swapping.
Very complex queries may lead to swapping (due to temp tables). Let's see any such queries you have.
Have you changed anything in my.cnf (or the equivalent RDS parameter group)? If so, let's check whether you raised some values too high.
As a quick fix (and if RDS lets you), decrease innodb_buffer_pool_size by a gigabyte.
Sort_merge_passes: divide it by Uptime. 1/second would be "high", but not high enough to be concerned about.
For deeper analysis, see http://mysql.rjweb.org/doc.php/mysql_analysis: the GLOBAL STATUS and VARIABLES may reveal some clues; pt-query-digest will focus on the 'worst' queries.
sort_buffer_size = 2M is fine.
"ORDER BY clause not using indexes": show us the queries, plus SHOW CREATE TABLE.

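The Sort_merge_passes check above is a plain average over the server's lifetime: the counter divided by Uptime, both from SHOW GLOBAL STATUS. A minimal sketch; 79425 is the value from the question, but the one-day Uptime is a hypothetical figure because the question does not state it:

```python
# The answer's rule of thumb: around 1 pass per second counts as "high".
SORT_MERGE_HIGH_PER_SECOND = 1.0


def per_second_since_startup(counter: int, uptime_seconds: int) -> float:
    """Average per-second rate of a GLOBAL STATUS counter since the server started."""
    if uptime_seconds <= 0:
        raise ValueError("Uptime must be positive")
    return counter / uptime_seconds


# Sort_merge_passes from the question; Uptime of 86400 s (one day) is hypothetical.
rate = per_second_since_startup(79425, 86400)
print(f"{rate:.2f}/s, high={rate >= SORT_MERGE_HIGH_PER_SECOND}")  # 0.92/s, high=False
```

Because a failover starts a fresh server, Uptime is reset too. On an instance that has failed over recently, the same counter value therefore means a much higher rate.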
Related Solutions

Mysql – CPU usage on RDS instance monotonically increasing with no change to query volume

Here are some queries about table sizes that you can run in MySQL during these spikes:

Database size in terms of StorageEngine (MB)

```sql
SELECT IFNULL(B.engine,'Total') "Storage Engine",
  CONCAT(LPAD(REPLACE(FORMAT(B.DSize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",
  CONCAT(LPAD(REPLACE(FORMAT(B.ISize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",
  CONCAT(LPAD(REPLACE(FORMAT(B.TSize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Table Size"
FROM (SELECT engine, SUM(data_length) DSize, SUM(index_length) ISize,
             SUM(data_length+index_length) TSize
      FROM information_schema.tables
      WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')
        AND engine IS NOT NULL
      GROUP BY engine WITH ROLLUP) B,
     (SELECT 2 pw) A
ORDER BY TSize;
```

Database size in terms of Databases (MB)

```sql
SELECT DBName,
  CONCAT(LPAD(FORMAT(SDSize/POWER(1024,pw),3),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",
  CONCAT(LPAD(FORMAT(SXSize/POWER(1024,pw),3),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",
  CONCAT(LPAD(FORMAT(STSize/POWER(1024,pw),3),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Total Size"
FROM (SELECT IFNULL(DB,'All Databases') DBName,
             SUM(DSize) SDSize, SUM(XSize) SXSize, SUM(TSize) STSize
      FROM (SELECT table_schema DB, data_length DSize,
                   index_length XSize, data_length+index_length TSize
            FROM information_schema.tables
            WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')) AAA
      GROUP BY DB WITH ROLLUP) AA,
     (SELECT 2 pw) BB
ORDER BY (SDSize+SXSize);
```

Database size in terms of Database/StorageEngine (MB)

```sql
SELECT IF(ISNULL(B.table_schema)+ISNULL(B.engine)=2,"Storage for All Databases",
       IF(ISNULL(B.table_schema)+ISNULL(B.engine)=1,CONCAT("Storage for ",B.table_schema),
          CONCAT(B.engine," Tables for ",B.table_schema))) Statistic,
  CONCAT(LPAD(REPLACE(FORMAT(B.DSize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Data Size",
  CONCAT(LPAD(REPLACE(FORMAT(B.ISize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Index Size",
  CONCAT(LPAD(REPLACE(FORMAT(B.TSize/POWER(1024,pw),3),',',''),17,' '),' ',
         SUBSTR(' KMGTP',pw+1,1),'B') "Table Size"
FROM (SELECT table_schema, engine,
             SUM(data_length) DSize, SUM(index_length) ISize,
             SUM(data_length+index_length) TSize
      FROM information_schema.tables
      WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')
        AND engine IS NOT NULL
      GROUP BY table_schema, engine WITH ROLLUP) B,
     (SELECT 2 pw) A
ORDER BY TSize;
```
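All three queries build their size columns with the same expression (the second one skips the REPLACE that strips thousands separators). The hard-coded (SELECT 2 pw) subquery is what makes the output MB. A small Python mirror of that expression, an illustration rather than part of the original answer, shows how pw picks both the divisor and the unit letter:

```python
def format_size(num_bytes: int, pw: int = 2) -> str:
    """Mirror of the size expression used in the queries above:

    CONCAT(LPAD(REPLACE(FORMAT(x/POWER(1024,pw),3),',',''),17,' '),
           ' ', SUBSTR(' KMGTP',pw+1,1), 'B')
    """
    if not 0 <= pw <= 5:
        raise ValueError("pw must be 0..5 (' KMGTP' has six characters)")
    value = f"{num_bytes / 1024 ** pw:.3f}"  # like FORMAT(...,3) without separators
    unit = " KMGTP"[pw]                     # SUBSTR is 1-based: position pw+1 is index pw
    # LPAD would also truncate values longer than 17 characters; rjust does not.
    return value.rjust(17) + " " + unit + "B"


print(format_size(5 * 1024 ** 3))     # ends with '5120.000 MB'
print(format_size(5 * 1024 ** 3, 3))  # ends with '5.000 GB'
```

In the SQL, changing (SELECT 2 pw) to (SELECT 3 pw) reports the same sizes in GB.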

Pay attention to these status markers:

Innodb_buffer_pool_pages_dirty
Innodb_data_reads
Innodb_data_writes
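To turn those markers into numbers you can compare over time, one option (a sketch, not the author's setup) is to compute the dirty-page percentage, which also needs Innodb_buffer_pool_pages_total, and the write-to-read ratio. The input is assumed to be a SHOW GLOBAL STATUS snapshot already loaded into a dict; the sample values are hypothetical:

```python
def innodb_markers(status: dict[str, int]) -> dict[str, float]:
    """Derive two indicators from a SHOW GLOBAL STATUS snapshot.

    dirty_pct: buffer pool pages modified in memory but not yet flushed,
               as a percentage of all buffer pool pages.
    writes_per_read: Innodb_data_writes / Innodb_data_reads (both cumulative
                     since startup); inf if there have been no reads.
    """
    total_pages = status["Innodb_buffer_pool_pages_total"]
    reads = status["Innodb_data_reads"]
    return {
        "dirty_pct": 100 * status["Innodb_buffer_pool_pages_dirty"] / total_pages,
        "writes_per_read": status["Innodb_data_writes"] / reads if reads else float("inf"),
    }


# Hypothetical snapshot values.
sample = {
    "Innodb_buffer_pool_pages_total": 800_000,
    "Innodb_buffer_pool_pages_dirty": 80_000,
    "Innodb_data_reads": 50_000,
    "Innodb_data_writes": 200_000,
}
print(innodb_markers(sample))  # {'dirty_pct': 10.0, 'writes_per_read': 4.0}
```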

I recommend downloading MySQL Administrator and setting it up. I know it's old, but I still use it for quick and dirty "I WANNA SEE STATS NOW" moments of the day. I customized my own graphs to watch the size of the InnoDB buffer pool and its dirty pages. You could also just use the Connection Health tab.

Mysql – statistics State in MySQL Processlist

If a query thread stays in the statistics state for a long time, the server is probably disk-bound performing other work.
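A quick way to spot such threads, sketched under the assumption that SHOW FULL PROCESSLIST rows have already been fetched into dicts keyed by the column names (Id, User, Host, db, Command, Time, State, Info). The 5-second threshold is an arbitrary example, not a MySQL setting, and the rows are hypothetical:

```python
def stuck_in_statistics(processlist: list[dict], min_seconds: int = 5) -> list[int]:
    """Ids of threads that have been in the 'statistics' state for min_seconds or more."""
    return [
        row["Id"]
        for row in processlist
        # State can be NULL (None) for idle threads; Time is treated as 0 if missing.
        if row.get("State") == "statistics" and (row.get("Time") or 0) >= min_seconds
    ]


# Hypothetical rows.
rows = [
    {"Id": 11, "Command": "Query", "Time": 12, "State": "statistics"},
    {"Id": 12, "Command": "Query", "Time": 30, "State": "Sending data"},
    {"Id": 13, "Command": "Query", "Time": 1, "State": "statistics"},
    {"Id": 14, "Command": "Sleep", "Time": 300, "State": None},
]
print(stuck_in_statistics(rows))  # [11]
```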

How to make the server less disk-bound:

1) Increase the size of innodb_buffer_pool_size

If you are using InnoDB tables: when table data is cached in the InnoDB buffer pool, queries can process it over and over without requiring any disk I/O. Specify the size of the buffer pool with the innodb_buffer_pool_size option. This memory area is important enough that busy databases often set it to approximately 80% of physical memory.

2) In some versions of GNU/Linux and Unix, flushing files to disk with the Unix fsync() call (which InnoDB uses by default) and similar methods is surprisingly slow. If database write performance is an issue, conduct benchmarks with the innodb_flush_method parameter set to O_DSYNC.

3) Increase the size of innodb_log_buffer_size. This sets the amount of memory allocated to the buffer that stores InnoDB write-ahead log entries. With a large log buffer, large transactions can keep their log entries in memory instead of writing them to the log files on disk before the transaction commits. If you see large log I/Os in the SHOW ENGINE INNODB STATUS output at runtime, you probably need a larger innodb_log_buffer_size to save disk I/O.

4) Increase the memory used to cache tables and queries. Check their cache hit ratios, then review and, where needed, increase these MySQL variables: query_cache_size, query_cache_limit, query_cache_min_res_unit, tmp_table_size, join_buffer_size, sort_buffer_size, etc.

5) Make sure that proper indexes are applied to all tables on the server, and use appropriate data types.
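For item 1, here is a rough sizing helper, as a sketch. It takes about 80% of physical RAM (the figure quoted above) and rounds down to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances. MySQL 5.7 and later adjust innodb_buffer_pool_size to such a multiple on their own. The defaults here (128 MiB chunks, 8 instances) are assumptions, so check your own values with SHOW VARIABLES. On a host that is already swapping, a smaller fraction is the safer choice.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_buffer_pool_size(physical_ram: int, fraction: float = 0.80,
                             chunk_size: int = 128 * MIB, instances: int = 8) -> int:
    """Roughly `fraction` of RAM, rounded down to a multiple of chunk_size * instances.

    Rounding down keeps the result at or below the target (but never below
    one unit), whereas MySQL would adjust an arbitrary value on its own.
    chunk_size and instances defaults are assumptions; check SHOW VARIABLES.
    """
    unit = chunk_size * instances
    target = int(physical_ram * fraction)
    return max(unit, target // unit * unit)


print(suggest_buffer_pool_size(16 * GIB) / GIB)               # 12.0
print(suggest_buffer_pool_size(16 * GIB, instances=1) / GIB)  # 12.75
```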

You can refer to these links to resolve disk-bound issues:
http://dev.mysql.com/doc/refman/5.5/en/disk-issues.html
https://blogs.oracle.com/luojiach/entry/mysql_innodb_performance_tuning_for
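Items 3 and 4 in the list above both come down to watching counters in SHOW GLOBAL STATUS. Below is a minimal sketch. It assumes two snapshots taken some time apart are already available as dicts; how you fetch them is not shown, and the numbers are hypothetical. What it checks:

- Growth in Innodb_log_waits means InnoDB had to wait because the log buffer was too small.
- Created_tmp_disk_tables against Created_tmp_tables shows how many internal temporary tables spilled to disk.
- Qcache_hits against Com_select gives the query cache hit ratio. This only applies up to MySQL 5.7, because the query cache was removed in MySQL 8.0.

```python
def pct(part: int, whole: int) -> float:
    """Percentage, returning 0.0 when the denominator is zero."""
    return 100 * part / whole if whole else 0.0


def status_checks(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Compare two SHOW GLOBAL STATUS snapshots taken some time apart."""
    d = {k: after[k] - before[k] for k in after if k in before}
    checks = {
        # Times InnoDB waited because the log buffer was full (item 3).
        "log_waits": float(d["Innodb_log_waits"]),
        # Share of internal temporary tables that went to disk (item 4).
        "tmp_disk_pct": pct(d["Created_tmp_disk_tables"], d["Created_tmp_tables"]),
    }
    if "Qcache_hits" in d:  # Qcache_* counters do not exist in MySQL 8.0
        # SELECTs answered from the query cache do not increment Com_select.
        checks["qcache_hit_pct"] = pct(d["Qcache_hits"], d["Qcache_hits"] + d["Com_select"])
    return checks


# Hypothetical snapshots.
before = {"Innodb_log_waits": 3, "Created_tmp_tables": 1000,
          "Created_tmp_disk_tables": 200, "Qcache_hits": 0, "Com_select": 0}
after = {"Innodb_log_waits": 3, "Created_tmp_tables": 1400,
         "Created_tmp_disk_tables": 300, "Qcache_hits": 300, "Com_select": 700}
print(status_checks(before, after))
# {'log_waits': 0.0, 'tmp_disk_pct': 25.0, 'qcache_hit_pct': 30.0}
```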
