The problem was that our new RDS instance was being throttled on WriteIOPS: we could see queued disk operations (DiskQueueDepth of 10 or greater) and WriteIOPS averaging 300, whereas Amazon only provides 100 reliable IOPS (source).
The solution was to fork out roughly $200 per month for "Provisioned IOPS"; 1000 is the smallest amount you can buy (which easily covers us). The change took 5 hours to complete on 350 GB of storage (but only about 30 s of downtime, thanks to the Multi-AZ instance), and afterwards everything was fine: our MySQL 5.6 instance was running better than the original MySQL 5.5 instance, as all the literature promises.
The underlying cause was that the new RDS instance we were given when migrating to 5.6 was simply not as good as our original one, which was just bad luck. We asked Amazon support: "Why would there be a difference between MySQL 5.5 and MySQL 5.6?"
From Amazon:
Your previous underlying host may have been benefiting from unusually light use on the underlying volumes. You may have grown accustomed to very light use which allowed you to consume additional IOPs without accumulating as much DiskQueueDepth. However, when you moved to a new underlying host by upgrading, you may now be experiencing more typical use of the underlying volumes. With typical use of the underlying volumes it's not recommended to consistently exceed the amount of reliable IOPs while expecting to maintain the lower DiskQueueDepth that you've grown accustomed to. However, there is no way to confirm or deny that this was the cause.
Your question is about troubleshooting, not about 'fixing' the CPU issue.
Nearly all CPU issues can be traced to bad/missing indexes or poorly written queries.
Do you have the slowlog turned on? With long_query_time = 2? Can you run pt-query-digest? Well, probably you said no to all of those. Pursue them for future use.
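For reference, here is a minimal sketch of turning those on from a MySQL session (on RDS these settings normally live in the DB parameter group, so treat SET GLOBAL as illustrative; the log path is a placeholder):

-- Log any statement that takes longer than 2 seconds.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;
-- Later, summarize the log from a shell with:  pt-query-digest /path/to/slow.log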
Can you do SHOW FULL PROCESSLIST? Probably. Look at the queries that don't say "Sleep" and don't look like they have something to do with replication. They will probably be the naughty queries.
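If you prefer a filtered view, a sketch like this against information_schema (standard columns; adjust to taste) hides the sleeping and replication threads:

-- Show only active, non-replication threads, longest-running first.
SELECT id, user, db, time, state, LEFT(info, 120) AS query_snippet
FROM information_schema.PROCESSLIST
WHERE command NOT IN ('Sleep', 'Binlog Dump', 'Binlog Dump GTID')
ORDER BY time DESC;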
Show us the naughtiest query, together with SHOW CREATE TABLE for the table(s) that it uses. Also provide EXPLAIN SELECT ... so we can see how the query is being performed.
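For the query discussed below, that would look roughly like this (table names are taken from your SELECT; \G just makes the output easier to read):

SHOW CREATE TABLE reports\G
SHOW CREATE TABLE domains\G
SHOW CREATE TABLE publisher\G
EXPLAIN SELECT ...\G   -- paste your actual SELECT in place of ...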
If you are lucky, we will say "add INDEX(...)" or "change the WHERE clause in this way...". If you are less lucky, the solution could be more complex.
(None of the specs you mentioned -- ram, cores, #tables, table size, etc., are important to the question. The version might be important because the solution might involve a 'new' feature.)
Since you have provided the SELECT, I can make some guesses and proceed without the rest of the data.
Keeping junk in the table and filtering it out (the ='N' and ='Y' tests) hurts performance, but I will live with that for the moment.
INDEX(rep_type, rep_status, pub_id, rep_pub_date) might make things run a lot faster. But... The speedup will only work (I think) when the IN clause has only one item.
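Adding it is a one-liner; here's a sketch (the index name is arbitrary, and I'm assuming the table is reports):

ALTER TABLE reports
    ADD INDEX idx_type_status_pub_date (rep_type, rep_status, pub_id, rep_pub_date);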
OFFSET 1200 means that 1200 rows have to be fetched, then tossed. Let's work on decreasing that overhead...
(Note that JOIN...ON is a better way to write JOINs.)
Oops, I need the PRIMARY KEY of reports; I'll assume it is rep_id.
SELECT  r2.*, d.name AS dom_name, p.pub_name
    FROM
    ( SELECT  rep_id
          FROM  reports
          WHERE  rep_type = 'N'
            AND  pub_id IN ( 3 )
            AND  rep_status = 'Y'
          ORDER BY  rep_pub_date DESC
          LIMIT  1200, 100
    ) r1
    JOIN  reports   AS r2 USING(rep_id)
    JOIN  domains   AS d  ON d.dom_id = r2.domain_id
    JOIN  publisher AS p  ON p.pub_id = r2.pub_id;
Some notes:
* The subquery looks only at reports, and does not have to haul around 1200 rows to be thrown away.
* Once the subquery is finished, the JOINs are doing only 100 rows each.
* This format will benefit from the above index.
* When the IN has multiple items, the index will work even better for the subquery than the original query. But still not 'perfectly'.
* My index works well only because it handles all of the WHERE clause, plus all of the ORDER BY.
To get even better performance for "pagination" than you can get through OFFSET, see my blog: mysql.rjweb.org. That blog also points out two bugs you possibly don't know you currently have.
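The core idea there is to remember where the previous page left off rather than counting rows with OFFSET; a rough sketch of a "next page" query (I'm guessing at rep_id as the tie-breaker, since I don't have your schema):

-- The two ? placeholders are the rep_pub_date and rep_id of the
-- last row shown on the previous page; no rows are read and thrown away.
SELECT  rep_id
    FROM  reports
    WHERE  rep_type = 'N'
      AND  pub_id IN ( 3 )
      AND  rep_status = 'Y'
      AND  (rep_pub_date, rep_id) < (?, ?)
    ORDER BY  rep_pub_date DESC, rep_id DESC
    LIMIT  100;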
Another issue... Do users really go as far as the 13th page? Or is this a search engine crawler causing you grief?
Best Answer
Since your commits are stalling, it's pretty safe to assume you're running with innodb_flush_log_at_trx_commit=1 (see #4 here), so one obvious change you could make is to switch to innodb_flush_log_at_trx_commit=0; however, that may not be the right choice for you - see the previous link.
In any case, COMMITs don't usually stall due to other activity on the server (like queries), unless that other activity locks some kind of mutex for far too long.
If I had to guess without any further evidence, I would say that your writes are so intensive that MySQL is unable to flush fast enough using adaptive checkpointing, so it exceeds 75% of the log capacity and starts sharp checkpointing (which blocks server activity for a short period).
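If you want to confirm what you are running with before changing anything, a quick check (the SET GLOBAL line is only a sketch - on RDS this is normally changed through the parameter group):

-- See the current durability setting.
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 0 = flush the redo log roughly once per second instead of on every COMMIT
-- (you can lose up to ~1 second of transactions if mysqld crashes).
SET GLOBAL innodb_flush_log_at_trx_commit = 0;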
There are a few quick ways to make MySQL handle writes better (besides the one already mentioned above):
Otherwise, it would be really useful to see disk activity from AWS and the amount of data the server currently writes (you can calculate it from the Log sequence number delta between two consecutive SHOW ENGINE INNODB STATUS\G runs).
There are other ways too, but for that I'd need a lot more details on the actual workload.
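A sketch of that calculation (only the method matters; the interval is arbitrary):

-- Run this, wait 60 seconds, then run it again:
SHOW ENGINE INNODB STATUS\G
-- In each output, find the "Log sequence number" line in the LOG section.
-- Bytes written per second = (second LSN - first LSN) / 60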