SQL Server stores all the columns for one row together in a single disk page. (It's more complex than this, but for int, char and similar fixed-width types it's more or less true.) To retrieve any column's value, the whole page is read into RAM. So once any one column of a given row is available, all of that row's columns are available.
There are several buffers / caches involved in responding to a query. One is the great big page bufferpool where pages read from disk are held in memory so execution plans can act on them. And yes, if one query causes a page to be loaded into the bufferpool all subsequent queries will use the same page from the bufferpool and not have to suffer the IO overhead. This is an incidental performance gain, however, and shouldn't be relied upon because SQL Server may choose to evict any page from the bufferpool at any time it needs the space.
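As an illustration of the page bufferpool (assuming a reasonably recent SQL Server with the standard DMVs available), you can see which databases' pages currently occupy it:

```sql
-- Diagnostic sketch: pages currently held in the buffer pool,
-- grouped by database (SQL Server pages are 8 KB each).
SELECT DB_NAME(database_id)  AS database_name,
       COUNT(*)              AS cached_pages,
       COUNT(*) * 8 / 1024   AS cached_mb
FROM   sys.dm_os_buffer_descriptors
GROUP  BY database_id
ORDER  BY cached_pages DESC;
```

Running this before and after a large query makes the "incidental" caching visible: the second execution of the same query finds its pages already resident.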
Another set of buffering occurs when the output from your query gets sent over your connection to your application. This is (mostly) on a per-connection basis, so duplicating the work duplicates the resources required (more or less).
Another cost is the optimisations you miss out on when you return all columns. Covering indexes and INCLUDE columns become meaningless: every read has to retrieve the underlying clustered index (or heap) page. It's a fair bet that SQL Server will not use the narrower indexes it might otherwise have chosen, because a scan-the-cluster plan is cheaper than an index-plus-cluster-lookup plan.
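For contrast, here is a covering-index sketch (table, column and index names are hypothetical): the index can satisfy a query on its own, but only when the query doesn't ask for every column.

```sql
-- Hypothetical covering index: the INCLUDE columns ride along in the
-- index leaf pages, so the queries they cover never touch the base table.
CREATE INDEX ix_orders_customer
    ON dbo.Orders (customer_id)
    INCLUDE (order_date, total);

-- Covered: answerable from ix_orders_customer alone.
SELECT order_date, total
FROM   dbo.Orders
WHERE  customer_id = 42;

-- Not covered: SELECT * forces a lookup into the clustered index (or heap)
-- for every matching row, so the covering index buys you nothing.
SELECT *
FROM   dbo.Orders
WHERE  customer_id = 42;
```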
Enable the slow query log and log_queries_not_using_indexes on the slave server, and you should be able to confirm what your gut is telling you. You can take the captured queries, run EXPLAIN on them, and have better documentation for your suspicions.
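A sketch of enabling this at runtime (these can equally go in my.cnf; the threshold value here is just an example, and the EXPLAIN target is a hypothetical captured query against the t1 table used later):

```sql
-- On the slave: log slow queries and queries doing full scans.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;              -- seconds; tune to taste
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Then, for each query captured in the log:
EXPLAIN SELECT last_name, first_name
FROM    t1
WHERE   last_name = 'Smith';
-- type: ALL with a large "rows" estimate suggests a missing index.
```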
To more directly answer the question, it's absolutely valid to declare indexes on a slave that are not declared on the master. I would suggest that it's a common practice, and one of many reasons why replicas are useful for so many things. Connect to the slave with an account with sufficient privilege, and alter the table directly on the slave.
The only exceptions to this -- which should be intuitively obvious, since the row data needs to be consistent on master and replica -- are that you cannot safely declare a UNIQUE constraint or a foreign key constraint on a slave when it doesn't exist on the master. This won't work, which is fine, since it doesn't make sense at any rate.
Remember also that indexes in MySQL have names, which are automatically generated if they are not supplied when the index is added. A good practice might be to explicitly name these indexes on the replica so that future indexes added to the master cannot possibly cause an index name collision, which would break replication. My local convention is to prefix the index name with "ix_repl_" which would of course never be used on the master.
ALTER TABLE t1 ADD KEY ix_repl_last_first (last_name,first_name);
Note also that true duplicate indexes (where exactly the same columns, and no others, are included in more than one index) are deprecated in MySQL 5.6 and disallowed by default in MySQL 5.7. In later versions it is therefore apparently possible to stop replication by subsequently declaring an identical index (even with a different name) on the master. This wouldn't be a critical issue (as long as you're monitoring replication -- you are, right?), since replication can safely be restarted by simply removing the now-redundant index on the replica before restarting the slave SQL thread. The failed event would be retried and would now succeed, since there's no longer a conflicting index, and the replacement index would be built on the replica.
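If replication does stop on such a duplicate-index event, the recovery might look like this (index and table names follow the earlier ix_repl_ example):

```sql
-- On the replica: drop the now-redundant replica-only index...
ALTER TABLE t1 DROP KEY ix_repl_last_first;

-- ...then resume replication; the failed event is retried and now succeeds,
-- building the master's version of the index on the replica.
START SLAVE SQL_THREAD;
```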
Side note: remember that MySQL replication requires the version of MySQL on the replica servers to be the same as, or newer than, the version on the master... so when upgrading (or migrating to MariaDB) you will almost certainly want to upgrade the replicas first, and then the master. The reason for this is that a newer replica will understand the capabilities and quirks of an older master, but a newer master may introduce behaviors in the replication stream that an older replica server won't be able to interpret. There are limited exceptions to this rule, but it's most definitely the rule.
Option 1: Change engines
First thing to do, if at all possible, is probably to change MyISAM to InnoDB, which handles concurrency much better. See "MyISAM vs InnoDB", especially the point about "Table-locking vs Row-locking".
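A minimal sketch of the engine change (the table name is hypothetical; test on a copy first, since the ALTER rewrites the whole table and blocks writes for the duration):

```sql
-- See which tables in the current schema still use MyISAM:
SELECT table_name, engine
FROM   information_schema.tables
WHERE  table_schema = DATABASE()
  AND  engine = 'MyISAM';

-- Convert one table (full table rewrite; plan for downtime on large tables):
ALTER TABLE t1 ENGINE = InnoDB;
```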
Option 2: Master/Slave replication
Another possible (although probably not cheap) solution is to replicate your database using an asynchronous Master/Slave setup. [I guess that if you cannot change from MyISAM to InnoDB, it won't be easy to set up a replicated DB... but I don't know either your possibilities or your limitations.]
Your users (i.e.: your application) should always work against your Master node, and all INSERTs, UPDATEs and DELETEs should always and only be executed there. SELECTs can be executed on either node, taking into account that the Slave might be slightly out of date. SELECTs that take a long time should be executed on the Slave, so that users of the Master node are not blocked at all. The Slave node may not be completely up-to-date: as the replication is asynchronous, it can lag behind the master by anything from a few milliseconds to a long time, if the Slave cannot keep up with the updates from the master and they "accumulate".
This kind of scenario is not uncommon in applications where you have the Master node working with a setup optimized for OLTP (OnLine Transaction Processing); and the Slave node with a setup optimized for OLAP (OnLine Analytical Processing). OLTP means (mostly) entering and manipulating data, and making relatively simple queries, with small amounts of data every time, whereas OLAP means querying data, normally in large amounts and with aggregations and computations (i.e.: analysis and reporting).