MySQL – how to safely stop mysqldump

debian, linux, MySQL, mysql-5.5, mysqldump

I (well, my cron script) tried killall mysqldump, and that didn't end well: the mysql server stopped accepting connections after a while.

It was a Debian Jessie machine running MySQL 5.5.55-0+deb8u1.

Usage scenario was:

  • there was a long-running (a few hours) SELECT query which was either really that slow, or the client that sent it was having problems (the query state was Sending data); all the other queries were happily coming and going (only the load was perhaps a little higher).

  • at night, the backup was run with mysqldump --max_allowed_packet=2147483648 --hex-blob --single-transaction --master-data --routines --order-by-primary --databases db1 db2 db3... | pigz -p8 > backup.sql.gz. It never finished, probably because it was waiting for the SELECT above to finish first (guessing here – it was the only thing that looked out of the ordinary, and the same setup had worked fine for months).

  • a cron job ran in the morning with killall -q mysqldump, which was supposed to safely terminate the backup in case it had not finished by the set time (notifying the admin to examine and fix the problem later), thus allowing people to continue working with the mysql server normally.

  • the result, however, was a full connection table, and thus no user was able to log in to the mysql server. There was a FLUSH /*!40101 LOCAL */ TABLES query stuck in Waiting for table flush, and hundreds of SELECT queries stuck in the same Waiting for table flush state.

  • in addition, the admin killing the LOCK TABLES mysql query didn't help, as the other SELECT queries remained in Waiting for table flush (which seems to be intended behavior?)

Restarting the mysql server finally "fixed" the problem. However, wanting to avoid a repeat of this situation (and of emergency admin interventions), I'd like to be able to safely terminate a mysqldump backup on Debian Jessie mysql-5.5.55 (or the upcoming Debian Stretch mariadb-10.1.23-8). Is there a way?

If not, what are the other options for accomplishing a mysql backup while avoiding server load in the morning (which is, in this case, almost as bad as a completely hung server)?

(I'd like to stay with Debian Stable packages if at all possible)

Best Answer

Since you are using --master-data to take a consistent snapshot of the master status, the internals of mysqldump issue the following commands to the mysql server:

2017-05-31T04:39:05.843130Z    48 Query /*!40100 SET @@SQL_MODE='' */
2017-05-31T04:39:05.843273Z    48 Query /*!40103 SET TIME_ZONE='+00:00' */
2017-05-31T04:39:05.843411Z    48 Query FLUSH /*!40101 LOCAL */ TABLES
2017-05-31T04:39:05.846031Z    48 Query FLUSH TABLES WITH READ LOCK
2017-05-31T04:39:05.846166Z    48 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-05-31T04:39:05.846279Z    48 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2017-05-31T04:39:05.846413Z    48 Query SHOW MASTER STATUS
2017-05-31T04:39:05.846539Z    48 Query UNLOCK TABLES
...
(here mysqldump continues dumping the data and structures)

What happened?

Your backup had just started, but a query had already been running against a particular table for a long period, since before the FLUSH TABLES command was even issued, and it had not released that table. FLUSH TABLES therefore had to wait for that thread to complete (it keeps trying to flush until that table's version is the same as all the other tables').

Thus you get other threads blocked on the other tables as well, because the lock applies at the level of every database and every table while the flush is in progress. Eventually every new connection accumulated in the processlist and piled up until max_connections was reached, at which point nobody could log in.
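At that point the processlist shows the whole picture. A minimal diagnostic sketch, assuming MySQL 5.1+ (which provides information_schema.processlist); the 600-second threshold is my assumption, tune it to your workload:

SELECT id, user, time, state, LEFT(info, 80) AS query
FROM information_schema.processlist
WHERE state = 'Waiting for table flush'
   OR time > 600
ORDER BY time DESC;

The thread with the largest time that is not itself waiting for the flush is usually the long SELECT holding everything up.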

Let's say you had managed to log in to a terminal and tried to kill the FLUSH TABLES thread: I don't think there is a way to pull back or roll back the flushing already done and release that thread's connection, so it might sit in a killed state for a long time. That leaves you with the last option, which is to restart the server.

How to fix it?

At the time of the issue, when the admin managed to log in to the mysql prompt: instead of issuing a kill on the FLUSH TABLES thread, the kill should have been issued against the thread running the long SELECT. Chances are the SELECT would have been dropped, leaving the table free for FLUSH TABLES to acquire, update its version, and release the lock for new queries, and the backup would have continued. I don't think anyone was still waiting on the other end for the result of a query that had been running for long hours anyway.
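A sketch of that intervention, assuming the long SELECT is the thread with the largest time in the processlist (the id 1234 below is a placeholder for whatever the first query returns):

SELECT id, user, time, LEFT(info, 80) AS query
FROM information_schema.processlist
WHERE command = 'Query' AND info LIKE 'SELECT%'
ORDER BY time DESC
LIMIT 5;

KILL QUERY 1234;  -- aborts the statement but keeps the client connection open

KILL QUERY (rather than plain KILL) terminates only the running statement, which is usually enough to unblock the flush.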

What is the long-term solution?

  • You have to ensure that no such long-running queries are active at the time of the backup.

  • It looks like this could have been a new deployment, or someone triggered a bad query and didn't bother to close the session.

  • Kill any query that has been running for more than X seconds (the threshold depends on your requirements); see the sketch after this list. Or:

  • Work out an arrangement with the teams to tune their queries, or have them send the query to the admin, who can give them the result data, or give them a separate slave altogether.
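For the kill-after-X-seconds option, a minimal sketch that could run from cron, in the spirit of the one-liners already used above (the 3600-second threshold and the restriction to SELECT statements are assumptions to be tuned; credentials are assumed to come from ~/.my.cnf; pt-kill from Percona Toolkit does the same job with more safeguards):

mysql -N -B -e "SELECT CONCAT('KILL QUERY ', id, ';')
  FROM information_schema.processlist
  WHERE command = 'Query' AND info LIKE 'SELECT%' AND time > 3600;" | mysql

The first mysql invocation prints one KILL QUERY statement per offending thread; piping its output into a second mysql executes them.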