I (well, my cron script) tried `killall mysqldump`, and that didn't end well: the mysql server stopped accepting connections after a while. It was a Debian Jessie machine with mysql 5.5.55-0+deb8u1.
The usage scenario was:

- There was a long-running (a few hours) `SELECT` query, which was either really that slow or had a client that was having problems (the query state was `Sending data`), but all the other queries were happily coming and going (only the load was perhaps a little higher).
- In the night, the backup was run with `mysqldump --max_allowed_packet=2147483648 --hex-blob --single-transaction --master-data --routines --order-by-primary --databases db1 db2 db3... | pigz -p8 > backup.sql.gz`. It never finished, probably because it was waiting for the `SELECT` above to finish first (guessing here; it was the only thing that looked out of the ordinary, and the same setup had worked fine for months).
- A cron job ran in the morning with `killall -q mysqldump`, which was supposed to safely terminate the backup in case it had not finished by the set time (notifying the admin to examine and fix the problem later), thus allowing people to continue working with the mysql server normally.
- The result, however, was a full connection table, leaving no user able to log in to the mysql server. There was a `FLUSH /*!40101 LOCAL */ TABLES` query stuck in `Waiting for table flush`, and hundreds of `SELECT` queries stuck in the same `Waiting for table flush` state.
- In addition, the admin killing the `LOCK TABLES` mysql query didn't help, as the other `SELECT` queries remained in `Waiting for table flush` (which seems to be the intended behavior?).
Restarting the mysql server finally "fixed" the problem. However, wanting to avoid a repeat of this situation (and the emergency admin intervention), I'd like to safely terminate a mysqldump backup on Debian Jessie mysql-5.5.55 (or the upcoming Debian Stretch mariadb-10.1.23-8). Is there a way?

If not, what are other options for accomplishing a mysql backup while avoiding server load in the morning (which is, in this case, almost as bad as a completely hung server)?

(I'd like to stay with Debian Stable packages if at all possible.)
Best Answer
Since you are using `--master-data`, mysqldump needs to record a consistent value of the master status. To do that, its internals issue a FLUSH TABLES / read-lock / `SHOW MASTER STATUS` sequence against the mysql server.
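For reference, a dump started with `--single-transaction --master-data` opens with a statement sequence along these lines (an approximate reconstruction of mysqldump 5.5 behavior, not taken verbatim from this server's logs):

```sql
-- Close open tables first, to shorten the window the global read lock is held:
FLUSH /*!40101 LOCAL */ TABLES;
-- Take a brief global read lock so the binlog position is stable:
FLUSH TABLES WITH READ LOCK;
-- Start the consistent snapshot used for --single-transaction:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */;
-- Record the master status for the CHANGE MASTER comment in the dump:
SHOW MASTER STATUS;
UNLOCK TABLES;
```

Note that the first statement, `FLUSH /*!40101 LOCAL */ TABLES`, is exactly the query that was seen stuck in `Waiting for table flush`.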
What happened?:
Your backup had just started, and a query on a particular table had already been running for a long time before the `FLUSH TABLES` command even arrived, so the lock on that table was never released. `FLUSH TABLES` must have been waiting for that thread to complete, retrying until that table's version matched all the other tables. That is why other threads got blocked on other tables as well: while the flush is in progress, it is effectively a lock across every database and table. Finally, every new connection accumulated in the processlist and piled up until `max_connections` was reached, letting nobody log in.

Even if you had managed to log in to a terminal and tried to kill the `FLUSH TABLES` thread, I don't think there is a way to pull back or roll back a flush already performed and release its thread's connection, so it might have stayed in a killed state for a long time. And thus you might have reached the last option, which is to restart the server.

How to fix it?:
At the time of the issue, when the admin managed to log in to the mysql prompt: instead of issuing the kill command on the `FLUSH TABLES` thread, the kill should have been given to the thread running the long `SELECT`. Chances are the `SELECT` would have been dropped, leaving `FLUSH TABLES` free to acquire the lock, refresh the table version, and release it for new queries, and the backup would have continued. After all, I don't think anyone was sitting on the other end waiting for the result of a query that had been running for hours.
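Concretely, that would look like the following (the thread id 12345 is hypothetical; pick the id of the long-running `SELECT` from the processlist output):

```sql
-- Find the thread running the long SELECT:
SHOW FULL PROCESSLIST;
-- Kill only its statement, keeping the client connection open:
KILL QUERY 12345;
-- Or, alternatively, drop the whole connection:
-- KILL 12345;
```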
What is the long-term solution?:
You have to ensure that no such long-running queries are active at the time of the backup. This looks like either a new deployment or someone triggering a bad query and not bothering to close the session. Try to kill any query that has been running for more than X seconds (depending on your requirements). Or
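That check can be scripted against the `information_schema.PROCESSLIST` table (present in both MySQL 5.5 and MariaDB); the 600-second threshold here is only an example:

```sql
-- List SELECT statements that have been running longer than 600 seconds:
SELECT ID, USER, HOST, TIME, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Query'
  AND TIME > 600
  AND INFO LIKE 'SELECT%';
-- Then, for each offending ID found above:
-- KILL QUERY <id>;
```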