SELECT RELEASE_LOCK Causing ‘MySQL Has Gone Away’ – Troubleshooting

connectivitylockingMySQLmysql-5.5

During a long running process (90m+), our code calls SELECT RELEASE_LOCK('..');. However it always gives a "Mysql has gone away" error when we do this.

We never got to see the return value of the RELEASE_LOCK; the query didn't execute. We lost connection before the result was found.

We have never encountered this problem before. It's after an upgrade of Magento 1.12 to 1.13

We have tried a number of things including increasing various timeouts but to no avail. What might you suggest is a good debugging strategy or something that may be causing this issue?

Best Answer

The most likely explanation for this is that an intermediate piece of network hardware is tearing down or "forgetting" your idle TCP connection, assuming your wait_timeout system variable is not set too low.

Stateful firewalls maintain internal data structures of all current "flows" -- source/destination address/port quads -- and if a flow has no traffic for too long, the record of the flow is simply dropped from the firewall's memory. The next TCP packet to arrive doesn't match any established flows, and doesn't have the SYN flag set, so it's not a new connection, and the firewall considers it invalid and either drops it or spoofs an RST packet from the destination.

Most likely, you would get this same error with any query SELECT NOW(); for example, would probably return the same result, once the connection has been totally idle for too long. Note that running a query doesn't prevent the connection from being "idle" at the TCP layer.

The connection might not disappear from the processlist in the mean time, because, since it's completely idle, the server would not have sent anything and would therefore not have realized that at some point on the network, it is an invalid TCP connection.

This occurs, for just one example in my experience, within Amazon EC2.

The workaround is to enable TCP keepalives in the operating system itself on at least one end of the connection. This will cause TCP traffic to continue to be exchanged by the hosts during the idle time, even though there's no payload on the connection. Enabling these should mediate this issue.

Linux: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

Windows: https://technet.microsoft.com/en-us/library/cc957549.aspx

If curious, then you should also, by trial and error, be able to discover a period of time after which idle connections are no longer usable... for example, connections left idle for 14 minutes may be fine, but connections left idle for 16 minutes will always be dead, suggesting the firewall has a 15 minute flow expiration timer.