Mysql – Lock time not involved in MySQL slow query selection

lockingMySQLperformanceslow-log

I wish to analyze some read-write contentions problems on a very huge table where writes are very fast, less than 1ms in most case. The point is that the contention makes those write queries sometimes wait for the lock for a long time, i.e. hundreds of milliseconds. But the query itself remains very fast so it's not selected as a "slow query", which threshold is actually 10ms.

The only way to gather the information I need is to put à 0ms threshold so I can capture all queries, but it's not sustainable for more than 10mn on our production environment, where it would produce about 200MB of logs per minute.

In fact I can't understand why MySQL don't include the lock time, added to the execution time, for the slow query log selection process. At least, I wish there were an option for that.

Am I missing something ?

Best Answer

Confirming what you already know, for future reference:

"The time to acquire the initial locks is not counted as execution time."

http://dev.mysql.com/doc/refman/5.6/en/slow-query-log.html

It stands to reason, at least to me, that a "slow query" is best defined as a query that is poorly-optimized and takes longer than it should even when run in isolation. It follows from that, that queries which only coincidentally take a long time to run because of circumstances unrelated to the query itself -- such as waiting for table locks -- should not be included in the slow query log.

I base this on two things: intuition, since the slow query log can also "log_queries_not_using_indexes" -- which is presumably also intended to catch bad optimization before the data set grows to the point that the queries actually turn into "slow" queries... and the fact that it doesn't seem to have been designed that way just because it was easier to implement or for any architectural limitation.

In sql/sql_class.h:

void update_server_status()
  {
    ulonglong end_utime_of_query= current_utime();
    if (end_utime_of_query > utime_after_lock + variables.long_query_time)
      server_status|= SERVER_QUERY_WAS_SLOW;
  }

If utime_after_lock were replaced with start_utime ... that should drive the behavior you are expecting. YMMV, but that's what it looks like from here, from a brief look into the source code.

Alternately, you could write a script in your favorite language to hold a connection to the server and execute SHOW PROCESSLIST every "n" seconds. The script would "remember" what each thread's Time value was from iteration to iteration, and could report execution times to you with the granularity of the polling interval when it found interesting patterns. When it sees an interesting query, it could switch to SHOW FULL PROCESSLIST to get the entire body of the query.

I would poll my servers for their processlist (not "full") at 1 second intervals -- yes, in production -- and display the number of threads and the current longest-running query on my real-time status dashboard and have never seen evidence of an ill-effect from this.