MySQL Load Impact – Understanding the Load Impact of Zabbix DB Size Query

cpuloadMySQL

I am in the process of implementing Zabbix monitoring on multiple servers at an organization. I have recently run into some resistance from an Oracle / MySQL DBA. He claims that the following query is resource intensive because his database server contains thousands of tables. It's a one liner, but I've split it across multiple lines to help with readability:

echo "select sum($(case "$3" in both|"") echo
"data_length+index_length";; data|index) 
echo "$3_length";; free) echo "data_free";; esac)) from 
information_schema.tables$([[ "$1" = "all" || ! "$1" ]] || 
echo " where table_schema=\"$1\"")$([[ "$2" = "all" || ! "$2" ]] || 
echo "and table_name=\"$2\"");" | HOME=/var/lib/zabbix mysql -N

which looks like it reduces to:

select sum(data_length+index_length) from information_schema.tables;

My specific question is, how resource intensive is this query and should the organization be worried about the performance impact it will have on the DB server if it runs once per minute?

As an experienced UNIX/Linux sysadmin and from reading a little bit from the MySQL manual, it seems like it should be simple addition of values retrieved from the already in memory "information_schema", but I'm hoping for an answer that can definitively describe the internal actions MySQL has to take and the associated impact.

Best Answer

This 'random person on the Internet' can shed some light...

  • The information_schema is not a 'real' database (until MySQL 8.0), rather it is derived from a bunch of data that may or may not be in RAM already.
  • If not all the data is in memory, then the query will have to get it from disk -- by reading lots of .frm files (etc). "Thousands" in your case. So, this could be quite costly.
  • The Data_length, etc are relatively stable; I hope your monitoring is not doing this query every minute; that may be a noticeable burden. Once an hour should be sufficient.
  • What is the value of table_open_cache? Is it more than the number of tables? If not, then this random person on the Internet thinks that you could be in deep ?.
  • How lame! Rather than generating a different query when you ask for "all", the package builds a complex boolean expression.

I have been involved in the monitoring of hundreds of machines. A simple df is sufficient to alert a human to dig into what is causing the problem. Checking the MySQL directory with a du is one of several secondary steps.

But, I disagree with every monitor I have seen. I don't care how big the data is. It does not matter if the disk is 90% full -- as long as it is not growing. I do care if it is 20% full but growing so fast that it will fill up before the end of the week.

Sure graphing the size is handy, but that means that a human has to look at it (and not fall asleep). I want an alert. Furthermore, due to cyclical things, such as binlog purging, the graph may look like saw teeth -- up, down, up down! This makes "trend analysis" difficult.

Back to your question of whether that query is "intensive". If Zabbix is worth having it will give you the answer on a silver (or electronic) platter! Does it monitor the "slowlog"? If so, does this query show up? QED.