I'm going to highlight one of the products listed in the wiki link provided by Jack Douglas.
Galera is a product I've been hearing a lot about recently, but I haven't yet had cause to implement it in a production environment.
Back in July, Percona wrote a brief blog post about implementing it, but mentioned there was no 'production' release at that time.
They've since released the first major version, v1.0, and Vladimir has been shedding more light on it this month, along with some benchmarks.
Given your write-heavy environment, I'll highlight this use case:
Distributing writes across the cluster harnesses the CPU power of the slave nodes to process client write transactions. Because of the row-based replication method, only the changes made during a client transaction are replicated, and applying such a transaction on a slave is much faster than processing the original transaction. The cluster can therefore distribute heavy client transaction processing across many master nodes, and this yields better write-transaction throughput overall.
Disclaimer: I'm not affiliated with Galera or Percona, and have not used it personally...yet.
What kind of memory and HD space leak problems can this structure give us in MySQL-InnoDB?
Given your description, it's impossible to give any meaningful value for memory or storage requirements. You say "16 million per year", but 16 million what? Rows? Tables? With no schema example it's still impossible to estimate. For the tables you do have, you can run:
select table_name, data_length, index_length, data_free
from information_schema.tables
where table_schema = 'mydatabase';
to get a feel for current storage requirements. What you'll be more interested in, though, is the growth rate over time, so you don't run out of space three months down the road. MySQL doesn't provide a built-in growth-rate metric, so that's something you'll need to poll and record yourself periodically.
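A minimal sketch of one way to record that yourself, assuming a tracking table you create for the purpose (dba.db_size_history is a hypothetical name) and a cron job or event that runs the insert on a schedule:

create table if not exists dba.db_size_history (
  captured_at   datetime    not null,
  table_schema  varchar(64) not null,
  data_bytes    bigint,
  index_bytes   bigint
);

-- run periodically; diffing rows over time gives you the growth rate
insert into dba.db_size_history
select now(), table_schema, sum(data_length), sum(index_length)
from information_schema.tables
group by table_schema;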
For memory requirements, in a perfect world your InnoDB buffer pool would be big enough to hold all of your data and indexes in memory. It's never a perfect world, though, and such a machine can become impractical or cost-prohibitive. Even so, strive to make the buffer pool as large as you can so the most frequently accessed data stays in memory.
If you are straight-up dropping tables, the storage space will be reclaimed by the file system if and only if you are configured with innodb_file_per_table = true. Otherwise, all of your data is stored, along with metadata and transaction history, in ibdata1.
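To sanity-check both points, here is a quick sketch that compares your InnoDB footprint to the buffer pool and confirms how file-per-table is set (the GB conversion is just for readability):

-- total InnoDB data + index footprint, in GB
select round(sum(data_length + index_length) / 1024 / 1024 / 1024, 2) as innodb_gb
from information_schema.tables
where engine = 'InnoDB';

show variables like 'innodb_buffer_pool_size';
show variables like 'innodb_file_per_table';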
Is this a good practice or should we consider a structure change?
It's not unheard of to take old historical data offline. However, your description makes it seem like this data becoming active again is somewhat common, and I can't imagine any on-demand process where you have to rebuild a table before being able to service the request.
Rebuilding a table and its indexes is much slower than a raw copy of a file of similar size to the final table.
You say it's being "packed" and "unpacked", which indicates it's still being stored somewhere. If it needs to be accessible by an application, just keep it in the database and let the old pages fall out of the buffer pool. When it's requested again, the information only has to be read from disk, rather than a table being completely rebuilt.
Best Answer
Method 1
If you are using Percona Server or MariaDB (>= 5.2), you can simply set the userstat/userstat_running variable to enable a bunch of new INFORMATION_SCHEMA tables including one called TABLE_STATISTICS that provides exactly this information.
For example:
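A minimal sketch of such a query, assuming the userstat variable and the standard TABLE_STATISTICS columns:

-- enable collection (older Percona Server releases call the variable userstat_running)
set global userstat = on;

-- tables ranked by write activity
select table_schema, table_name, rows_read, rows_changed, rows_changed_x_indexes
from information_schema.table_statistics
order by rows_changed desc
limit 10;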
ROWS_CHANGED corresponds to the most written-to tables and ROWS_READ to the most read-from. You should also look at INDEX_STATISTICS to find your most and least used indexes.
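A similar sketch for the index side; indexes that never show up in INDEX_STATISTICS are your candidates for "least used":

select table_schema, table_name, index_name, rows_read
from information_schema.index_statistics
order by rows_read desc
limit 10;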
See also the MariaDB user statistics documentation.
Method 2
If you are not using Percona Server, you could use pt-query-digest to capture a sample of your queries and then filter out only INSERT/UPDATE/DELETEs. That would look something like this:
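A sketch of that capture step, assuming Percona Toolkit's pt-query-digest with the --processlist, --run-time, --print, and --no-report options (newer releases use --output=slowlog instead of --print); add user/password to the DSN as needed:

# sample the running queries for ~10 minutes and save them in slow-log format
pt-query-digest --processlist h=localhost --print --no-report --run-time=10m > /tmp/allqueries.log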
Now you have a file, /tmp/allqueries.log, that contains every query executed on your server for ~10 minutes. Next, analyze it with pt-query-digest to get the most frequently written-to tables:
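A sketch of that analysis, grouping by table and keeping only the write statements (this is the --filter that gets swapped for a ^select version below):

# rank tables by write statements for the sampled period
pt-query-digest --group-by tables --report-format profile \
  --filter '$event->{arg} =~ m/^(insert|update|delete|replace)/i' \
  /tmp/allqueries.log > /tmp/writes.txt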
If you examine /tmp/writes.txt, you will see a profile section near the top that ranks the tables by write volume. Roughly, these are your most written-to tables for the duration of the sample you chose. To get the most read-from tables (roughly), you can change the --filter parameter to --filter '$event->{arg} =~ m/^select/i' and you will see similar output.

If you are only interested in writes, you can pass a binary log into pt-query-digest and get similar results:
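A sketch of that approach; pt-query-digest reads the decoded text from mysqlbinlog rather than the raw binlog file (the file names here are only examples):

# decode the binlog to text, then digest it the same way
mysqlbinlog /var/lib/mysql/mysql-bin.000001 > /tmp/mysql-bin.000001.txt
pt-query-digest --type binlog --group-by tables --report-format profile \
  /tmp/mysql-bin.000001.txt > /tmp/binlog-writes.txt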
You can also get the same data with tcpdump and pt-query-digest --type=tcpdump.
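A sketch along the lines of the capture recipe in the pt-query-digest documentation (the interface, packet count, and port 3306 are assumptions for a default setup):

# capture MySQL traffic, then digest the capture
tcpdump -s 65535 -x -nn -q -tttt -i any -c 10000 port 3306 > /tmp/mysql.tcp.txt
pt-query-digest --type tcpdump --group-by tables --report-format profile /tmp/mysql.tcp.txt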
So, that being said: assuming you are using InnoDB tables, I highly doubt that you will see much performance benefit from doing this at all. Because of the way data is buffered in the InnoDB log and then written to disk, I wouldn't expect much, if any, gain from moving individual tables around like this. You might see some benefit from moving the InnoDB log files themselves to a separate, faster disk, to keep the log reads/writes apart from the tablespace reads/writes, but even that is questionable. Investing in fast, high-quality RAID arrays with a battery-backed cache (or better yet, SSDs) will be a better use of your resources.