I'm going to highlight one of the products listed in the wiki link provided by Jack Douglas.
Galera is a product I've been hearing a lot about recently, but I haven't yet had cause to implement it in a production environment.
Back in July, Percona wrote a brief blog post about implementing it, but mentioned there was no 'production' release at that time.
They've since released the first major version, v1.0, and Vladimir has been shedding more light on it this month, along with some benchmarks.
Given your write-heavy environment, I'll highlight this use case:
Distributing writes across the cluster harnesses the CPU power of the slave nodes to process client write transactions. Because of the row-based replication method, only the changes made during a client transaction are replicated, and applying such a transaction on a slave is much faster than processing the original transaction. The cluster can therefore distribute heavy client transaction processing across many master nodes, and this yields better write-transaction throughput overall.
Disclaimer: I'm not affiliated with Galera or Percona, and have not used it personally...yet.
What kind of memory and HD space leak problems can this structure give us in MySQL-InnoDB?
Given your description, it's impossible to give any meaningful value for memory or storage requirements. You say "16 million per year", but 16 million what? Rows? Tables? With no schema example it's still impossible to estimate. For the tables you do have, you can run:
select table_name, data_length, index_length, data_free
from information_schema.tables
where table_schema = 'mydatabase';
to get a feel for current storage requirements. What you'll be more interested in, though, is the growth rate over time, so you don't run out of space three months down the road. MySQL doesn't provide a built-in growth-rate metric, so that's something you'll need to poll and record yourself periodically.
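A minimal sketch of one way to record that yourself, assuming a tracking table you create for the purpose (dba.db_size_history is a hypothetical name) and a cron job or event that runs the insert on a schedule:

create table if not exists dba.db_size_history (
  captured_at   datetime    not null,
  table_schema  varchar(64) not null,
  data_bytes    bigint,
  index_bytes   bigint
);

-- run periodically; diffing rows over time gives you the growth rate
insert into dba.db_size_history
select now(), table_schema, sum(data_length), sum(index_length)
from information_schema.tables
group by table_schema;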
For memory requirements, in a perfect world your InnoDB buffer pool would be big enough to hold all of your data and indexes in memory. It's never a perfect world, though, and such a machine can become impractical or cost-prohibitive. Even so, strive to make the buffer pool as large as you can so the most frequently accessed data stays in memory.
If you are straight-up dropping tables, the storage space will be reclaimed by the file system if and only if you are configured with innodb_file_per_table = true. Otherwise, all of your data is stored, along with metadata and transaction history, in ibdata1.
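To sanity-check both points, here is a quick sketch that compares your InnoDB footprint to the buffer pool and confirms how file-per-table is set (the GB conversion is just for readability):

-- total InnoDB data + index footprint, in GB
select round(sum(data_length + index_length) / 1024 / 1024 / 1024, 2) as innodb_gb
from information_schema.tables
where engine = 'InnoDB';

show variables like 'innodb_buffer_pool_size';
show variables like 'innodb_file_per_table';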
Is this a good practice or should we consider a structure change?
It's not unheard of to take old historical data offline. However, your description makes it seem like this data becoming active again is somewhat common, and I can't imagine any on-demand process where you have to rebuild a table before being able to service the request.
Rebuilding a table and its indexes is much slower than a raw copy of a file of similar size to the final table.
You say it's being "packed" and "unpacked", which indicates it's still being stored somewhere. If it needs to be accessible by an application, just keep it in the database and let the old pages fall out of the buffer pool. When it's requested again, the information only has to be read from disk, rather than a table being completely rebuilt.
Best Answer
Method 1
If you are using Percona Server or MariaDB (>= 5.2), you can simply set the userstat/userstat_running variable to enable a bunch of new INFORMATION_SCHEMA tables including one called TABLE_STATISTICS that provides exactly this information.
For example:
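A minimal sketch of such a query, assuming the userstat variable and the standard TABLE_STATISTICS columns:

-- enable collection (older Percona Server releases call the variable userstat_running)
set global userstat = on;

-- tables ranked by write activity
select table_schema, table_name, rows_read, rows_changed, rows_changed_x_indexes
from information_schema.table_statistics
order by rows_changed desc
limit 10;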
ROWS_CHANGED corresponds to the most written-to tables and ROWS_READ to the most read-from. You should also look at INDEX_STATISTICS to find your most and least used indexes.
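A similar sketch for the index side; indexes that never show up in INDEX_STATISTICS are your candidates for "least used":

select table_schema, table_name, index_name, rows_read
from information_schema.index_statistics
order by rows_read desc
limit 10;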
See also the MariaDB user statistics documentation.
Method 2
If you are not using Percona Server, you could use pt-query-digest to capture a sample of your queries and then filter out only INSERT/UPDATE/DELETEs. That would look something like this:
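A sketch of that capture step, assuming Percona Toolkit's pt-query-digest with the --processlist, --run-time, --print, and --no-report options (newer releases use --output=slowlog instead of --print); add user/password to the DSN as needed:

# sample the running queries for ~10 minutes and save them in slow-log format
pt-query-digest --processlist h=localhost --print --no-report --run-time=10m > /tmp/allqueries.log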
Now you have a file, /tmp/allqueries.log, that contains every query executed on your server for ~10 minutes. Next, analyze it with pt-query-digest to get the most frequently written-to tables:
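A sketch of that analysis, grouping by table and keeping only the write statements (this is the --filter that gets swapped for a ^select version below):

# rank tables by write statements for the sampled period
pt-query-digest --group-by tables --report-format profile \
  --filter '$event->{arg} =~ m/^(insert|update|delete|replace)/i' \
  /tmp/allqueries.log > /tmp/writes.txt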
If you examine /tmp/writes.txt, you will see a profile section near the top that ranks the tables by write volume. Roughly, these are your most written-to tables for the duration of the sample you chose. To get the most read-from tables (roughly), you can change the --filter parameter to --filter '$event->{arg} =~ m/^select/i' and you will see similar output.

If you are only interested in writes, you can pass a binary log into pt-query-digest and get similar results:
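A sketch of that approach; pt-query-digest reads the decoded text from mysqlbinlog rather than the raw binlog file (the file names here are only examples):

# decode the binlog to text, then digest it the same way
mysqlbinlog /var/lib/mysql/mysql-bin.000001 > /tmp/mysql-bin.000001.txt
pt-query-digest --type binlog --group-by tables --report-format profile \
  /tmp/mysql-bin.000001.txt > /tmp/binlog-writes.txt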
You can also get the same data with tcpdump and pt-query-digest --type=tcpdump.
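A sketch along the lines of the capture recipe in the pt-query-digest documentation (the interface, packet count, and port 3306 are assumptions for a default setup):

# capture MySQL traffic, then digest the capture
tcpdump -s 65535 -x -nn -q -tttt -i any -c 10000 port 3306 > /tmp/mysql.tcp.txt
pt-query-digest --type tcpdump --group-by tables --report-format profile /tmp/mysql.tcp.txt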
So, that being said: assuming you are using InnoDB tables, I highly doubt that you will see much performance benefit from doing this at all. Because of the way data is buffered in the InnoDB log and then written to disk, I wouldn't expect much, if any, gain from moving individual tables around like this. You might see some benefit from moving the InnoDB log files themselves to a separate, faster disk, to keep the log reads/writes apart from the tablespace reads/writes, but even that is questionable. Investing in fast, high-quality RAID arrays with a battery-backed cache (or better yet, SSDs) will be a better use of your resources.