MySQL – Determine Partition Loading into Memory

index, memory, MySQL, mysql-5.6, partitioning

I have a server where the MySQL indexes won't fit into memory (2TB total index, 512GB RAM on the machine), and I'm wondering how MySQL decides which parts to keep in memory and which not.

The relevant tables are range partitioned on ID, and I'm told (but can't find anything proving this) that the last 3 months of data are kept in memory.

I'm hoping there's a more substantial answer available somewhere.

Best Answer

InnoDB's "buffer_pool" contains blocks that are kept there in (roughly) a "least recently used" order. That is, when you touch a block for either reading or writing, it is pulled into the buffer_pool and marked as "most recently used". If there was not room for this block, then the "least recently used" block was first removed from the buffer_pool.

PARTITIONing has nothing to do with the caching scheme. In particular, partitioning by the PRIMARY KEY is almost always useless, and almost always redundant with simply letting the caching described above do its job.

If you would like to discuss performance improvements, please provide SHOW CREATE TABLE and the slow query.

Using PARTITION BY RANGE(TO_DAYS(...)) is an excellent use of partitioning, but only because it makes dropping "old" data very efficient. Note: I did not say BY RANGE(id).
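
As a rough sketch (the table and column names here are invented, not from your schema), a time-based layout looks like this; note that MySQL requires the partitioning column to appear in every unique key, including the PRIMARY KEY:

    CREATE TABLE events (
        id         BIGINT UNSIGNED NOT NULL,
        created_at DATETIME        NOT NULL,
        payload    VARCHAR(255),
        PRIMARY KEY (id, created_at)     -- partition column must be in the PK
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
        PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- Purging a month is a metadata operation, far cheaper than a huge DELETE:
    ALTER TABLE events DROP PARTITION p2024_01;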

Even if the math is correct, you may or may not have "the last 3 months" cached in RAM -- if someone scanned stuff from 4 months ago, some blocks would have to be bumped out to make room for that scan. Then the 4-month-old blocks would clutter the buffer_pool until they, in turn, get bumped out.

Update

Since you have a "mission critical" system that must accept writes all the time, and have no PRIMARY KEY, you need some drastic action. Here's what I would do.

I would step back to the application code. If there is not already a layer between the "users" and the database, I would start by building one and force the "users" to call an API that says "write ... to the database".

To start with, the API would simply do an INSERT like the original code. This has minimal impact.

Then I would deploy a new version of the API with more complex logic behind it. In particular, it would write to both the existing table and the new table (which has a PRIMARY KEY, new index(es), changed datatypes, PARTITIONing, etc.).
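
The write path inside the API would then look roughly like this (a sketch only; old_table, new_table, and the column list are placeholders for your real schema):

    -- One transaction, so the two copies cannot drift apart
    START TRANSACTION;
    INSERT INTO old_table (col_a, col_b) VALUES (?, ?);
    INSERT INTO new_table (col_a, col_b) VALUES (?, ?);
    COMMIT;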

Meanwhile, another script would be copying, in chunks, the existing data from the existing table to the new table. (This is part of what pt-online-schema-change does, but we will take some liberties.)

I suggest a chunk size of 1000 rows, each chunk as its own transaction (assuming you are using InnoDB). The chunking should follow some index, whichever one is closest to being unique. (A yes/no column would be poor because you could not get more than 2 chunks.)
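
One chunk of that copy script might look like the following sketch (some_col stands in for whichever nearly-unique indexed column you chunk on; @last_value is the upper bound reached by the previous chunk):

    START TRANSACTION;
    INSERT INTO new_table (some_col, col_a, col_b)
        SELECT some_col, col_a, col_b
        FROM   old_table
        WHERE  some_col > @last_value    -- bound carried over from the previous chunk
        ORDER  BY some_col
        LIMIT  1000;
    COMMIT;
    -- Record the new maximum of some_col as @last_value,
    -- then repeat until the SELECT returns no rows.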

When it is all finished, you will have two copies of the data; you can test against the new table before dropping the old one.

Reads can proceed unchanged during the process because the old table remains intact.

Some blogs with tips: