MySQL – Does MySQL load the entire row into memory, or just the columns used in WHERE, ORDER BY and GROUP BY?

Tags: join, memory, MySQL

I have a table that isn't very large row-wise (in the 100k range), but it contains a lot of raw data, so it is very large in size. Despite the relatively small number of rows, it's around 1.5 GB.

So it's quite important to know: when performing a query, does MySQL load the entire row into memory, or just the columns used in WHERE, ORDER BY and GROUP BY plus the indexes, fetching the rest of the columns only at the very end?

An example query:

SELECT HugeDataTable.*, Table2.Name
FROM Table1
LEFT JOIN Table2 ON Table1.`ID` = Table2.`Table1ID` 
LEFT JOIN HugeDataTable FORCE INDEX(RowOrder) ON Table2.`ID` = HugeDataTable.`Table2ID` 
WHERE HugeDataTable.Category = 5
AND HugeDataTable.RowOrder > 10000 AND HugeDataTable.ID <> "h4324h534"
ORDER BY HugeDataTable.`RowOrder` DESC LIMIT 18;

Using EXPLAIN SELECT, I've found that MySQL scans around 70k rows per query. The query is rather fast, but I'm not sure whether that's due to row caching, as I can't simulate a heavy load on the server.

So, my question is: will the columns containing the large raw data be loaded only after the query limits the result to 18 rows, so that just the small amount of raw data actually needed is loaded at the end?

Or will they be loaded before the limit, so that 70k rows, roughly 1 GB worth of data, are loaded first? And if it's the latter, what can be done to prevent that, since the server only has 1 GB of RAM?

Edit: I've added the EXPLAIN.

id  select_type  table          type    possible_keys                       key               key_len  ref                           rows   Extra                     
1   SIMPLE       HugeDataTable  range   Table2ID,Category,RowOrder          RowOrder          9                                      49591  Using where               
1   SIMPLE       Table2         eq_ref  PRIMARY                             PRIMARY           10       const,HugeDataTable.Table2ID  1      Using where; Using index  
1   SIMPLE       Table1         ref     PRIMARY                             PRIMARY           2        Table2.Table1ID               1    

Best Answer

Because the execution plan was missing when I wrote this, I would suppose that it shows a full scan on your raw-data table. If that is the case, then your entire rows will be loaded into memory (including the raw columns), which is obviously not OK.

The easiest way to avoid this is to create an index on the filtering columns, so that the query scans the index first and only reaches the table, loading the row into memory, if that row is a candidate for your result set.
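A minimal sketch of such an index, using the filtering columns from the query above (the index name and the composite column order are assumptions; the best order depends on how selective Category is compared to RowOrder):

-- Assumes the Category and RowOrder columns from the question;
-- the index name and column order are illustrative only.
ALTER TABLE HugeDataTable
    ADD INDEX idx_category_roworder (Category, RowOrder);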

From comment discussion:

The query performs fast mostly because of the index (hence the slow performance when you remove it), but your rows are still loaded entirely. As you thought, only some of the rows get into memory, but that does not mean only parts of them are loaded. The trick is that the index points only to the rows that are relevant. You're using the index with a ">" operator, which causes a range scan, so it is possible that more rows than you need/get are referenced.

You could try to apply the LIMIT 18 earlier, rather than at the end of the query. Or you could create a new table containing only the raw column plus a foreign key to the first table, and join in the result from the first table only at the very end of the query (see the sketch below). Remember, though: if you perform the ORDER BY after you join the two tables, the sort will still be done with the raw data loaded into memory.
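A rough sketch of that split, assuming a hypothetical RawData column and a new HugeDataRaw side table (the table name, column names and the VARCHAR ID type are illustrative, not taken from the original schema):

-- Keep only the bulky column in a side table, keyed by the parent ID.
CREATE TABLE HugeDataRaw (
    HugeDataTableID VARCHAR(32) NOT NULL PRIMARY KEY,
    RawData         LONGBLOB,
    FOREIGN KEY (HugeDataTableID) REFERENCES HugeDataTable (ID)
);

-- Filter, sort and LIMIT on the slim table first, then fetch the raw
-- data for the 18 surviving rows only.
SELECT slim.*, raw.RawData
FROM (
    SELECT *
    FROM HugeDataTable
    WHERE Category = 5
      AND RowOrder > 10000
      AND ID <> 'h4324h534'
    ORDER BY RowOrder DESC
    LIMIT 18
) AS slim
JOIN HugeDataRaw AS raw ON raw.HugeDataTableID = slim.ID;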

Try to sort the rows of the main table before joining the raw table. If the main table leads the join, the rows should come out of the join still sorted (a sketch of a related approach follows).
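If changing the schema is not an option, a related technique is a deferred join against the existing table: the inner query picks the 18 IDs using only indexed columns, and the wide rows are fetched afterwards. Column names are taken from the query above; whether the optimizer really keeps the raw columns out of the inner scan is something you should verify with EXPLAIN:

-- Sketch of a deferred join. The inner query should stay within the
-- (Category, RowOrder) index plus the primary key; the outer joins
-- then load the wide rows for 18 IDs only.
SELECT big.*, Table2.Name
FROM (
    SELECT ID
    FROM HugeDataTable
    WHERE Category = 5
      AND RowOrder > 10000
      AND ID <> 'h4324h534'
    ORDER BY RowOrder DESC
    LIMIT 18
) AS picked
JOIN HugeDataTable AS big ON big.ID = picked.ID
LEFT JOIN Table2 ON Table2.ID = big.Table2ID
ORDER BY big.RowOrder DESC;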