MariaDB Subquery Cache Scope and Settings – How to Configure

mariadbsubquery

I am migrating an application from MySQL 5.6 to MariaDB 10. One major new feature of MariaDB is the subquery cache. I am trying to determine the expected effect this cache will have on our application before enabling and testing it.
Under the "Implementation" section, the knowledge base article says this:

Every subquery cache creates a temporary table where the results and all parameters are stored. It has a unique index over all parameters. First the cache is created in a MEMORY table (if doing this is impossible the cache becomes disabled for that expression). When the table grows up to the minimum of tmp_table_size and max_heap_table_size, the hit rate will be checked:

  • if the hit rate is really small (<0.2) the cache will be disabled.
  • if the hit rate is moderate (<0.7) the table will be cleaned (all records deleted) to keep the table in memory
  • if the hit rate is high the table will be converted to a disk table (for 5.3.0 it can only be converted to a disk table).

The first thing I'm trying to determine is what exactly it means by, "Every subquery cache." Since it states that the caches are created in temporary tables, it implies that they must be scoped to the current connection. So my guess is that there is one subquery_cache table created per connection. (I doubt every query has its own cache, because then the line about the cache "growing" wouldn't make sense.) But that's just a guess; I can't find any more details. Perhaps these temporary tables are indeed shared between connections somehow, unlike regular temp tables. Or perhaps there are multiple caches per connection.

Can anyone confirm the number and scope of subquery caches created?

My next question is whether there are any options for tuning the cache. Since we are moving from an optimized MySQL application, we don't currently have any queries that make extensive use of cachable subqueries (since such queries would be very slow under MySQL). We do use plenty of subqueries though, just not in ways that are likely to cause a lot of cache hits. Given that, I expect that enabling the cache will simply create an overhead in temporary table creation and lookup for each subquery, as well as memory used to store the cache tables (which apparently can't be tuned separately from the main tmp_table_size and max_heap_table_size variables). Given that I expect to find that our application as it currently exists is more efficient without the cache enabled. The ideal though would be if it supports something like the 'DEMAND' setting for the main query cache (query_cache), which allows you to specify exactly which queries should use the cache.

Since no such features are documented anywhere, I expect they don't exist, but it would be fantastic if anyone has knowledge regarding why not, or when we might expect them to.

Best Answer

The cache needs to return the same results as running the subquery would, so the scope has to be limited by active transaction too.

But seeing that the cache can only be enabled or disabled as an optimizer feature and not configured otherwise, my guess is that it actually is instantiated once for each specific cacheable subquery for a query where the optimizer decides it is worth it.

That seems to be confirmed by: "It has a unique index over all parameters." - this is not doable if you store results for more than one subquery in the same table.

So it seems to me that the cache is instantiated once for each cacheable subquery and its lifetime would be limited by the outer query execution.

Edit: I checked some source code (disclaimer: I definitely do not fully understand the code as MariaDB is quite big and complex piece of software and my knowledge of C++ is far from expert level). The cache is implemented by the Expression_cache_tmptable. The table used to store the cached results is created in the init() method which is being called from some methods in Item_cache_wrapper and AFAIK lifespan of various Items is single (outer) query execution as different types of Items keep and manage different parts of the query (sort of expression tree if I understand it correctly).

From that I deduce that one instance of the subquery cache is created for each cacheable subquery at the time query is being "prepared" for execution. It is then initialized (and the temp table created) when it is first accessed during the query execution (so only if the subquery is actually executed at least once). The condition of "<0.2" is then checked (for each subquery independently) after 200 misses (that will probably be after processing some 200+ rows of the outer query) to see if the cache has any positive effect, and it is disabled early if it does not help. Then, if the cache table grows too much, it is checked again according to the conditions listed in the manual.

So the subquery cache is not tuneable, but seems to be made as unintrusive as possible. The optimizer will ever instantiate it for subqueries which can at least theoretically use it and it will be disabled early in the query execution if it does not prove useful.

If you are still worried about it slowing your queries down, you can disable it globally by configuring the optimizer_switch and then enable it only for selected queries by calling

SET SESSION optimizer_switch='subquery_cache=on';

before the query execution and optionally turn it off afterwards.