MySQL: Optimize UNION with “ORDER BY” in inner queries

MySQLorder-byperformancequery-performancesubqueryunion

I just set up a logging system which consists of multiple tables with the same layout.

There is one table for each data source.

For the log viewer, I want to

  • UNION all the log tables,
  • filter them by account,
  • add a pseudo column for identification of the source,
  • sort them by time,
  • and limit them for pagination.

All tables contain a field called zeitpunkt that is an indexed date/time column.

My first attempt was:

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730)

UNION

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730)

ORDER BY zeit DESC LIMIT 10;

The optimizer cannot use the indexes here because all rows from both tables are returned by the subqueries and sorted after the UNION.

My workaround was the following:

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)

UNION

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)

ORDER BY zeit DESC LIMIT 10;

I was expecting the query engine would use the indexes here since both subqueries should be sorted and limited already prior to the UNION, which then merges and sorts the rows.

I really thought this would be it, but running EXPLAIN on the query tells me the subqueries still search both tables.

EXPLAINing the subqueries themselves shows me the desired optimization but UNIONing them together it does not.

Did I miss something?

I know that ORDER BY clauses inside UNION subqueries are ignored without a LIMIT, but there is a limit.

Edit:
Actually, there probably will also be queries without the account_id condition.

The tables already exist and are filled with data. There might be changes in the layout depending on the source so I want to keep them divided. Additionally, the logging clients use different credentials for a reason.

I have to keep a kind of layer between the log readers and the actual tables.

Here are the execution plans for the whole query and the first subquery as well as the table layout in detail:

https://gist.github.com/ca8fc1093cd95b1c6fc0

Best Answer

Just out of curiosity, can you try this version? It may trick the optimizer to use the same indices that the subqueries would use separately:

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10) 
    AS a

UNION ALL

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)
    AS b

ORDER BY zeit DESC LIMIT 10;

I still think that the best index you could have is the compound (account_id, zeitpunkt). It would yield the 10 rows fast, and no tricks would ne needed.