MySQL – Why Left Joining Two Views is Slow

join; MySQL; view

Here is a question I asked yesterday – https://stackoverflow.com/questions/22180727/left-joining-two-views-is-slow.

I got a good answer that helped me, but I don't understand why the LEFT JOIN is so much slower than the lookup. The LEFT JOIN took 16 seconds (and I am pretty sure my tables are at least 90% optimized), while the lookup takes just 0.14 seconds. When I LEFT JOIN tables it is not this slow, so why is it with views?

Best Answer

According to the MySQL Documentation on Views:

Views (including updatable views) are available in MySQL Server 5.6. Views are stored queries that when invoked produce a result set. A view acts as a virtual table.

The first thing that must be realized about a view is that it produces a result set. The result set emerging from the query invoked by the view is a virtual table because it is created on demand. There is no DDL you can run afterwards to index that result set. For all intents and purposes, the result set is a table without any indexes. In effect, the LEFT JOIN you were executing is basically a Cartesian product with some filtering.

To give you a more granular look at the JOIN of two views, I will refer to a post I made last year explaining the internal mechanisms MySQL uses to evaluate JOINs and WHEREs (Is there an execution difference between a JOIN condition and a WHERE condition?). I will show you the mechanism as published in Understanding MySQL Internals (Page 172):

Determine which keys can be used to retrieve the records from tables, and choose the best one for each table.
For each table, decide whether a table scan is better than reading on a key. If there are a lot of records that match the key value, the advantages of the key are reduced and the table scan becomes faster.
Determine the order in which tables should be joined when more than one table is present in the query.
Rewrite the WHERE clauses to eliminate dead code, reducing the unnecessary computations and changing the constraints wherever possible to open the way for using keys.
Eliminate unused tables from the join.
Determine whether keys can be used for ORDER BY and GROUP BY.
Attempt to simplify subqueries, as well as determine to what extent their results can be cached.
Merge views (expand the view reference as a macro)

OK, it seems like indexes should be used. However, look closer. If you substitute the word View for Table, look at what happens to the mechanism's execution:

MECHANISM MODIFIED

Determine which keys can be used to retrieve the records from views, and choose the best one for each view.
For each view, decide whether a view scan is better than reading on a key. If there are a lot of records that match the key value, the advantages of the key are reduced and the view scan becomes faster.
Determine the order in which views should be joined when more than one view is present in the query.
Rewrite the WHERE clauses to eliminate dead code, reducing the unnecessary computations and changing the constraints wherever possible to open the way for using keys.
Eliminate unused views from the join.
Determine whether keys can be used for ORDER BY and GROUP BY.
Attempt to simplify subqueries, as well as determine to what extent their results can be cached.
Merge views (expand the view reference as a macro)

No table (view) here has an index. Thus, working with virtual tables, temp tables, or tables with no indexes becomes indistinguishable when doing a JOIN. The keys used are just for JOIN operations, not so much for looking things up faster.
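One way to check how MySQL is treating the views is to ask for the query plan. The sketch below assumes the view names viewA and viewB from the query in the question. The exact EXPLAIN output depends on the server version and on how each view is defined, so treat the comments as things to look for rather than guaranteed results.

```sql
-- Show the plan for a join of the two views (names taken from the question).
EXPLAIN
SELECT viewA.TRID
FROM viewA
LEFT JOIN viewB ON viewA.TRID = viewB.TRID
WHERE viewB.TRID IS NULL;

-- Things to look for in the output:
--   rows with select_type = DERIVED suggest a view was materialized into a temp table
--   key = NULL together with a large "rows" value suggests a full scan with no index

-- Show the stored definition of each view, including its ALGORITHM clause
-- (UNDEFINED, MERGE, or TEMPTABLE).
SHOW CREATE VIEW viewA;
SHOW CREATE VIEW viewB;
```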

Think of your query as picking up two phone books, the 2014 Yellow Pages and the 2013 Yellow Pages. Each Yellow Pages book contains the White Pages for Residential Phone Numbers.

In late 2012, a database table was used to generate the 2013 Yellow Pages.
During 2013:
- People changed phone numbers
- People received new phone numbers
- People dropped phone numbers, switching to cell phones
In late 2013, a database table was used to generate the 2014 Yellow Pages.

Obviously, there are differences between the two phone books. Doing a JOIN of the database tables to figure out the differences between 2013 and 2014 should pose no problem.

Imagine merging the two phone books by hand to locate differences. Sounds insane, doesn't it? Nevertheless, that is exactly what you are asking mysqld to do when you join two views. Remember, you are not joining real tables and there are no indexes to piggyback on.

Now, let's look back at the actual query.

SELECT DISTINCT
viewA.TRID, 
viewA.hits,
viewA.department,
viewA.admin,
viewA.publisher,
viewA.employee,
viewA.logincount,
viewA.registrationdate,
viewA.firstlogin,
viewA.lastlogin,
viewA.`month`,
viewA.`year`,
viewA.businesscategory,
viewA.mail,
viewA.givenname,
viewA.sn,
viewA.departmentnumber,
viewA.sa_title,
viewA.title,
viewA.supemail,
viewA.regionname
FROM
viewA
LEFT JOIN viewB ON viewA.TRID = viewB.TRID
WHERE viewB.TRID IS NULL

You are using a virtual table (a table with no indexes), viewA, and joining it to another virtual table, viewB. The intermediate temp table being generated would be as large as viewA. Then, you are running an internal sort on that large temp table to make it distinct.
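If the views have to be compared this way, one workaround is to copy the probed side into a temporary table and index the join column before joining. This is only a sketch: tmpB and idx_trid are made-up names, and it assumes TRID is an integer or short string column (a TEXT or BLOB column would need a prefix length on the index).

```sql
-- Materialize the join column of viewB once (tmpB is a hypothetical name).
CREATE TEMPORARY TABLE tmpB AS SELECT TRID FROM viewB;

-- Unlike the view's result set, a temporary table can be indexed.
ALTER TABLE tmpB ADD INDEX idx_trid (TRID);

-- Same anti-join as the original query, but each viewA row is now
-- checked against an indexed table instead of an indexless result set.
-- Only a few columns are listed; the original query selects more.
SELECT DISTINCT
    viewA.TRID,
    viewA.hits,
    viewA.department
FROM viewA
LEFT JOIN tmpB ON viewA.TRID = tmpB.TRID
WHERE tmpB.TRID IS NULL;

DROP TEMPORARY TABLE tmpB;
```

The temporary table is private to the session and disappears when the connection closes, so the explicit DROP is just tidiness.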

EPILOGUE

Given the internal mechanisms of evaluating JOINs, along with the transient and indexless nature of the result set of a view, your original query (LEFT JOIN of two views) should be expected to have running times that are orders of magnitude longer. At the same time, the answer you got from StackOverflow should perform well, given the same JOIN algorithm I just described.
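The StackOverflow answer itself is not reproduced here. As an assumption about what a lookup-style rewrite can look like, here is a sketch using NOT EXISTS, which returns the same rows as the LEFT JOIN ... IS NULL form. The column list is shortened for illustration. Whether it is actually faster on a given server is something to confirm with EXPLAIN and timing.

```sql
-- Anti-join written as a lookup: keep viewA rows with no matching TRID in viewB.
SELECT DISTINCT
    viewA.TRID,
    viewA.hits,
    viewA.department   -- remaining columns of the original query omitted for brevity
FROM viewA
WHERE NOT EXISTS (
    SELECT 1
    FROM viewB
    WHERE viewB.TRID = viewA.TRID
);
```

NOT EXISTS is used rather than NOT IN because NOT IN returns no rows at all if viewB.TRID ever contains a NULL.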

I hope the gory details I just posted answer your question as to why.
