The problem is not really index fragmentation. It has more to do with the index statistics. If you are using MyISAM, sudden spikes of INSERTs would throw the statistics off in the eyes of the MySQL Query Optimizer. This would cause the MySQL Query Optimizer to make horrific guesses in the EXPLAIN plans of SELECT queries.
If you are using InnoDB, ANALYZE TABLE becomes close to useless, because InnoDB only estimates its index statistics from a small random sample of index pages.
As long as the table is reasonably small, ANALYZE TABLE is all you can really do for MyISAM. Rebuilding indexes periodically may help InnoDB tables.
As long as tables are not heavily written, index statistics will remain stable and query EXPLAIN plans will remain more consistent.
Just remember: once you have tons of INSERTs, UPDATEs, and DELETEs, all bets are off for trustworthy index statistics until the next rebuild or ANALYZE TABLE.
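As a rough sketch of the maintenance statements mentioned above (the table name `orders` is hypothetical and only stands in for your own table), the statistics can be inspected, refreshed, or rebuilt like this:

```sql
-- `orders` is a hypothetical table name used only for illustration.

-- Inspect the current statistics; the Cardinality column holds the
-- per-index estimate the optimizer works from.
SHOW INDEX FROM orders;

-- Recompute the index statistics (works for MyISAM and InnoDB).
ANALYZE TABLE orders;

-- Rebuild the table and its indexes. For InnoDB, MySQL performs this
-- as a table recreate followed by an analyze.
OPTIMIZE TABLE orders;

-- An explicit rebuild of an InnoDB table, which also rebuilds its indexes.
ALTER TABLE orders ENGINE=InnoDB;
```

Comparing the Cardinality values from SHOW INDEX before and after a burst of writes is one way to see how far the estimates have drifted.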
The optimizer does not see that your conditions are correlated and picks the wrong access method.
Basically, it considers two options:
1. Scan the index on siteVisitId until the first match on site_visits and the first satisfied timestamp condition.
2. Scan the index on timestamp until the first match on site_visits.
Since timestamp is a part of the primary key and siteVisitId is not, the second plan would involve table lookups on product_views, which is several times slower than a pure index scan (note Using index in the plan).
The optimizer calculates the conditional probability of the timestamp condition being satisfied (given that a corresponding site_visit record exists) and compares it to the overhead of the table access.
Since your timestamp condition is quite wide (as seen on the index histograms), the optimizer prefers the first method.
However, since both siteVisitId and timestamp are incremental, they are correlated, and the conditional probability of both matches is not a mere product of their independent probabilities.
In simple words, you have to filter through many low siteVisitId values until you find the first matching timestamp, which is exactly what is happening to your query.
You should add ORDER BY timestamp to your query to make the plan that uses the timestamp index cheaper, since it will not need a separate sort. It would also help to create an index on timestamp, siteVisitId (in this order) to avoid table lookups.
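A minimal sketch of both suggestions follows. The table and column names product_views, site_visits, timestamp, and siteVisitId come from the discussion above; the index name, the join column site_visits.id, the date literal, and the overall query shape are assumptions made only for illustration, since the original query is not reproduced here.

```sql
-- Composite index with timestamp first, so a scan in timestamp order can
-- read siteVisitId straight from the index. The index name is made up.
CREATE INDEX ix_product_views_ts_visit
    ON product_views (`timestamp`, siteVisitId);

-- Hypothetical query shape: site_visits.id and the date literal are
-- assumptions, not taken from the original question.
SELECT pv.siteVisitId, pv.`timestamp`
FROM product_views AS pv
JOIN site_visits AS sv ON sv.id = pv.siteVisitId
WHERE pv.`timestamp` >= '2013-01-01 00:00:00'
ORDER BY pv.`timestamp`
LIMIT 1;
```

Running EXPLAIN on the query before and after creating the index should show whether the optimizer switches to the new index and whether Using index appears for product_views.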
According to the MySQL Documentation on Views, it is not possible to create an index on a view.
The first thing that must be realized about a view is that it produces a result set. The result set emerging from the query invoked from the view is a virtual table because it is created on demand. There is no DDL you can summon afterwards to immediately index the result set. For all intents and purposes, the result set is a table without any indexes. In effect, the LEFT JOIN you were executing is basically a Cartesian product with some filtering.
To give you a more granular look at the JOIN of two views, I will refer to a post I made last year explaining the internal mechanisms MySQL uses to evaluate JOINs and WHEREs (Is there an execution difference between a JOIN condition and a WHERE condition?). I will show you the mechanism as published in Understanding MySQL Internals (Page 172):
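- Determine which keys can be used to retrieve the records from tables, and choose the best one for each table.
- For each table, decide whether a table scan is better than reading on a key. If there are a lot of records that match the key value, the advantages of the key are reduced and the table scan becomes faster.
- Determine the order in which tables should be joined when more than one table is present in the query.
- Eliminate unused tables from the join.
- Determine whether keys can be used for ORDER BY and GROUP BY.
OK, it seems like indexes should be used. However, look closer. If you substitute the word View for Table, look what happens to the mechanism's execution:
MECHANISM MODIFIED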
- Determine which keys can be used to retrieve the records from views, and choose the best one for each view.
- For each view, decide whether a view scan is better than reading on a key. If there are a lot of records that match the key value, the advantages of the key are reduced and the view scan becomes faster.
- Determine the order in which views should be joined when more than one view is present in the query.
- Eliminate unused views from the join.
- Determine whether keys can be used for ORDER BY and GROUP BY.
Every table (view) has no index. Thus, working with virtual tables, temp tables, or tables with no indexes really becomes indistinct when doing a JOIN. The keys used are just for JOIN operations, not so much for looking things up faster.
Think of your query as picking up two phone books, the 2014 Yellow Pages and the 2013 Yellow Pages. Each Yellow Pages book contains the White Pages for Residential Phone Numbers.
Obviously, there are differences between the two phone books. Doing a JOIN of database tables to figure out the differences between 2013 and 2014 should pose no problem.
Imagine merging the two phone books by hand to locate differences. Sounds insane, doesn't it? Notwithstanding, that is exactly what you are asking mysqld to do when you join two views. Remember, you are not joining real tables and there are no indexes to piggyback from.
Now, let's look back at the actual query.
You are using a virtual table (a table with no indexes), viewA, and joining it to another virtual table, viewB. The intermediate temp table being generated would be as large as viewA. Then, you are running an internal sort on that large temp table to make it distinct.
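To see this on your own server, a sketch along the following lines may help. viewA and viewB are the names used above; the join column `id`, the IS NULL filter, and the temporary table names are hypothetical, since the original query is not reproduced here. The second half is only an illustration of the contrast with indexed tables, not something this answer prescribes.

```sql
-- viewA and viewB are the views discussed above; the column `id` and the
-- anti-join filter are hypothetical stand-ins for the real query.

-- Check the plan: depending on how the views are processed, it may show
-- "Using temporary", "Using filesort", and full scans (type ALL).
EXPLAIN
SELECT DISTINCT a.id
FROM viewA AS a
LEFT JOIN viewB AS b ON b.id = a.id
WHERE b.id IS NULL;

-- For contrast: copy each view's result set into a temporary table,
-- which, unlike a view, can be indexed.
CREATE TEMPORARY TABLE tmpA AS SELECT * FROM viewA;
CREATE TEMPORARY TABLE tmpB AS SELECT * FROM viewB;
ALTER TABLE tmpA ADD INDEX (id);
ALTER TABLE tmpB ADD INDEX (id);

-- The same join, now against tables that have keys to use.
SELECT DISTINCT a.id
FROM tmpA AS a
LEFT JOIN tmpB AS b ON b.id = a.id
WHERE b.id IS NULL;
```

Comparing the EXPLAIN output of the two SELECT statements is a quick way to check whether missing indexes are really what dominates the running time.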
EPILOGUE
Given the internal mechanisms of evaluating JOINs, along with the transient and indexless nature of the result set of a view, your original query (LEFT JOIN of two views) should be getting running times that are orders of magnitude worse. At the same time, the answer you got from StackOverflow should perform well, given the same JOIN algorithm I just described.
I hope the gory details I just posted answer your question as to why.