Mysql – Joining two Mysql VIEWs takes a very long time

join;MySQLperformancequery-performanceview

I have a textbox (for which I used jQuery auto-completion) in my PHP form where user can insert an actor/actress name and then by clicking "Search" button, a new window is opened showing list of movies by that actor/actress.

This is a query where I get movies by actor name:

$query = $conn->prepare("SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId WHERE c.castName LIKE :q");
 $query->execute(array(':q' => '%' . $searchText . '%' ));

My question:

The above query works fine if user select a name from auto-completion list. However I would like to enable user to write any name (even if he couldn't find from auto-completion list). For example, If user write "tom" in the textbox, and click on "search" button, I want to show list of all movies by all actors that their name containing word "tom".

For this purpose, I used LIKE :q and ':q' => '%' . $searchText . '%' in the above query, but the query never ends!!(I think because cast_movie is a VIEW which is quite large (with 3 million rows) and joining this view with the other table takes a very long time (actually I waited for 10 minutes and it didn't finished yet).

Could someone kindly let me know if there is any way to fix this? (I read we can use index for joining very large tables, but I think it's not possible to define index for cast_movie since it is VIEW.)

More info about tables and views:

cast_movie is a view which is made by joining "movie_roleNames" and "movies".

movie_roleNames is also a view which is made by joining two tables "Cast" and "nameRoles".

film_info is also a view that is made by joining two tables "movies" and "movies_info".

The structure of the above tables:

table "movies": Id, movieName, ImdbId(unique Id of movies), Rate, numVotes, year (indexes: ImdbId, movieName, year)

table "cast": castName, castImdbID (unique Id of casts) (indexes: castName, castImdbID)

table "nameRoles": Id, castImdbId, movieImdbId, role_Id, (indexes: movieImdbId, castImdbId)

VIEW "movie_roleNames": Id, castName, castImdbId, movieImdbId and the join statement was: SELECT n.Id, c.castName, n.castImdbId, n.movieImdbId FROM nameRoles as n join Cast as c ON n.castImdbId = c.castImdbID

VIEW "cast_movie": Id, castName, castImdbId, movieImdbId, movieName and the join statement was: SELECT m.Id, r.castName, r.castImdbId, r.movieImdbId, m.movieName FROM movie_roleNames AS r JOIN movies AS m ON r.movieImdbId = m.ImdbId

All ideas are highly appreciated,

Best Answer

Using FULLTEXT indexes has to be handled with great care. Why ? While FULLTEXT index searches do work, the MySQL Query optimizer tends to suggest full table scans if you do not express the query properly.

Let's take your query and look for 'tom'

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE c.castName LIKE '%tom%';

This joins everything. Then, it scans the resulting temp table and returns every row with tom.

Let's implement the FULLTEXT method

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE);

I can almost assure you that a full table scan will occur during the join because I have experienced it hundreds of times:

I started writing posts about FULLTEXT and query refactoring ever since I saw a post from webmasterworld.com entitled MySQL Match Against + a very expensive join. How do I optimize this?

Given the information I just mentioned let's take your query and refactor it to handle the FULLTEXT and stilldo a good join.

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE);

First, apply the MATCH ... AGAINST operator to cast_movie and return ImdbId and other fields

SELECT ImdbId,movieName,castName,movieImdbId FROM cast_movie
WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE)

Make that a subquery to provide keys to join and retrieve data from film_info

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink FROM
(
    SELECT ImdbId,movieName,castName,movieImdbId FROM cast_movie
    WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE)
) as c JOIN film_info as f ON c.ImdbId = f.ImdbId;

This should produce a better result.

If your tables use InnoDB Storage Engine, remember to tune the InnoDB Fulltext options. Any fulltext option that does not start with innodb_ is meant for MyISAM (See my post MySQL FullText search on string shorter than 3 chars returns no rows)

Give it a Try !!!