MySQL – How to get a percentage in the query below

MySQL, optimization, query

A small change: report for each year the percentage of movies in that year with only female actors, and the total number of movies made that year.

SELECT Year,
       COUNT(*),
       (COUNT(*)*100/(count(Movie.mid)*1.0))
FROM Movie
WHERE TRIM(MID) IN (SELECT TRIM(MID)
                    FROM M_Cast
                    WHERE TRIM(PID) IN (SELECT TRIM(PID)
                                        FROM PERSON
                                        WHERE Gender ='Female'
                                       )
                   )
GROUP BY Movie.Year
ORDER BY COUNT(*) DESC

What should I change in this to get the correct percentage?

Best Answer

I'll approach it with a new DB design that suits the purpose.

The Movie entity has an n:m relationship with the Person entity: a person is cast in many movies, and a movie has many persons in its cast. The Movie entity has a year column and the Person entity has a gender column.

Create this database as done at SQL Fiddle, then create two tables with the following queries.

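DROP TABLE IF EXISTS temp1;
DROP TABLE IF EXISTS temp2;

-- temp1: one row per (year, female cast member) with her number of cast rows that year
CREATE TABLE temp1 AS
  SELECT COUNT(p.pname) AS pcount,
         p.pname,
         m.year
  FROM Movie m
  JOIN M_Cast mc ON mc.mid = m.mid
  JOIN Person p  ON p.pid = mc.pid
  WHERE p.gender = 'Female'
  GROUP BY m.year, p.pname;

-- temp2: total of pcount per year (all female cast appearances in that year)
CREATE TABLE temp2 AS
  SELECT year, SUM(pcount) AS sumyear
  FROM temp1
  GROUP BY year;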

Then query the two tables like so:

SELECT t1.pname,
       t1.year,
       t1.pcount AS MoviesByPerson,
       t2.sumyear AS MoviesInYear,  -- sum of pcount for the year (female cast appearances)
       t1.pcount * 100 / t2.sumyear AS percentage
FROM temp1 t1
JOIN temp2 t2 ON t1.year = t2.year;

You can run the SQL Fiddle snippet and see the result.
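
A note on the query in the question: its WHERE clause filters the rows before they are counted, so COUNT(*) and COUNT(Movie.mid) count the same rows and the ratio always comes out as 100. One way to get the figure the question asks for (share of movies per year whose cast is entirely female, plus the total per year) is conditional aggregation over all movies. What follows is only a minimal sketch: it assumes the Movie / M_Cast / Person tables with MID, PID, Year and Gender columns, the sample rows are invented, and it runs against in-memory SQLite through Python's sqlite3 module purely so that it is self-contained. The SELECT sticks to standard SQL, so I would expect it to carry over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie  (MID TEXT PRIMARY KEY, Year INTEGER);
CREATE TABLE Person (PID TEXT PRIMARY KEY, Gender TEXT);
CREATE TABLE M_Cast (MID TEXT, PID TEXT);

-- Invented sample data
INSERT INTO Movie VALUES ('m1', 2000), ('m2', 2000),
                         ('m3', 2001), ('m4', 2001), ('m5', 2001), ('m6', 2001);
INSERT INTO Person VALUES ('p1', 'Female'), ('p2', 'Female'), ('p3', 'Male');
INSERT INTO M_Cast VALUES ('m1', 'p1'), ('m1', 'p2'),  -- all female
                          ('m2', 'p1'), ('m2', 'p3'),  -- mixed
                          ('m3', 'p3'),                -- all male
                          ('m4', 'p2'),                -- all female
                          ('m6', 'p2'), ('m6', 'p3');  -- mixed (m5 has no cast rows)
""")

# The derived table counts, per movie, the cast members who are NOT female.
# A movie is "female only" when that count is 0. Movies without any cast row
# get NULL from the LEFT JOIN and therefore fall into the ELSE branch.
FEMALE_ONLY_PCT = """
SELECT m.Year,
       COUNT(*) AS total_movies,
       SUM(CASE WHEN c.non_female = 0 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)
           AS pct_female_only
FROM Movie m
LEFT JOIN (
    SELECT mc.MID,
           SUM(CASE WHEN p.Gender = 'Female' THEN 0 ELSE 1 END) AS non_female
    FROM M_Cast mc
    JOIN Person p ON p.PID = mc.PID
    GROUP BY mc.MID
) c ON c.MID = m.MID
GROUP BY m.Year
ORDER BY m.Year
"""

rows = conn.execute(FEMALE_ONLY_PCT).fetchall()
for year, total, pct in rows:
    print(year, total, pct)
```

The important design choice is that no movie is filtered out in the WHERE clause: every movie stays in the denominator, and only the numerator is conditional. If the real data has padded IDs, as the TRIM() calls in the question suggest, the join conditions would need the same treatment.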

Related Solutions

MySQL optimization – year column grouping – using temporary table, filesort

I don't see a lot of opportunity for improvement.

The index you added was probably a big help, because it's being used for the range matching on the WHERE clause (type => range, key => tran_date), and it's being used as a covering index (extra => using index), avoiding the need to seek into the table to fetch the row data.

But because you're using functions to construct the financial_year value for the GROUP BY, neither "using filesort" nor "using temporary" can be avoided. Those aren't the real problem, though. The real problem is that you're evaluating MONTH(tran_date) 346,485 times and YEAR(tran_date) at least that many times... ~700,000 function calls in one second doesn't seem too bad.

Plan B: I am definitely not a fan of storing redundant data, and I'm dead-set against making the application responsible for maintaining it... but one option I might be tempted to try would be to create a dashboard_stats_by_financial_year table, and use after-insert/update/delete triggers on the transactions1 table to manage keeping those stats current.

That option has a cost, of course -- adding to the amount of time it takes to update/insert/delete a transaction... but, waiting > 1200 milliseconds for stats for your dashboard is a cost, too. So it may come down to whether you want to pay for it now or pay for it later.
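
To make Plan B concrete, here is a rough sketch of a trigger-maintained summary table. Only the names dashboard_stats_by_financial_year, transactions1 and tran_date come from the discussion above; the amount column, the stats columns, the trigger names and the rule that the financial year starts on 1 April are all assumptions made for illustration. It uses SQLite through Python's sqlite3 module so it can run on its own. MySQL trigger syntax differs (FOR EACH ROW, and typically INSERT ... ON DUPLICATE KEY UPDATE instead of the INSERT OR IGNORE plus UPDATE pair), so treat it as the shape of the idea rather than drop-in code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions1 (
    id        INTEGER PRIMARY KEY,
    tran_date TEXT NOT NULL,   -- ISO date such as '2012-05-17'
    amount    REAL NOT NULL    -- hypothetical column
);

CREATE TABLE dashboard_stats_by_financial_year (
    financial_year INTEGER PRIMARY KEY,
    tran_count     INTEGER NOT NULL DEFAULT 0,
    total_amount   REAL    NOT NULL DEFAULT 0
);

-- Assumption: the financial year starts on 1 April and is labelled by its
-- starting year, so shifting the date back three months yields the label.
CREATE TRIGGER transactions1_ai AFTER INSERT ON transactions1
BEGIN
    INSERT OR IGNORE INTO dashboard_stats_by_financial_year (financial_year)
    VALUES (CAST(strftime('%Y', NEW.tran_date, '-3 months') AS INTEGER));
    UPDATE dashboard_stats_by_financial_year
       SET tran_count   = tran_count + 1,
           total_amount = total_amount + NEW.amount
     WHERE financial_year =
           CAST(strftime('%Y', NEW.tran_date, '-3 months') AS INTEGER);
END;

CREATE TRIGGER transactions1_ad AFTER DELETE ON transactions1
BEGIN
    UPDATE dashboard_stats_by_financial_year
       SET tran_count   = tran_count - 1,
           total_amount = total_amount - OLD.amount
     WHERE financial_year =
           CAST(strftime('%Y', OLD.tran_date, '-3 months') AS INTEGER);
END;
""")

STATS = """
SELECT financial_year, tran_count, total_amount
FROM dashboard_stats_by_financial_year
ORDER BY financial_year
"""

conn.executemany(
    "INSERT INTO transactions1 (tran_date, amount) VALUES (?, ?)",
    [("2012-03-31", 100.0), ("2012-04-01", 50.0), ("2012-12-15", 25.0)],
)
stats_after_insert = conn.execute(STATS).fetchall()

conn.execute("DELETE FROM transactions1 WHERE tran_date = '2012-12-15'")
stats_after_delete = conn.execute(STATS).fetchall()

print(stats_after_insert)
print(stats_after_delete)
```

The dashboard then reads a handful of pre-aggregated rows instead of grouping hundreds of thousands of transactions on every request. An AFTER UPDATE trigger is left out of the sketch; it would need to subtract the OLD values and add the NEW ones.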

MySQL – Joining two MySQL VIEWs takes a very long time

FULLTEXT indexes have to be handled with great care. Why? FULLTEXT index searches do work, but the MySQL query optimizer tends to choose full table scans if you do not express the query properly.

Let's take your query and look for 'tom':

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE c.castName LIKE '%tom%';

This joins everything. Then, it scans the resulting temp table and returns every row with tom.

Let's implement the FULLTEXT method

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE);

I can almost assure you that a full table scan will occur during the join because I have experienced it hundreds of times:

Oct 25, 2011 : FULLTEXT index ignored in BOOLEAN MODE with 'number of words' conditional
Jan 26, 2012 : Mysql fulltext search my.cnf optimization
May 07, 2012 : MySQL EXPLAIN doesn't show 'use index' for FULLTEXT
Jul 18, 2012 : Why full-text-search returns less rows than LIKE

I started writing posts about FULLTEXT and query refactoring ever since I saw a post from webmasterworld.com entitled MySQL Match Against + a very expensive join. How do I optimize this?

Given the information I just mentioned, let's take your query and refactor it to handle the FULLTEXT search and still do a good join.

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink
FROM cast_movie as c JOIN film_info as f ON c.ImdbId = f.ImdbId
WHERE MATCH(c.castName) AGAINST ('+tom' IN BOOLEAN MODE);

First, apply the MATCH ... AGAINST operator to cast_movie alone and return ImdbId and the other fields:

SELECT ImdbId,movieName,castName,movieImdbId FROM cast_movie
WHERE MATCH(castName) AGAINST ('+tom' IN BOOLEAN MODE);

Then make that a subquery that provides the keys to join against film_info and retrieve its data:

SELECT DISTINCT c.movieName, c.castName, c.movieImdbId, f.year, f.posterLink FROM
(
    SELECT ImdbId,movieName,castName,movieImdbId FROM cast_movie
    WHERE MATCH(castName) AGAINST ('+tom' IN BOOLEAN MODE)
) as c JOIN film_info as f ON c.ImdbId = f.ImdbId;

This should produce a better result.

If your tables use the InnoDB storage engine, remember to tune the InnoDB full-text options. Any full-text option that does not start with innodb_ is meant for MyISAM (see my post MySQL FullText search on string shorter than 3 chars returns no rows).

Give it a try!
