I have a query that is taking a particularly long time to run (15+ seconds) and it is only getting worse with time as my dataset grows. I have optimized this in the past, and have added indices, code-level sorting and other optimizations, but it needs some further refining.
SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM `sounds`
INNER JOIN ratings ON sounds.id = ratings.rateable_id
WHERE (ratings.rateable_type = 'Sound'
AND sounds.blacklisted = false
AND sounds.ready_for_deployment = true
AND sounds.deployed = true
AND sounds.type = "Sound"
AND sounds.created_at > "2011-03-26 21:25:49")
GROUP BY ratings.rateable_id
The query's purpose is to get me the sound id
's and the average rating of the most recent, released sounds. There are about 1500 sounds, and 2 Million ratings.
I have several indices on sounds
mysql> show index from sounds;
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| sounds | 0 | PRIMARY | 1 | id | A | 1388 | NULL | NULL | | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 1 | deployed | A | 5 | NULL | NULL | YES | BTREE | |
| sounds | 1 | sounds_ready_for_deployment_and_deployed | 2 | ready_for_deployment | A | 12 | NULL | NULL | YES | BTREE | |
| sounds | 1 | sounds_name | 1 | name | A | 1388 | NULL | NULL | | BTREE | |
| sounds | 1 | sounds_description | 1 | description | A | 1388 | 128 | NULL | YES | BTREE | |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+
and several on ratings
mysql> show index from ratings;
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| ratings | 0 | PRIMARY | 1 | id | A | 2008251 | NULL | NULL | | BTREE | |
| ratings | 1 | index_ratings_on_rateable_id_and_rating | 1 | rateable_id | A | 18 | NULL | NULL | | BTREE | |
| ratings | 1 | index_ratings_on_rateable_id_and_rating | 2 | rating | A | 9297 | NULL | NULL | YES | BTREE | |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Here is the EXPLAIN
mysql> EXPLAIN SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM sounds INNER JOIN ratings ON sounds.id = ratings.rateable_id WHERE (ratings.rateable_type = 'Sound' AND sounds.blacklisted = false AND sounds.ready_for_deployment = true AND sounds.deployed = true AND sounds.type = "Sound" AND sounds.created_at > "2011-03-26 21:25:49") GROUP BY ratings.rateable_id;
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
| 1 | SIMPLE | ratings | index | index_ratings_on_rateable_id_and_rating | index_ratings_on_rateable_id_and_rating | 9 | NULL | 2008306 | Using where |
| 1 | SIMPLE | sounds | eq_ref | PRIMARY,sounds_ready_for_deployment_and_deployed | PRIMARY | 4 | redacted_production.ratings.rateable_id | 1 | Using where |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+
I do cache the results once obtained, so site performance is not much of an issue, but my cache warmers are taking longer and longer to run due to this call taking so long, and that is starting to become an issue. This doesn't seem like a lot of numbers to crunch in one query…
What more can I do to make this perform better?
Best Answer
After looking over the query, the tables, and the WHERE AND GROUP BY clauses, I recommend the following:
Recommendation #1) Refactor the Query
I reorganized the query to do three(3) things:
Here is my proposed query:
Recommendation #2) Index the sounds table with an index that will accommodate the WHERE clause
The columns of this index include all the columns from the WHERE clause with static values first and moving target last
I sincerely believe you will be pleasantly surprised. Give it a Try !!!
UPDATE 2011-05-21 19:04
I just saw the cardinality. OUCH !!! Cardinality of 1 for rateable_id. Boy, I feel stupid !!!
UPDATE 2011-05-21 19:20
Maybe making the index will be enough to improve things.
UPDATE 2011-05-21 22:56
Please run this:
UPDATE 2011-05-21 23:34
I refactored it again. Try This One Please:
UPDATE 2011-05-21 23:55
I refactored it again. Try This One Please (Last Time):
UPDATE 2011-05-22 00:12
I hate giving up !!!!
UPDATE 2011-05-22 07:51
It has been bothering me that ratings is coming back with 2 million rows in the EXPLAIN. Then, it hit me. You might need another index on the ratings table which starts with rateable_type:
The goal of this index is to reduce the temp table that manipulates ratings so that it is less that 2 million. If we can get that temp table significantly smaller (at least half), then we can have a better hope in your query and mine working faster too.
After making that index, please Retry my original proposed query and also try yours:
UPDATE 2011-05-22 18:39 : FINAL WORDS
I had refactored a query in a stored procedure and added an index to help answer a question on speeding things up. I got 6 upvotes, had the answer accepted,and picked up a 200 bounty.
I had also refactored another query (marginal results) and added an index (dramatic results). I got 2 upvotes and had the answer accepted.
I added an index for yet another query challange and was upvoted once
and now your question.
Wanting to answers all questions like these (including yours) were inspired by a YouTube video I watched on refactoring queries.
Thank you again, @coneybeare !!! I wanted to answer this question to fullest extent possible, not just accept points or accolades. Now, I can feel that I earned the points !!!