Mysql – Subquery inside a select vs left join

MySQLmysql-5.7optimizationperformancequery-performance

Often I have to return the count of of some row in a query that fetch other non related rows.

For example A table User a table Review and a Table picture

User:
id
nickname

Review:
id
to_user_id
from_user_id
rating

Picture:
id:
user_id
url

Let's say I want in one single query retrieve the nickname of a 'given' userId all its pictures url as well as the count of how many people reviewed that user.

The first and easy way I think when doing this query would this:

SELECT
  u.nickname
  (SELECT count(*) FROM review WHERE to_user_id = u.id) as reviewCount,
  p.url
FROM user
LEFT JOIN picture ON p.user_id = u.id
WHERE 
  u.id = 1

The other way of doing this is without that subselect and by joining the review table on the right user_id

SELECT 
 u.nickname,
 r.reviewCount,
 p.url
FROM user u 
LEFT JOIN (
    SELECT to_user_id, count(*) reviewCount FROM review GROUP BY to_user_id
 ) r ON r.to_user_id = u.id
LEFT JOIN picture ON p.user_id = u.id 
WHERE u.id = 1;

I'm not very an expert with db query performance and tuning. Could someone explain me if a solution is better than the other? (Or if there is an other better solution) ?

EDIT:
Sorry forgot to mention. I'm working with the latest MySQL

Best Answer

You do not specify which RDBMS you are working with. Most what I write here should be quite independent but I mostly have experence in MySQL so maybe different systems allow some other optimizations.

The (SELECT count(*) FROM review WHERE to_user_id = u.id) as reviewCount is a dependent subquery - it will be executed for each row in your results. Even if one execution is fast, potencially thousands of them can make it slow.

The one in the JOIN is a derived table - it will be executed only once and materialized into a temporary table, which will then be joined to your other tables. If the query is fast (can use index on (to_user_id)), it is good. But in this case the count will be counted even for users which do not really show in the results. But.. you can just push the condition in there (to_user_id = 1 instead of the GROUP BY).

But to make things not so simple there exist some optimizations in newer versions. The dependent subquery can be made faster by using a subquery cache in MariaDB 10 (and IIRC MySQL 5.7, but I did not verify). That means that in your case all rows in result have u.id = 1 -> to_user_id = 1 and the subquery will be actually executed only once and then the cached results will be used. If that is available, the difference between both versions will be minimal.

Personally I prefer your second version most of the time but there are some cases when the first one will be faster - I once had a query where it was not simply possible to limit the rows in the JOINED subquery the right way, but switching to the dependent subquery only few unique combinations were actually read.