MySQL – Counting Results with LEFT JOIN Doubles the Results

MySQL

I have this query generated by Django where a user has orders and jobs as related tables. jobs are of two types: clean and wash. I want to count for each user shown the number of orders, clean jobs and wash jobs.

One of the users has 3 clean jobs in the database but this query returns the value of 6 for column clean_ct. If I remove the join to the order table and the orders count the counts are as expected so the second LEFT OUTER JOIN is causing this doubling to occur. I'm suspecting a wrong GROUP BY but I haven't found a solution. Any ideas?

SELECT DISTINCT `auth_user`.`id`, 
                `auth_user`.`username`, 
                Count(DISTINCT `order`.`id`) AS `orders_ct`, 
                Count(CASE 
                        WHEN `job`.`service` = 'clean' THEN 1 
                        ELSE NULL 
                      end)                   AS `clean_ct`, 
                Count(CASE 
                        WHEN `job`.`service` = 'wash' THEN 1 
                        ELSE NULL 
                      end)                   AS `wash_ct` 
FROM   `auth_user` 
       LEFT OUTER JOIN `order` 
                    ON ( `auth_user`.`id` = `order`.`user_id` ) 
       LEFT OUTER JOIN `job` 
                    ON ( `auth_user`.`id` = `job`.`user_id` ) 
GROUP  BY `auth_user`.`id` 
ORDER  BY `auth_user`.`id` DESC 
LIMIT  100 

clean_ct should be 3 but returns 6

Best Answer

You have two joins from the base table (auth_user) and both joins are to tables with 1-to-many relationship with the base table. That creates the multiple rows. If for example, an auth_user has 3 orders and 7 jobs, the join (before any group by) will result in 21 rows.

When you then GROUP BY, the (created) rows are counted and you get incorrect results (in the example, you'd get the job counts multiplied by 3 and the orders count multiplied by 7). Your code would manage to avoid the x7 because it uses DISTINCT when counting orders but not avoid the x3 when counting jobs.

One solution would be to use DISTINCT in all counts. In your case, you could write the "wash" count as:

            Count(DISTINCT CASE 
                             WHEN `job`.`service` = 'wash' THEN job.id 
                             ELSE NULL 
                           END)                   AS `wash_ct` 

But this solution will become quite complicated and inefficient with many joins.

Another solution is to group by in each individual table and then join (the derived tables) to auth_user. Also:

  • GROUP BY and DISTINCT in the same SELECT are rarely needed:
  • ELSE NULL is the default in CASE expressions. (it's not a problem if you like it there, just not needed.)
  • You don't need to quote all table and column names. Only reserved words - which should be avoided anyway - and names with special characters.
  • It would best to avoid reserved words - like order - when naming tables and columns.

The query could be written:

SELECT au.id, 
       au.username, 
       COALESCE(orders_ct, 0) AS orders_ct, 
       COALESCE(clean_ct, 0)  AS clean_ct, 
       COALESCE(wash_ct, 0)   AS wash_ct 
FROM   auth_user AS au 
       LEFT OUTER JOIN 
           ( SELECT user_id,
                    Count(*)  AS orders_ct                    
             FROM   `order`
             GROUP  BY user_id
           ) AS o 
                    ON au.id = o.user_id 
       LEFT OUTER JOIN  
           ( SELECT user_id,
                    Count(CASE WHEN service = 'clean' THEN 1 
                          END)  AS clean_ct,
                    Count(CASE WHEN service = 'wash' THEN 1 
                          END)  AS wash_ct                            
             FROM   job
             GROUP  BY user_id
           ) AS j
                    ON au.id = j.user_id 
ORDER  BY au.id DESC 
LIMIT  100 ;