Mysql – Getting Skewed Results from LEFT OUTER JOIN with 3 TABLES

join;MySQL

I have 3 tables, users (user info), mghours (volunteer time), mgtraining (training time). I want a result that lists some user columns, sum of volunteer hours and sum of two types of training:

uid lastname firstname mgstatus trainingyear vol at ce

I use the following and as long as I have the two fields from the mgtraining table and the LEFT OUTER JOIN for those in the query, I get skewed results. If I take that out of the query, my user info and vol columns are fine.

SELECT users.uid AS 'uid',
users.lastname AS 'lastname',
users.firstname AS 'firstname',
users.mgstatus AS 'mgstatus',
users.trainingyear AS 'trainingyear',
sum(mghours.hourstime) AS 'vol',
SUM(IF(mgtraining.trainingtype = 'AT', mgtraining.trainingtime, 0)) AS 'at',
SUM(IF(mgtraining.trainingtype = 'CE', mgtraining.trainingtime, 0)) AS 'ce'
FROM users
LEFT OUTER JOIN mghours ON users.uid = mghours.uid AND year(mghours.hoursdate) = YEAR(CURDATE())
LEFT OUTER JOIN mgtraining ON users.uid = mgtraining.uid AND year(mgtraining.trainingdate) = YEAR(CURDATE())
WHERE users.coid = 33
GROUP BY users.uid
ORDER BY users.lastname ASC

The users.coid = 33 references a county ID.

I've been pouring over this for hours. What am I doing wrong?

Best Answer

JOINs together with aggregates (eg, SUM()) often give you bigger values than you expect. To see this, remove the SUMs and GROUP BY and stare at the output. You may see, for example, the same mghours.hourstime showing up multiple times. Hence, SUM(mghours.hourstime) is inflated.

This is happening because JOIN mgtraining is coming up with multiple rows in mgtraining for each row in mghours.

The solution is to do the query in steps. Notice how the inner query avoids inflating "vol":

SELECT  x.*,
        SUM(...) AS 'at
    FROM (
        SELECT  ...,
                SUM(mghours.hourstime) AS vol
            FROM ...
            GROUP BY ...
         ) x
    LEFT JOIN mgtraining ON ...
    GROUP BY ...
    ORDER BY ...;