Mysql – SQL query returning cartesian and duplicates and not single records

database-designMySQLselect

So I have four tables as follows:

CREATE TABLE softdrinks (
  drinkID INT NOT NULL AUTO_INCREMENT,
  drinkName VARCHAR(100),
  PRIMARY KEY (drinkID)
  );

 CREATE TABLE categories (
  categoryID INT NOT NULL AUTO_INCREMENT,
  drinkID INT NOT NULL,
  categoryName VARCHAR(100),
  FOREIGN KEY (drinkID) REFERENCES softdrinks(drinkID),
  PRIMARY KEY (categoryID)
  );

  CREATE TABLE staff (
    staffID INT NOT NULL AUTO_INCREMENT,
    staffName VARCHAR(150),
    staffAge INT NOT NULL,
    PRIMARY KEY (staffID)
    );

  CREATE TABLE sold (
    soldID INT NOT NULL AUTO_INCREMENT,
    drinkID INT NOT NULL,
    staffID INT NOT NULL,
    amount INTEGER,
    FOREIGN KEY (staffID) REFERENCES staff(staffID),
    FOREIGN KEY (drinkID) REFERENCES softdrinks(drinkID),
    PRIMARY KEY (soldID)
    );

This is what they look like when I show all the databases on their own:

drinks table:

drinkID drinkName
1        Pepsi
2        Dr Pepper
3        Sprite

categories table:

categoryID  drinkID categoryName
1              1    Cola
2              2    Cola
3              3    Orange

staff table:

staffID staffName   staffAge    gender
1        Bruce         35        MALE
2        Patel         65        MALE
3        Drew          25        MALE

sold table:

soldID  drinkID staffID amount
1           1   1       5
2           1   2       10
3           1   3       25
4           2   2       10
5           3   1       30
6           3   2       14
7           3   3       25

So for drinkID = 1, the sold table gives

soldID  drinkID staffID amount
1         1       1      5
2         1       2      10
3         1       3      25

Basically, I have a complex statement that is fetching all the data as follows:

SELECT tbl_softdrinks.drinkID, tbl_softdrinks.drinkName, 
               (IFNULL(tbl_sold.totalAVG, 0.0)) AS finalAVG, tbl_soldUsers.amount,
               tbl_categories.categoryName, 
               tbl_soldUsers.staffID
                FROM softdrinks tbl_softdrinks
                INNER JOIN categories tbl_categories
                     ON tbl_softdrinks.drinkID = tbl_categories.drinkID
                LEFT JOIN
                     (SELECT tbl_sold.drinkID, ROUND(AVG(tbl_sold.amount),1) AS totalAVG, tbl_sold.amount, 
                        tbl_sold.staffID FROM sold tbl_sold
                      GROUP BY tbl_sold.drinkID) tbl_sold
                     ON tbl_softdrinks.drinkID = tbl_sold.drinkID
                LEFT JOIN
                     sold tbl_soldAmount
                     ON tbl_softdrinks.drinkID = tbl_soldAmount.drinkID
                LEFT JOIN
                     sold tbl_soldUsers
                     ON tbl_softdrinks.drinkID = tbl_soldUsers.drinkID
                INNER JOIN (SELECT DISTINCT staffID, staffName, staffAge FROM staff tbl_staff
                    GROUP BY tbl_staff.staffID) tbl_staff
                    ON tbl_soldUsers.staffID = tbl_staff.staffID
                WHERE tbl_softdrinks.drinkID = 1;

This is returning:

drinkID drinkName   finalAVG    amount  categoryName    staffID
1        Pepsi        13.3        5        Cola           1
1        Pepsi        13.3        5        Cola           1
1        Pepsi        13.3        5        Cola           1
1        Pepsi        13.3        10       Cola           2
1        Pepsi        13.3        10       Cola           2
1        Pepsi        13.3        10       Cola           2
1        Pepsi        13.3        25       Cola           3
1        Pepsi        13.3        25       Cola           3
1        Pepsi        13.3        25       Cola           3

As opposed to:

drinkID drinkName   finalAVG    amount  categoryName    staffID   staffName  staffAge
1        Pepsi        13.3        5        Cola           1       Bruce      35
1        Pepsi        13.3        10       Cola           2       Patel      65
1        Pepsi        13.3        25       Cola           3       Drew       25

This is on sqlfiddle too: http://sqlfiddle.com/#!9/8a6223/9

I believe I am returning cartesian

Would really mean a lot if someone could fix my query, not necessarily database design

Best Answer

Of course your data model could be improved, but them main problem is an unneccesary join.

You join sold twice on drinkID, there are 3 rows with drinkID=1, so the first join returns 3 rows and then you join each row 3 times, it's a kind of cross join resulting in 3*3 = 9.

The fix is removing the 2nd join (of course simly adding DISTINCT will also work, but that's just hiding the root cause):

SELECT tbl_softdrinks.drinkID, tbl_softdrinks.drinkName, 
           (IFNULL(tbl_sold_avg.totalAVG, 0.0)) AS finalAVG, tbl_sold.amount,
           tbl_categories.categoryName, 
           tbl_sold.staffID
            FROM softdrinks tbl_softdrinks
            INNER JOIN categories tbl_categories
                 ON tbl_softdrinks.drinkID = tbl_categories.drinkID
            LEFT JOIN
                 (SELECT tbl_sold.drinkID, ROUND(AVG(tbl_sold.amount),1) AS totalAVG, tbl_sold.amount, 
                    tbl_sold.staffID FROM sold tbl_sold
                  GROUP BY tbl_sold.drinkID) tbl_sold_avg
                 ON tbl_softdrinks.drinkID = tbl_sold_avg.drinkID
            LEFT JOIN
                 sold tbl_sold
                 ON tbl_softdrinks.drinkID = tbl_sold.drinkID
            INNER JOIN (SELECT DISTINCT staffID, staffName, staffAge FROM staff tbl_staff
                GROUP BY tbl_staff.staffID) tbl_staff
                ON tbl_sold.staffID = tbl_staff.staffID
            WHERE tbl_softdrinks.drinkID = 1;

See fiddle