Mysql – Why Output is wrong after doing the left join as subquery

join;MySQLmysql-8.0subquery

I have TWO QUERY which work perfect, but when i join those two query as LEFT JOIN the output become wrong. So what is the problem.?

Query 1:

select Batsman.innings_no,
       bowler,
       sum(Balls)                   as B,
       ifnull(sum(Runs_In_Over), 0) as R,
       ifnull(sum(Zero), 0)         as 0s,
       ifnull(sum(Four), 0)         as 4s,
       ifnull(sum(Six), 0)          as 6s
from (SELECT A.innings_no,
             A.bowler,
             count(*)                   as Balls,
             Sum(B.Runs_Scored)         As 'Runs_In_Over',
             case B.runs_scored
               when 0 then count(*) end as Zero,
             case B.runs_scored
               when 4 then count(*) end as Four,
             case B.runs_scored
               when 6 then count(*) end as Six
      FROM `database`.ball_by_ball A
             INNER JOIN `database`.batsman_scored B
                        using (match_id, over_id, ball_id, innings_no)
      where match_id = 981018
        and innings_no = 2
      GROUP BY A.innings_no, A.bowler, B.runs_scored) as Batsman
group by innings_no, bowler order by bowler;

Query 2:

select innings_no,
       bowler,
       ifnull(sum(Wides), 0)   as WD,
       ifnull(sum(NoBalls), 0) as NB,
       sum(Extra_runs) as Extra
from (SELECT D.innings_no,
             D.bowler,
             case E.extra_type_id
               when 2 then count(*) end as Wides,
             case E.extra_type_id
               when 4 then count(*) end as NoBalls,
             Sum(E.Extra_Runs)          As 'Extra_runs'
      FROM `database`.ball_by_ball D
             INNER JOIN `database`.extra_runs E
                        using (match_id, over_id, ball_id, innings_no)
      WHERE match_id = 981018
        and innings_no = 2
        and E.Extra_Type_Id IN (2, 4)
      GROUP BY D.innings_no, D.bowler, E.extra_type_id) as Extra
group by innings_no, bowler order by bowler;

Join Query:

select Batsman.innings_no,
       Batsman.bowler,
       sum(Balls)                   as B,
       ifnull(sum(Runs_In_Over), 0) as R,
       ifnull(sum(Zero), 0)         as 0s,
       ifnull(sum(Four), 0)         as 4s,
       ifnull(sum(Six), 0)          as 6s,
       ifnull(sum(Wides), 0)        as WD,
       ifnull(sum(NoBalls), 0)      as NB,
       ifnull(sum(Extra_runs), 0)              as Extra
from (SELECT A.innings_no,
             A.bowler,
             count(*)                   as Balls,
             Sum(B.Runs_Scored)         As 'Runs_In_Over',
             case B.runs_scored
               when 0 then count(*) end as Zero,
             case B.runs_scored
               when 4 then count(*) end as Four,
             case B.runs_scored
               when 6 then count(*) end as Six
      FROM `database`.ball_by_ball A
             INNER JOIN `database`.batsman_scored B
                        using (match_id, over_id, ball_id, innings_no)
      where match_id = 981018
        and innings_no = 2
      GROUP BY A.innings_no, A.bowler, B.runs_scored) as Batsman
       LEFT JOIN (SELECT D.innings_no,
                         D.bowler,
                         case E.extra_type_id
                           when 2 then count(*) end as Wides,
                         case E.extra_type_id
                           when 4 then count(*) end as NoBalls,
                         Sum(E.Extra_Runs)          As 'Extra_runs'
                  FROM `database`.ball_by_ball D
                         INNER JOIN `database`.extra_runs E
                                    using (match_id, over_id, ball_id, innings_no)
                  WHERE match_id = 981018
                    and innings_no = 2
                    and E.Extra_Type_Id IN (2, 4)
                  GROUP BY D.innings_no, D.bowler, E.extra_type_id) as Extra using (innings_no, bowler)
group by innings_no, bowler order by bowler;

Query 1 Output:

enter image description here

Query 2 Output:

enter image description here

Join Query Output:

enter image description here

If you saw Query 1 and 2 screenshot and compare with join query for specific Bowler, then it's clear that when i use left join as sub query the result don't show exactly what it shows in single query.

So what is the problem.? What i need to do to solve this problem and why this problem happening?

Best Answer

You were close, here is what I think you were going for:

select Batsman.innings_no,
       Batsman.bowler,
       sum(Balls)                   as B,
       ifnull(sum(Runs_In_Over), 0) as R,
       ifnull(sum(Zero), 0)         as 0s,
       ifnull(sum(Four), 0)         as 4s,
       ifnull(sum(Six), 0)          as 6s,
       ifnull(sum(Wides), 0)        as WD,
       ifnull(sum(NoBalls), 0)      as NB,
       ifnull(sum(Extra_runs), 0)              as Extra
from (SELECT A.innings_no,
             A.bowler,
             count(*)                   as Balls,
             Sum(B.Runs_Scored)         As 'Runs_In_Over',
             case B.runs_scored
               when 0 then count(*) end as Zero,
             case B.runs_scored
               when 4 then count(*) end as Four,
             case B.runs_scored
               when 6 then count(*) end as Six
      FROM `database`.ball_by_ball A
             INNER JOIN `database`.batsman_scored B
                        using (match_id, over_id, ball_id, innings_no)
      where match_id = 981018
        and innings_no = 2
      GROUP BY A.innings_no, A.bowler) as Batsman
       LEFT JOIN (SELECT D.innings_no,
                         D.bowler,
                         case E.extra_type_id
                           when 2 then count(*) end as Wides,
                         case E.extra_type_id
                           when 4 then count(*) end as NoBalls,
                         Sum(E.Extra_Runs)          As 'Extra_runs'
                  FROM `database`.ball_by_ball D
                         INNER JOIN `database`.extra_runs E
                                    using (match_id, over_id, ball_id, innings_no)
                  WHERE match_id = 981018
                    and innings_no = 2
                    and E.Extra_Type_Id IN (2, 4)
                  GROUP BY D.innings_no, D.bowler) as Extra using (innings_no, bowler)
group by innings_no, bowler order by bowler;

The difference in my query and your Join Query is that each of the subqueries is being grouped by just innings_no and bowler in my query.

How does group by work

The GROUP BY clause will restrict the resulting query to return just a single row for each unique combination of the columns listed in the GROUP BY.

In all of your queries, you are initially grouping by 3 columns, which means you are getting a separate row for each combination of innings_no, bowler, and a 3rd column. For Query 1 this 3rd column is runs_scored, for Query 2 this column is extra_type_id. This is generally not a valid way to use GROUP BY with aggregate functions, like SUM, that you used.

However, in both Query 1 and Query 2, you then wrap them in ANOTHER query, having just innings_no and bowler as the GROUP BY, so you are getting the output you expected.

But in Join Query you are applying the LEFT JOIN before applying that outer GROUP BY, which was fixing your results in Query 1 and Query 2. Therefore, your Join Query is joining together different results than the output of the other two queries. This is why it doesn't appear to work as expected.

My query should fix things, but if it doesn't provide the correct result (because I missed some valid reason you might have had for grouping the way you did) then just use the LEFT JOIN after applying the outer GROUP BYs. That will likely result in worse performance, that depending on your data, that may not matter to you.

LEFT JOIN Query 2 to Query 1

Here is how to simply do a LEFT JOIN of the two queries, which should yield the result you want, based on comments.

SELECT * FROM
(
select Batsman.innings_no,
       bowler,
       sum(Balls)                   as B,
       ifnull(sum(Runs_In_Over), 0) as R,
       ifnull(sum(Zero), 0)         as 0s,
       ifnull(sum(Four), 0)         as 4s,
       ifnull(sum(Six), 0)          as 6s
from (SELECT A.innings_no,
             A.bowler,
             count(*)                   as Balls,
             Sum(B.Runs_Scored)         As 'Runs_In_Over',
             case B.runs_scored
               when 0 then count(*) end as Zero,
             case B.runs_scored
               when 4 then count(*) end as Four,
             case B.runs_scored
               when 6 then count(*) end as Six
      FROM `database`.ball_by_ball A
             INNER JOIN `database`.batsman_scored B
                        using (match_id, over_id, ball_id, innings_no)
      where match_id = 981018
        and innings_no = 2
      GROUP BY A.innings_no, A.bowler, B.runs_scored) as Batsman
group by innings_no, bowler
) q1
LEFT JOIN (
select innings_no,
       bowler,
       ifnull(sum(Wides), 0)   as WD,
       ifnull(sum(NoBalls), 0) as NB,
       sum(Extra_runs) as Extra
from (SELECT D.innings_no,
             D.bowler,
             case E.extra_type_id
               when 2 then count(*) end as Wides,
             case E.extra_type_id
               when 4 then count(*) end as NoBalls,
             Sum(E.Extra_Runs)          As 'Extra_runs'
      FROM `database`.ball_by_ball D
             INNER JOIN `database`.extra_runs E
                        using (match_id, over_id, ball_id, innings_no)
      WHERE match_id = 981018
        and innings_no = 2
        and E.Extra_Type_Id IN (2, 4)
      GROUP BY D.innings_no, D.bowler, E.extra_type_id) as Extra
group by innings_no, bowler
) USING (innings_no, bowler)
ORDER BY bowler

All I did to create this query was take your Query 1 and Query 2, stuck a LEFT JOIN between them, and added a SELECT * FROM to the start and a USING clause for the join. I also placed the ORDER BY outside, as it won't have any effect inside, unlike the GROUP BY.

Be aware that there is likely simpler way to get the result you want, but it requires a deeper understanding of what output you want and how your data is organized, and so is beyond the scope of this question.