Mysql – optimise join query with group by SUM

MySQL

I have 2 tables:
sales transaction tablem payment (reference_no) can be duplicate

-----------------------------------
| id | reference_no | invoice_amt |
| 1  | inv001       | 100.00      |
| 2  | inv001       | 30.00       |
| 3  | inv002       | 150.00      |
| 4  | inv003       | 50.00       |

payment tablem (reference_no) also can be duplicate.

-----------------------------------
| id | reference_no | payment_amt |
| 1  | inv001       | 130.00      |
| 2  | inv002       | 30.00       |
| 3  | inv002       | 50.00       |
| 4  | inv002       | 50.00       |
| 5  | inv002       | 20.00       |
| 6  | inv003       | 20.00       |

I want to match so those payment_amt not tally with invoice_amount will appear.
Example:

inv001 total sum invoice_amt is 130 and payment_amt is 130

inv002 total sum invoice_amt is 150 and payment_amt is 150

inv003 total sum invoice_amt is  50 and payment_amt is 20

So it will only display inv003 since the SUM of the invoice amount is not same with total payment_amt.

How to write the sql for this display?

Best Answer

I'm answering from a SQL Server perspective; however, something similar should work in most environments.

/* SET-UP  */

CREATE TABLE #inv (id INT IDENTITY(1,1), refno varchar(20), inv_amt money);
INSERT INTO #inv (refno, inv_amt)
VALUES ('inv001', 100)
      ,('inv001', 30)
      ,('inv002', 150)
      ,('inv003', 50)
;

CREATE TABLE #pay (id INT IDENTITY(1,1), refno varchar(20), pay_amt money);
INSERT INTO #pay (refno, pay_amt)
VALUES ('inv001', 130)
      ,('inv002', 30)
      ,('inv002', 50)
      ,('inv002', 50)
      ,('inv002', 20)
      ,('inv003', 20)
;


/* ANSWER */

CREATE TABLE #inv_total (refno varchar(20) PRIMARY KEY, inv_amt money);
CREATE TABLE #pay_total (refno varchar(20) PRIMARY KEY, pay_amt money);

INSERT INTO #inv_total
SELECT refno, SUM(inv_amt) FROM #inv GROUP BY refno;

INSERT INTO #pay_total
SELECT refno, SUM(pay_amt) from #pay GROUP BY refno;

SELECT COALESCE(i.refno, p.refno) as refno
      ,COALESCE(i.inv_amt, 0) as inv_amt
      ,COALESCE(p.pay_amt, 0) as pay_amt
  FROM #inv_total i
         FULL OUTER JOIN #pay_total p ON (i.refno = p.refno)
 WHERE COALESCE(i.inv_amt, 0) <> COALESCE(p.pay_amt, 0)
;

Executed on SQL Server 2008R2, this does indeed only bring back inv003.

NOTE: Depending on what version of SQL you're working in, you may be able to use the queries that populate the two _total temporary tables as sub-queries in the final query.

If you do need the temp tables, setting refno as a primary key isn't required; however, for very large amounts of data, it might speed up the final query.

ALSO: In my test data, I assumed that the two payment rows with the same id value was a typo. However, it's not relevant to the solution as far as I can tell, so if it was deliberate, change the set-up to not make the id column an IDENTITY column, and to explicitly populate that column in the INSERT statements.