Sql-server – Sum multiple columns, based on distinct values in different Columns

group bysql serversum

I have the following table, in an Azure SQL DB that has duplicate values that I'm trying to Sum.

Here is the Logic: If the PaymentID is unique, then Sum Payment,
If the creditID is unique, then sum credit,
if the debitid is unique, then sum debit.
And take the max(source)
The idea is to get a single line, per ID with distinct values for the ID summed.

id payment credit debit Source paymentid creditid debitid
1510142123 -589.53 0 0 CC 5831879 NULL NULL
1510142123 -589.53 0 0 CC 5831882 NULL NULL
1510142123 -155.06 0 0 CC 5898896 NULL NULL
157771145 -126.42 0 0 CC 5885900 NULL NULL
157771145 -58.73 0 0 CC 5885903 NULL NULL
158088837 -55.14 0 -3.45 CC 5897306 NULL 5897303
158088837 -5.75 0 -3.45 CC 5897309 NULL 5897303
158464166 -161 0 -3.45 CC 5910551 NULL 5910548
158464166 -24.15 0 -3.45 CC 5910554 NULL 5910548
1591970734 -111.61 0 0 Bank 5939648 NULL NULL
1591970734 -0.01 0 0 Cash 5939711 NULL NULL
1591970734 -0.01 0 0 Cash 5939714 NULL NULL
159297565 -708.93 20 0 CC 5943728 5910848 NULL
159297565 -0.02 20 0 Cash 5948207 5910848 NULL

For example:

158464166 | -185.15 | 0 | -3.45 | CC | 5910551 | 5910548

(in the above – I've taken the min(paymentid) to make it look nicer

Please note that although in the above snippet, Creditid and Debitid only have a duplicate ID, it's possible that they may have distinct IDs, so any code will have to be able to handle that. PaymentID will always be unique.

It is also possible that the values for payment, credit and debit may not be unique (e.g. a payment of $50 is made twice to a single ID), so we can't group on payment.

I got as far as this:

SELECT id, sum(payment), sum(credit), sum(debit), max(source), creditid, debitid  
FROM (  
  SELECT *,  
         COUNT(*) OVER (PARTITION BY id) AS cnt  
  FROM Temp_Payment) AS t  
WHERE t.cnt > 1  

group by id, creditid, debitid

but it's not giving the expected outcome.

Best Answer

For the one row you posted expected results for, this will return what you want.

In the future, please post your table as an insert script so that it's easier to work with.

SELECT
    x.*
INTO #x
FROM
(
    VALUES
        (1510142123,-589.53,0,0,'CC',5831879,NULL,NULL),
        (1510142123,-589.53,0,0,'CC',5831882,NULL,NULL),
        (1510142123,-155.06,0,0,'CC',5898896,NULL,NULL),
        (157771145,-126.42,0,0,'CC',5885900,NULL,NULL),
        (157771145,-58.73,0,0,'CC',5885903,NULL,NULL),
        (158088837,-55.14,0,-3.45,'CC',5897306,NULL,5897303),
        (158088837,-5.75,0,-3.45,'CC',5897309,NULL,5897303),
        (158464166,-161,0,-3.45,'CC',5910551,NULL,5910548),
        (158464166,-24.15,0,-3.45,'CC',5910554,NULL,5910548),
        (1591970734,-111.61,0,0,'Bank',5939648,NULL,NULL),
        (1591970734,-0.01,0,0,'Cash',5939711,NULL,NULL),
        (1591970734,-0.01,0,0,'Cash',5939714,NULL,NULL),
        (159297565,-708.93,20,0,'CC',5943728,5910848,NULL),
        (159297565,-0.02,20,0,'Cash',5948207,5910848,NULL)
)AS x (id,payment, credit, debit, [source], paymentid, creditid, debitid);

SELECT 
    x.id,
    SUM(DISTINCT y.payment) AS payment, 
    SUM(DISTINCT y.credit) AS credit, 
    SUM(DISTINCT y.debit) AS debit,
    MAX(x.source) AS source,
    MIN(x.paymentid) AS min_paymentid,
    MAX(x.debitid) AS max_debitid
FROM #x AS x
CROSS APPLY
(
    SELECT
        SUM(x2.payment) AS payment,
        SUM(x2.credit) AS credit,
        SUM(x2.debit) AS debit
    FROM #x AS x2
    WHERE x.id = x2.id
    AND   x.paymentid = x2.paymentid
    GROUP BY x2.id, 
             x2.creditid, 
             x2.debitid
) AS y
GROUP BY x.id
ORDER BY x.id;