Sql-server – Finding rows with duplicate values

group bysql serversql-server-2008-r2t-sql

I have a one to many relationship between the Account table and a ScheduledPayment table. One account can have many payments but there should be a maximum of one payment per day.
Although unlikely, there are two scenarios which I need to cater for…

  1. If there is one or more Payment records for an AccountId on a
    day, with different payment and or reference then all payments
    should be marked as cancelled.

  2. If there is more than one Payment record for an AccountId on a
    day, with the same payment and reference then one record should be
    the winner and the other be marked as cancelled.

In reality, I don't anticipate seeing this scenario all that much because it is down to a concurrency issue in code, if two users are working on the same record at the same time.

Any ideas would be greatly appreciated!

Account
---------
AccountId
---------
    1
    2
    3
    4
ScheduledPayment
----------------
ScheduledPaymentId | AccountId | Amount | Reference | State
------------------------------------------------------------
       1           |    1      |  100   | ABCDE     | Live
       2           |    1      |  100   | ABCDE     | Live  (For these two, 1 would the winner and two marked as cancelled)
       3           |    2      |  100   | ABCDE     | Live
       4           |    2      |  110   | ABCDE     | Live
       5           |    2      |  110   | ABCDE     | Live
       6           |    2      |  130   | ABCDE     | Live (For these, all would be marked as cancelled)
       7           |    3      |  100   | ABCDE     | Live
       8           |    4      |  100   | ABCDE     | Live

Best Answer

This should return tag the records that need attention. I put the tagging in SELECT, but you could easily turn this into a second CTE and simply select out the payments to clean up.

-- 
-- find all accounts with more than one payment and mark payments to cancel
--
WITH cte_DuplicatePayments AS
(
SELECT COUNT(*) OVER(PARTITION BY accountID) AS numberOfPaymentsPerAccountID
, COUNT(*) OVER(partition BY accountID, amount) AS numberOfPaymentsPerAccountIDAndAmount
, ROW_NUMBER() OVER(partition BY accountID ORDER BY amount asc) AS PaymentsNumberPerAccountID
, *
FROM ScheduledPayment
)
SELECT CASE 
    WHEN numberOfPaymentsPerAccountID != numberOfPaymentsPerAccountIDAndAmount THEN 'MARK AS CANCELLED: Duplicate Payments with amount mismatch' 
    WHEN PaymentsNumberPerAccountID > 1 THEN 'MARK AS CANCELLED: Duplicate Payments with matching amount' 
    ELSE ''
   END AS PaymentAuditAction
, ScheduledPaymentID, accountID, amount,
FROM cte_DuplicatePayments
WHERE numberOfPaymentsPerAccountID > 1