SQL Server – Handling Duplicate Rows Based on Columns

sql server

Here is what I am trying to do,

  • Get duplicates based on 2 columns (let say returns 500 rows)

  • Get duplicates based on above 2 columns + another column (let say returns 100 rows)

Now I want to get remaining 400 rows. In simple words, I want all duplicates where there are not duplicates because of columnC…

-- get duplicates based on ColumnA, ColumnB
SELECT '-'
    ,ColumnA
    ,ColumnB
    ,ColumnC
    ,COUNT(*)
FROM MainTable
     ...SOME joins(INNER AND left)
WHERE ColumnA IS NOT NULL
GROUP BY ColumnA
    ,ColumnB
    ,ColumnC
HAVING COUNT(*) > 1

EXCEPT

-- get duplicates based on ColumnA, ColumnB, ColumnC
SELECT '-'
    ,ColumnA
    ,ColumnB
    ,ColumnC
    ,COUNT(*)
FROM MainTable
     ...SOME joins(INNER AND left)
WHERE ColumnA IS NOT NULL
GROUP BY ColumnA
    ,ColumnB
    ,ColumnC
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

I am just not able to complete this query 🙁

Best Answer

Using window functions will likely be simpler here:

DB<>Fiddle

WITH cte AS
( 
    SELECT 
        ColumnA,
        ColumnB,
        ColumnC,
        COUNT(*) OVER (PARTITION BY ColumnA, ColumnB)
          AS count_ab,
        COUNT(*) OVER (PARTITION BY ColumnA, ColumnB, ColumnC)
          AS count_abc
    FROM MainTable
         ...SOME joins(INNER AND left)
    WHERE ColumnA IS NOT NULL
)
SELECT
    ColumnA,
    ColumnB,
    ColumnC,
    count_ab
FROM 
    cte
WHERE
    count_ab > 1
  AND
    count_abc = 1 ;