Sql-server – Check for duplicates in column and determine which row to keep

sql-server-2012

I've found myself running in to this on multiple occasions now where I got a query and would need too further refine it. Where one column might contain duplicates and I need to evaluate which row to keep and which rows needs to be discarded.

So far I've had trouble to apply what I've found on my queries as most examples contains selections from one or two tables where my queries are three or more.

I've tried using group by and having count as these examples show but I just
get a lot of aggregate errors.

I think what I'd like to do would be to use the table created from my query and then query that one if it's possible and basically cull rows which contains duplicates in one of the columns depending on some criteria. For example if one row contains a datetime remove all except the newest.

SELECT
    stamps.stampsnr AS 'stampsnr' ,
    stamps.time AS 'time' ,
    stamps.amount AS 'amount' ,
    products.productnamn AS 'productnamn' ,
    products.Artikelnummer AS 'Artikelnummer' ,
    FContainer.id AS 'FlaggId' ,
    FContainer.FlaggId AS 'Flagga' ,
    Tempo.tempo AS 'nuTempoNr' ,
    Tempo.amounttempo AS 'amountTempon'
FROM
    stamps
INNER JOIN Tempo
ON  stamps.temponr = Tempo.temponr
INNER JOIN products
ON  Tempo.productnr = products.productnr
INNER JOIN FContainer
ON  stamps.ID = FContainer.id
WHERE
    ( stamps.time > '" & dtmYesterday & "' + ' 06:00:00' )
    AND ( stamps.time < '" & dtmNow & "' + ' 06:00:00' )
    AND FContainer.flaggid = 5
ORDER BY
    time;

Here is an example query. Now on the table created I'd like to check the column renamed to flaggID for duplicates if any found remove every row except the newest and the time is stored in stamps.time.

Best Answer

As @sabinbio commented, you need to add a ROW_NUMBER:

SELECT *
FROM 
 (
   SELECT
      stamps.stampsnr AS 'stampsnr'
     ,stamps.time AS 'time'
     ,stamps.amount AS 'amount'
     ,products.productnamn AS 'productnamn'
     ,products.Artikelnummer AS 'Artikelnummer'
     ,FContainer.id AS 'FlaggId'
     ,FContainer.FlaggId AS 'Flagga'
     ,Tempo.tempo AS 'nuTempoNr'
     ,Tempo.amounttempo AS 'amountTempon'
     ,ROW_NUMBER()  
      OVER (PARTITION BY FContainer.FlaggId  -- for each FlaggId
            ORDER BY stamps.time DESC) AS rn -- assign #1 to latest time
   FROM stamps
   INNER JOIN Tempo
     ON stamps.temponr = Tempo.temponr 
   INNER JOIN products
     ON Tempo.productnr = products.productnr
   INNER JOIN FContainer
     ON stamps.ID = FContainer.id

   WHERE  (stamps.TIME > '" & dtmYesterday & "' + ' 06:00:00') 
     AND (stamps.TIME < '" & dtmNow & "' + ' 06:00:00') 
     AND fcontainer.flaggid = 5 
 ) AS dt
WHERE rn = 1
ORDER BY time