In my experience, an aggregate (DISTINCT or GROUP BY) can be quicker then a ROW_NUMBER() approach. Saying that, ROW_NUMBER is better with SQL Server 2008 than SQL Server 2005.
However, you'll have to try for your situation.
Compare query plans, and use Profiler and SET to capture IO, CPU, Duration etc
For a lot of background, see these SO questions:
Finally, do you need the ROW_NUMBER approach? It looks like you're fixing a problem caused by de-normalisation.
And some notes:
- shouldn't YearID be in the GROUP BY or PARTITION BY?
- Won't DISTINCT give different output?
- Are these columns indexed?
This solution seems to work against your sample data.
--Demo setup
Declare @TableA table
(FirstName varchar(6), LastName varchar(6), Merchant varchar(16), Amount decimal(11,2), TransDate varchar(10), Reference int)
;
INSERT INTO @TableA
(FirstName, LastName, Merchant, Amount, TransDate, Reference)
VALUES
('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 12345),
('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 77777),
('{Null}', '{Null}', 'BLANK-ROSTER NUM', 16.07, '2018-12-24', 99999),
('{Null}', '{Null}', 'BLANK-ROSTER NUM', 16.07, '2018-12-24', 44444),
('{Null}', '{Null}', 'BLANK-ROSTER NUM', 34.30, '2018-01-01', 66666)
;
Declare @TableB table
(FirstName varchar(6), LastName varchar(6), Merchant varchar(16), Amount decimal(11,2), TransDate varchar(10))
;
INSERT INTO @TableB
(FirstName, LastName, Merchant, Amount, TransDate)
VALUES
('SMITH', 'JOHN', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
('JONES', 'JAMES', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
('MILLER', 'STEVE', 'BLANK-ROSTER NUM', 16.07, '2018-12-24'),
('WILSON', 'JACK', 'BLANK-ROSTER NUM', 34.30, '2018-01-01'),
('DAVIS', 'BEN', 'BLANK-ROSTER NUM', 16.07, '2018-12-24')
;
--The solution
;
--Common table expressions to put a row number on each row
--partition by merchant, amount and transdata
WITH TableARowNumbers
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY Merchant
,Amount
,TransDate ORDER BY Merchant
,Amount
,TransDate
) AS rn
FROM @TableA
)
,TableBRowNumbers
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY Merchant
,Amount
,TransDate ORDER BY Merchant
,Amount
,TransDate
) AS rn
FROM @TableB
)
--Update TableA joining on Merchant, Amount, TransDate and rn
UPDATE ta
SET ta.FirstName = tb.FirstName
,ta.LastName = tb.LastName
FROM TableARowNumbers ta
JOIN TableBRowNumbers tb
ON tb.Merchant = ta.Merchant
AND tb.Amount = ta.Amount
AND tb.TransDate = ta.TransDate
AND tb.rn = ta.rn
SELECT *
FROM @TableA
| FirstName | LastName | Merchant | Amount | TransDate | Reference |
|-----------|----------|------------------|--------|------------|-----------|
| SMITH | JOHN | BLANK-ROSTER NUM | 34.30 | 2018-01-01 | 12345 |
| JONES | JAMES | BLANK-ROSTER NUM | 34.30 | 2018-01-01 | 77777 |
| MILLER | STEVE | BLANK-ROSTER NUM | 16.07 | 2018-12-24 | 99999 |
| DAVIS | BEN | BLANK-ROSTER NUM | 16.07 | 2018-12-24 | 44444 |
| WILSON | JACK | BLANK-ROSTER NUM | 34.30 | 2018-01-01 | 66666 |
Best Answer
The easiest query to write is for MySQL (with not strict ANSI settings). It uses the non-standard construction:
In recent version (5.7 and 8.0+) where the strict settings and
ONLY_FULL_GROUP_BY
are the default, you can use theANY_VALUE()
function, added in 5.7:For other DBMSs, that have window functions (like Postgres, SQL-Server, Oracle, DB2), you can use them like this. The advantage is that you can select other columns in the result as well (besides the
key
andvalue
) :For older versions of the above and for any other DBMS, a general way that works almost everywhere. One disadvantage is that you cannot select other columns with this approach. Another is that aggregate functions like
MIN()
andMAX()
do not work with some datatypes in some DBMSs (like bit, text, blobs):PostgreSQL has a special non-standard
DISTINCT ON
operator that can also be used. The optionalORDER BY
is for selecting which row from every group should be selected: