To just update an arbitrary one from each distinct group you could use
WITH T
AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Finance_Project_Number]
ORDER BY (SELECT 0)) AS RN,
[Processing_Result_Text],
[Processing_Result]
FROM [InterfaceInfor].[dbo].[ProjectMaster]
WHERE NOT EXISTS (SELECT *
FROM [IMS].[dbo].[THEOPTION]
WHERE [InterfaceInfor].[dbo].[ProjectMaster].[Finance_Project_Number] = [IMS].[dbo].[THEOPTION].[NAME]))
UPDATE T
SET [Processing_Result_Text] = 'UNIQUE',
[Processing_Result] = 0
WHERE RN = 1;
If you decide there is some criteria by which the row to be updated should be selected after all simply change the ORDER BY (SELECT 0)
accordingly so that the desired target row is ordered first - e.g. ORDER BY DateInserted desc
would update the latest one as ordered by a column called DateInserted if such a column exists.
This uses a common table expression (CTE) because it is not permitted to reference ranking functions such as ROW_NUMBER
directly in the WHERE
clause. It is allowed to update data via common table expressions in the same circumstances as for updatable views (basically the data being updated must be able to be mapped back straight forwardly to specific items in a single base table).
If you are not yet familiar with ranking functions you may well find SELECT
-ing from the CTE first to be beneficial.
CREATE TABLE #TheOption(NAME VARCHAR(50));
CREATE TABLE #ProjectMaster
(
Finance_Project_Number VARCHAR(10) NOT NULL,
Processing_Result_Text VARCHAR(50) NULL,
Processing_Result INT NULL
);
INSERT INTO #ProjectMaster (Finance_Project_Number, Processing_Result_Text)
VALUES ('A00001', 'A'),
('A00001', 'B'),
('A00001', 'C'),
('B99999', 'D'),
('B99999', 'E'),
('C47474', 'F'),
('C47474', 'G');
INSERT INTO #TheOption (NAME) VALUES('C47474');
WITH T
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Finance_Project_Number
ORDER BY (SELECT 0)) AS RN,
Finance_Project_Number,
Processing_Result_Text,
Processing_Result
FROM #ProjectMaster pm
WHERE NOT EXISTS (SELECT *
FROM #TheOption opt
WHERE pm.Finance_Project_Number =opt.NAME))
SELECT *
FROM T
ORDER BY Finance_Project_Number, RN;
Example results are below
The C47474
rows are filtered out as they do exist in the other table so don't meet the NOT EXISTS
, the remaining rows are grouped by Finance_Project_Number
and assigned a sequential number within each group.
In this case the yellow rows would meet the RN = 1
condition and be updated. However there is no guarantee exactly how these numbers will be assigned within each group unless you use a ORDER BY
clause on an expression guaranteed to be unique. Without this it could potentially change even between successive executions of the same statement.
This query with a DISTINCT
:
SELECT DISTINCT Name, MyID
FROM data;
Returns distinct Name
and MyID
:
Name | MyID
Alan | 2
John | 3
This query with a GROUP BY
:
SELECT MIN(ID) AS ID, Name, MyID
FROM data
GROUP BY Name, MyID;
Returns Name
and MyId
with the smallest ID
like your sample:
ID | Name | MyID
1 | Alan | 2
3 | John | 3
Both query are in this SQL Fiddle and will work with most RDBMS. (Sample uses SQL Server)
Best Answer
As Erik said in the comments, you should fix bad data rather than trying to query around it, but if you absolutely cannot fix the data, the below query will get the distinct list of cities by replacing double spaces with a single space:
This is a really basic example, however, if the input data is not being validated, this may not be the only type of whitespace you're encountering causing duplication.
Prior to SQL 2017, you need to daisy chain multiple REPLACE statements to replace multiple characters. For example, this code replaces double spaces and tab characters with a single space:
In 2017, you can use the TRANSLATE function to swap all of the characters you're searching for with a single character, then replace that character with nothing to ensure you're finding all exact duplicates:
This means you don't have to repeat REPLACE for every character you want to strip, you just add the character code (+ CHAR(?)) to the TRANSLATE function and add another replacement character (#). As you can see, the TRANSLATE example replaces 4 characters for basically the same amount of code as the two-character replacement in prior versions.