Sql-server – Change SELECT DISTINCT to UPDATE DISTINCT

distinctsql serverupdate

How can I modify my SELECT DISTINCT query to an UPDATE DISTINCT query?

It is important that it only updates the distinct records, as there are multiple records associated with each [Finance_Project_Number] (due to CRUD operations). I only want to update a single record, as this will simply set in motion a different process of validating the data etc.

If there are multiple records collapsed into a single row as a result of the DISTINCT, any of these can be updated – it does not matter.

When I run my select query, I get a result of: 6 982:

SELECT DISTINCT
    [Finance_Project_Number]
FROM [InterfaceInfor].[dbo].[ProjectMaster]
WHERE 
    NOT EXISTS 
    (
        SELECT *
        FROM [IMS].[dbo].[THEOPTION]  
        WHERE 
            [InterfaceInfor].[dbo].[ProjectMaster].[Finance_Project_Number] =
            [IMS].[dbo].[THEOPTION].[NAME]
    );

Here is my attempt at converting my query to a DISTINCT UPDATE query, but this updates 15 353 records:

UPDATE [InterfaceInfor].[dbo].[ProjectMaster]
SET 
    [Processing_Result_Text] = 'UNIQUE',
    [Processing_Result] = 0
WHERE 
    NOT EXISTS 
    (
        SELECT *
        FROM [IMS].[dbo].[THEOPTION]  
        WHERE 
            [InterfaceInfor].[dbo].[ProjectMaster].[Finance_Project_Number] =
            [IMS].[dbo].[THEOPTION].[NAME]
    );

Best Answer

To just update an arbitrary one from each distinct group you could use

WITH T
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY [Finance_Project_Number] 
                                       ORDER BY (SELECT 0)) AS RN,
                [Processing_Result_Text],
                [Processing_Result]
         FROM   [InterfaceInfor].[dbo].[ProjectMaster]
         WHERE  NOT EXISTS (SELECT *
                            FROM   [IMS].[dbo].[THEOPTION]
                            WHERE  [InterfaceInfor].[dbo].[ProjectMaster].[Finance_Project_Number] = [IMS].[dbo].[THEOPTION].[NAME]))
UPDATE T
SET    [Processing_Result_Text] = 'UNIQUE',
       [Processing_Result] = 0
WHERE  RN = 1; 

If you decide there is some criteria by which the row to be updated should be selected after all simply change the ORDER BY (SELECT 0) accordingly so that the desired target row is ordered first - e.g. ORDER BY DateInserted desc would update the latest one as ordered by a column called DateInserted if such a column exists.


This uses a common table expression (CTE) because it is not permitted to reference ranking functions such as ROW_NUMBER directly in the WHERE clause. It is allowed to update data via common table expressions in the same circumstances as for updatable views (basically the data being updated must be able to be mapped back straight forwardly to specific items in a single base table).

If you are not yet familiar with ranking functions you may well find SELECT-ing from the CTE first to be beneficial.

CREATE TABLE #TheOption(NAME VARCHAR(50));

CREATE TABLE #ProjectMaster
(
Finance_Project_Number VARCHAR(10) NOT NULL,
Processing_Result_Text VARCHAR(50) NULL,
Processing_Result INT NULL
);

INSERT INTO #ProjectMaster (Finance_Project_Number, Processing_Result_Text)
VALUES ('A00001', 'A'), 
       ('A00001', 'B'), 
       ('A00001', 'C'), 
       ('B99999', 'D'), 
       ('B99999', 'E'), 
       ('C47474', 'F'), 
       ('C47474', 'G');

INSERT INTO #TheOption (NAME) VALUES('C47474');


WITH T
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY Finance_Project_Number 
                                       ORDER BY (SELECT 0)) AS RN,
                Finance_Project_Number,
                Processing_Result_Text,
                Processing_Result
         FROM   #ProjectMaster pm
         WHERE  NOT EXISTS (SELECT *
                            FROM   #TheOption opt
                            WHERE  pm.Finance_Project_Number =opt.NAME))
SELECT *
FROM T
ORDER BY Finance_Project_Number, RN; 

Example results are below

enter image description here

The C47474 rows are filtered out as they do exist in the other table so don't meet the NOT EXISTS, the remaining rows are grouped by Finance_Project_Number and assigned a sequential number within each group.

In this case the yellow rows would meet the RN = 1 condition and be updated. However there is no guarantee exactly how these numbers will be assigned within each group unless you use a ORDER BY clause on an expression guaranteed to be unique. Without this it could potentially change even between successive executions of the same statement.