Sql-server – How to update ID=null values in table to incremental counter values

identity, row, sql-server, update

On SQL Server 2012, I have an intermediate staging table for merging existing data with new data, and I want to assign numeric IDs to the newly created rows:

ID    NaturalID                Comment
1     franknfurther03071972    blahblah
2     chrisrock12081980        nonsense
null  clairecampbell24121990   merry christmas
3     walterhermes22032001     young guy
4     tanjaolsen16051996       nice
null  timharris20041999        came late

The rows with "null" IDs are new, the numbered IDs are those already existing in the main, target table. The NaturalID can uniquely identify an entry (in fact, it's multiple columns). I want to set the "null" IDs to incremental values, following the current max ID, here: 5 and 6, increasing when more null IDs are found.

Currently, I use a cursor to iterate over the rows where ID is null and update each one individually, but since the table is very big, that would take days.

I tried to do an update with row_number(), but it gives me an error "Windowed functions can only appear in the SELECT or ORDER BY clauses.":

update StagingTable set ID=ROW_NUMBER() over (order by NaturalId)
from StagingTable where id is null  -- fails

How can I do it?

Best Answer

You can do this.

WITH T
     AS (SELECT ISNULL((SELECT MAX(ID) FROM StagingTable), 0) + 
                    ROW_NUMBER() OVER (ORDER BY NaturalID) AS New_ID,
                ID
         FROM   StagingTable
         WHERE  ID IS NULL)
UPDATE T
SET    ID = New_ID

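The windowed function appears only in the SELECT list of the CTE, which is allowed. Because the CTE reads from a single table, it is updatable, so UPDATE T writes New_ID back to the ID column of the underlying StagingTable rows.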
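As a quick check, the statement can be tried against the sample rows from the question. This is only a sketch: the temp table #StagingTable and its column types are assumptions made for illustration, not the real staging schema.

```sql
-- Hypothetical stand-in for the staging table (column types are assumed).
CREATE TABLE #StagingTable
(
    ID        int          NULL,
    NaturalID varchar(50)  NOT NULL,
    Comment   varchar(100) NULL
);

INSERT INTO #StagingTable (ID, NaturalID, Comment)
VALUES (1,    'franknfurther03071972',  'blahblah'),
       (2,    'chrisrock12081980',      'nonsense'),
       (NULL, 'clairecampbell24121990', 'merry christmas'),
       (3,    'walterhermes22032001',   'young guy'),
       (4,    'tanjaolsen16051996',     'nice'),
       (NULL, 'timharris20041999',      'came late');

-- Number the NULL rows by NaturalID, starting after the current MAX(ID).
WITH T
     AS (SELECT ISNULL((SELECT MAX(ID) FROM #StagingTable), 0) +
                    ROW_NUMBER() OVER (ORDER BY NaturalID) AS New_ID,
                ID
         FROM   #StagingTable
         WHERE  ID IS NULL)
UPDATE T
SET    ID = New_ID;

-- clairecampbell24121990 now has ID 5 and timharris20041999 has ID 6.
SELECT ID, NaturalID, Comment
FROM   #StagingTable
ORDER  BY ID;

DROP TABLE #StagingTable;
```

The whole update is a single set-based statement, so no cursor is needed.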

You should probably also have a unique filtered index on ID (WHERE ID IS NOT NULL) to prevent duplicates. Alternatively, run the update at the serializable isolation level to block concurrent inserts.
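Both safeguards might look roughly like the following. The index name UQ_StagingTable_ID and the dbo schema are assumptions for illustration; adjust them to the real table.

```sql
-- Option 1: a unique filtered index. NULL IDs are excluded from the index,
-- so many rows may still have ID = NULL, but non-NULL IDs must be unique.
CREATE UNIQUE NONCLUSTERED INDEX UQ_StagingTable_ID   -- hypothetical name
    ON dbo.StagingTable (ID)
    WHERE ID IS NOT NULL;

-- Option 2: run the numbering update at the serializable isolation level.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;

WITH T
     AS (SELECT ISNULL((SELECT MAX(ID) FROM dbo.StagingTable), 0) +
                    ROW_NUMBER() OVER (ORDER BY NaturalID) AS New_ID,
                ID
         FROM   dbo.StagingTable
         WHERE  ID IS NULL)
UPDATE T
SET    ID = New_ID;

COMMIT TRANSACTION;

-- Restore the default isolation level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

The two options are not mutually exclusive. The index turns a duplicate ID into an immediate error, while the isolation level aims to stop a conflicting insert from happening in the first place.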

Related Solutions

Sql-server – Filtering data ordered by rowversion

One solution is for the client application to remember the maximum rowversion per ID. The user-defined table type would change to:

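CREATE TYPE
    dbo.guid_list_tbltype
AS TABLE 
    (
    Id      uniqueidentifier PRIMARY KEY, 
    -- binary(8) rather than rowversion: explicit values cannot be inserted
    -- into a rowversion column, and the client has to supply this value
    LastRV  binary(8) NOT NULL
    );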

The query in the procedure can then be rewritten to use the APPLY pattern (see my SQLServerCentral articles part 1 and part 2 - free login required). The key to good performance here is the ORDER BY - it avoids unordered pre-fetching on the nested loops join. The RECOMPILE is necessary to allow the optimizer to see the cardinality of the table variable at compilation time (probably resulting in a desirable parallel plan).

ALTER PROCEDURE dbo.GetData

    @IDs        guid_list_tbltype READONLY,
    @MaxRows    bigint

AS
BEGIN

    SELECT TOP (@MaxRows)
        d.Id,
        d.[Date],
        d.Value,
        d.RV
    FROM @IDs AS i
    CROSS APPLY
    (
        SELECT
            d.*
        FROM dbo.Data AS d
        WHERE
            d.Id = i.Id
            AND d.RV > i.LastRV
    ) AS d
    ORDER BY
        i.Id,
        d.RV
    OPTION (RECOMPILE);

END;

You should get a post-execution query plan like this (estimated plan will be serial):

[Figure: post-execution query plan]

Sql-server – Displaying Parent Child Information, With Certain Parent Columns Only Shown Once

A general policy is to let the reporting layer handle things like only printing ParentIncome once. However, since you are delivering a spreadsheet that will be used by others in who knows what manner, then I suppose you are stuck.

Because no single row knows whether it is the first row for its parent, you need to derive some extra per-group information (MIN, MAX, first, last, etc.). There are alternatives to ROW_NUMBER() OVER (PARTITION...), but each still requires an extra step.

See the following:

CREATE TABLE #parent
(ParentNum INT,
 ParentName VARCHAR(20),
 ParentIncome INT);

CREATE TABLE #child
(ChildNum INT,
 ChildParentNum INT,
 ChildName VARCHAR(20),
 ChildAllowance INT);

INSERT INTO #parent VALUES(10,'John',50000);
INSERT INTO #parent VALUES(20,'Jane',55000);
INSERT INTO #parent VALUES(30,'Jackie',90000);

INSERT INTO #child VALUES(1,10,'Johnny',5);
INSERT INTO #child VALUES(2,20,'Jackie',10);
INSERT INTO #child VALUES(3,20,'Billy',5);
INSERT INTO #child VALUES(4,20,'Sally',5);
INSERT INTO #child VALUES(5,30,'Monique',0);

-- Basic approach you may be using

See this example SQL Fiddle #1

SELECT pc.ParentName, pc.ParentNum,
    CASE WHEN pc.RowNum = 1 THEN CAST(pc.ParentIncome AS VARCHAR(10)) ELSE '' END AS ParentIncome,
    pc.ChildName, pc.ChildNum, pc.ChildAllowance
FROM (SELECT p.ParentNum, p.ParentName, p.ParentIncome, 
         c.ChildNum, c.ChildParentNum, c.ChildName, c.ChildAllowance,
         -- Order by ChildNum so that RowNum = 1 is deterministically the first child listed
         ROW_NUMBER() OVER (PARTITION BY p.ParentNum ORDER BY c.ChildNum) AS RowNum
       FROM #parent p JOIN #child c ON p.ParentNum = c.ChildParentNum) AS pc
ORDER BY pc.ParentNum, pc.ChildNum;

-- An alternative, but still using a subselect for one element

See this example: SQL Fiddle #2

SELECT  p.ParentName, p.ParentNum, 
        CASE WHEN c.ChildNum = mc.MinChild THEN CAST (ParentIncome AS VARCHAR(10)) ELSE '' END AS ParentIncome,
        c.ChildName, c.ChildNum, c.ChildAllowance
       FROM #parent p 
          JOIN #child as c 
             ON p.ParentNum = c.ChildParentNum
          -- This subselect gives the MIN (or First) ChildNum per Parent
          JOIN (SELECT ChildParentNum, MIN(ChildNum) AS MinChild
                  FROM #child
                  GROUP BY ChildParentNum) AS mc 
             ON mc.ChildParentNum = c.ChildParentNum 
ORDER BY p.ParentNum, c.ChildNum 

drop table #parent
drop table #child

Notice that I cast ParentIncome to VARCHAR(10) so that it has the same data type as the empty string '' used for the blank income. I originally used NULL instead of a blank, but that might cause problems in Excel.

Is it worth doing things this way? That is up to you and mostly a matter of preference. ROW_NUMBER() is more flexible than MIN() and gives you more options, but in this case MIN() appears to be enough.
