SQL Server – Update Values in Table Based on Ranges Defined in Another Table

sql serversql-server-2017update

I have two tables definitions and history with the following schema:

create table definitions (
    id UNIQUEIDENTIFIER,
    revision INT,
    majorVersion INT
  );
create table history (
   id UNIQUEIDENTIFIER,
   revision INT
   );

I want to update table history with another column majorVersion where the value ranges are defined in table definitions. For example, if records in definitions look like:

+--------------------------------------+-----------+---------------+
|                  id                  |  revision |  majorVersion |
+--------------------------------------+-----------+---------------+
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         2 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         4 |             2 |
+--------------------------------------+-----------+---------------+

Then the updated table history should look something like this:

+--------------------------------------+-----------+---------------+
|                  id                  |  revision |  majorVersion |
+--------------------------------------+-----------+---------------+
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         1 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         2 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         3 |             2 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         4 |             2 |
+--------------------------------------+-----------+---------------+

Here is a SQL Fiddle with a more number of rows.
The table definitions can contain potentially thousands of different id and multiple majorVersion. Subsequently, table history can contain can contains close to a million rows.

I want to make the query as fast and optimized as possible. One possible solution is to use something like this:

update history
set majorVersion = (
  select top 1 majorVersion
  from definitions
  where definitions.id = history.id
  and definitions.revision >= history.revision
  order by definitions.majorVersion
  )
where history.majorVersion = 0; 

But the problem with this is that we are querying rows for every row in table history (which can be very large compared to definitions). Any suggestions on how to improve upon this?

Best Answer

One of two solutions

solution 1

If you are able to split the work in two update statements, first update all the matching revisions

UPDATE h
set h.majorVersion = d.majorVersion
FROM dbo.history h
INNER JOIN
dbo.definitions d
ON d.id = h.id
and d.revision = h.revision;

Then use a CTE and windowing functions to update the majorVersion's that are still 0 on the updated table

;WITH CTE
AS
(
SELECT ID,revision, MAX(majorVersion) OVER (PARTITION BY Value2) as majorVersionUpdated,majorVersion
FROM
(
    SELECT ID, majorVersion,revision
        ,COUNT(case when majorVersion = 0 then NULL else majorVersion END) OVER (ORDER BY ID DESC,revision desc) AS Value2
    FROM dbo.history
) a
)

UPDATE CTE
SET majorVersion = majorVersionUpdated
WHERE majorVersion = 0;

SELECT * FROM dbo.history;

An index like this could help

CREATE INDEX IX_id_revision
on dbo.history(id,revision)
include(majorVersion)

Result

SELECT * FROM dbo.history;

id                                 revision majorVersion
9F717823-B9CA-4C7B-97F9-7770AAAFB468    1   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    2   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    3   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    4   2
9F717823-B9CA-4C7B-97F9-7770AAAFB468    5   2
9F717823-B9CA-4C7B-97F9-7770AAAFB468    6   3
9F717823-B9CA-4C7B-97F9-7770AAAFB468    7   3
546EF185-54AC-4AF8-82C6-61EFA3202353    1   1
546EF185-54AC-4AF8-82C6-61EFA3202353    2   2
546EF185-54AC-4AF8-82C6-61EFA3202353    3   2

SQL Fiddle


solution 2, still two updates but with a CROSS APPLY and a self join. The self join is not ideal.

First update all the matching revisions

UPDATE h
set h.majorVersion = d.majorVersion
FROM dbo.history h
INNER JOIN
dbo.definitions d
ON d.id = h.id
and d.revision = h.revision;

Then do a self join to update the other majorVersions to the next one that is not 0

UPDATE H 
SET h.majorVersion = h2.majorVersion
FROM dbo.history h
CROSS APPLY(
SELECT MIN(CASE WHEN h2.majorVersion = 0 
           THEN NULL 
           ELSE h2.majorVersion END) as majorVersion 
FROM
dbo.history h2 
WHERE h2.id = h.id 
AND h2.revision >= h.revision
) h2
WHERE h.majorVersion = 0;


SELECT * from dbo.history;

SQLFiddle

You would have to add BEGIN TRANSACTION ... COMMIT TRANSACTION to the updates if they have to be executed in one batch. The fastest way will depend on additional factors, such as your indexes & data, more than an example can show. YMMV