SQL Server – Update Values in Table Based on Ranges Defined in Another Table

Tags: sql-server, sql-server-2017, update

I have two tables, definitions and history, with the following schema:

create table definitions (
    id UNIQUEIDENTIFIER,
    revision INT,
    majorVersion INT
  );

create table history (
   id UNIQUEIDENTIFIER,
   revision INT
   );

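I want to add another column, majorVersion, to table history and populate it from the revision ranges defined in table definitions: each history row should get the majorVersion of the first definitions row for the same id whose revision is greater than or equal to its own. For example, if the records in definitions look like: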

+--------------------------------------+-----------+---------------+
|                  id                  |  revision |  majorVersion |
+--------------------------------------+-----------+---------------+
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         2 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         4 |             2 |
+--------------------------------------+-----------+---------------+

Then the updated table history should look something like this:

+--------------------------------------+-----------+---------------+
|                  id                  |  revision |  majorVersion |
+--------------------------------------+-----------+---------------+
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         1 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         2 |             1 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         3 |             2 |
| 9f717823-b9ca-4c7b-97f9-7770aaafb468 |         4 |             2 |
+--------------------------------------+-----------+---------------+
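
For reference, the example above could be reproduced with a setup along these lines. It is only a sketch: it assumes the new column is added with a default of 0 (the "not yet assigned" marker that the queries further down filter on), and the constraint name is made up for illustration.

```sql
-- Assumption: 0 means "majorVersion not assigned yet".
ALTER TABLE history
    ADD majorVersion INT NOT NULL
        CONSTRAINT DF_history_majorVersion DEFAULT 0;  -- hypothetical constraint name

-- The two range boundaries from the definitions example.
INSERT INTO definitions (id, revision, majorVersion) VALUES
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 2, 1),
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 4, 2);

-- History rows for revisions 1 to 4; majorVersion starts at the default 0.
INSERT INTO history (id, revision) VALUES
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 1),
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 2),
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 3),
    ('9f717823-b9ca-4c7b-97f9-7770aaafb468', 4);
```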

Here is a SQL Fiddle with more rows.
The table definitions can potentially contain thousands of different id values, each with multiple majorVersion values. As a result, table history can contain close to a million rows.

I want to make the query as fast and optimized as possible. One possible solution is to use something like this:

update history
set majorVersion = (
  select top 1 majorVersion
  from definitions
  where definitions.id = history.id
  and definitions.revision >= history.revision
  order by definitions.majorVersion
  )
where history.majorVersion = 0;

The problem with this is that the subquery against definitions runs once for every row in table history (which can be very large compared to definitions). Any suggestions on how to improve upon this?

Best Answer

One of two solutions

solution 1

If you are able to split the work into two update statements, first update all the rows whose revision exactly matches a revision in definitions

UPDATE h
set h.majorVersion = d.majorVersion
FROM dbo.history h
INNER JOIN
dbo.definitions d
ON d.id = h.id
and d.revision = h.revision;

Then use a CTE and window functions to update the majorVersion values that are still 0 in the updated table

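;WITH CTE
AS
(
    -- Each (id, Value2) group holds one row with a known majorVersion plus the
    -- lower revisions of the same id that are still 0, so MAX picks the known value.
    SELECT id, revision, majorVersion
        ,MAX(majorVersion) OVER (PARTITION BY id, Value2) AS majorVersionUpdated
    FROM
    (
        -- Running count of known majorVersions, walking each id from its highest
        -- revision down. Partitioning by id keeps one id's value from being
        -- carried over into another id's rows.
        SELECT id, majorVersion, revision
            ,COUNT(CASE WHEN majorVersion = 0 THEN NULL ELSE majorVersion END)
                OVER (PARTITION BY id ORDER BY revision DESC) AS Value2
        FROM dbo.history
    ) a
)

UPDATE CTE
SET majorVersion = majorVersionUpdated
WHERE majorVersion = 0;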

SELECT * FROM dbo.history;

An index like this could help

CREATE INDEX IX_id_revision
on dbo.history(id,revision)
include(majorVersion)
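
The first update joins the two tables on (id, revision), so a matching index on definitions may help as well. This is only a sketch, assuming definitions has no such index yet; the index name is made up for illustration, and whether it pays off depends on your data.

```sql
-- Hypothetical index name; key columns mirror the join predicate of the first update.
CREATE INDEX IX_definitions_id_revision
    ON dbo.definitions (id, revision)
    INCLUDE (majorVersion);
```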

Result

SELECT * FROM dbo.history;

id                                 revision majorVersion
9F717823-B9CA-4C7B-97F9-7770AAAFB468    1   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    2   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    3   1
9F717823-B9CA-4C7B-97F9-7770AAAFB468    4   2
9F717823-B9CA-4C7B-97F9-7770AAAFB468    5   2
9F717823-B9CA-4C7B-97F9-7770AAAFB468    6   3
9F717823-B9CA-4C7B-97F9-7770AAAFB468    7   3
546EF185-54AC-4AF8-82C6-61EFA3202353    1   1
546EF185-54AC-4AF8-82C6-61EFA3202353    2   2
546EF185-54AC-4AF8-82C6-61EFA3202353    3   2
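
One way to sanity-check the result is to compare it against the correlated lookup from the question. The query below is a sketch of that idea, not part of the original answer; the alias expectedMajorVersion is introduced here for illustration. An empty result would suggest that every history row agrees with the per-row lookup.

```sql
-- Lists history rows whose stored majorVersion differs from the value the
-- question's per-row lookup would produce, or for which no definition applies.
SELECT h.id, h.revision, h.majorVersion, x.expectedMajorVersion
FROM dbo.history AS h
OUTER APPLY (
    SELECT TOP (1) d.majorVersion AS expectedMajorVersion
    FROM dbo.definitions AS d
    WHERE d.id = h.id
      AND d.revision >= h.revision
    ORDER BY d.majorVersion
) AS x
WHERE x.expectedMajorVersion IS NULL
   OR h.majorVersion <> x.expectedMajorVersion;
```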

SQL Fiddle

solution 2

Still two updates, but with a CROSS APPLY and a self join. The self join is not ideal.

First update all the matching revisions

UPDATE h
set h.majorVersion = d.majorVersion
FROM dbo.history h
INNER JOIN
dbo.definitions d
ON d.id = h.id
and d.revision = h.revision;

Then do a self join to update the other majorVersions to the next one that is not 0

UPDATE h
SET h.majorVersion = h2.majorVersion
FROM dbo.history h
CROSS APPLY(
SELECT MIN(CASE WHEN h2.majorVersion = 0 
           THEN NULL 
           ELSE h2.majorVersion END) as majorVersion 
FROM
dbo.history h2 
WHERE h2.id = h.id 
AND h2.revision >= h.revision
) h2
WHERE h.majorVersion = 0;


SELECT * from dbo.history;

SQLFiddle

You would have to wrap the updates in BEGIN TRANSACTION ... COMMIT TRANSACTION if they have to be applied together as one atomic unit. Which approach is fastest will depend on additional factors, such as your indexes and data, more than an example can show. YMMV
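
A minimal sketch of such a transaction wrapper, using the two updates of solution 2. The SET XACT_ABORT ON and TRY/CATCH handling are assumptions added here for illustration and are not part of the original answer; adapt the error handling to your own conventions.

```sql
SET XACT_ABORT ON;  -- assumption: most run-time errors should abort the whole transaction

BEGIN TRY
    BEGIN TRANSACTION;

    -- Step 1: rows whose revision matches a definitions row exactly.
    UPDATE h
    SET h.majorVersion = d.majorVersion
    FROM dbo.history h
    INNER JOIN dbo.definitions d
        ON d.id = h.id
        AND d.revision = h.revision;

    -- Step 2: remaining rows take the lowest known majorVersion at a higher revision.
    UPDATE h
    SET h.majorVersion = h2.majorVersion
    FROM dbo.history h
    CROSS APPLY (
        SELECT MIN(CASE WHEN h2.majorVersion = 0 THEN NULL ELSE h2.majorVersion END) AS majorVersion
        FROM dbo.history h2
        WHERE h2.id = h.id
          AND h2.revision >= h.revision
    ) h2
    WHERE h.majorVersion = 0;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo both updates if either one failed, then re-raise the error.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```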

Related Solutions

Sql-server – Updating a local table with a per-row count(*) which is an aggregate of inner joins on remote server

Depending on your permissions, the linked server query could be streaming all the remote data over to the local server and only then doing the filtering.

You might be able to skip that pain by first computing the aggregate counts into a table on the local server and then working against that.

CREATE TABLE #LOCAL
(
    package_uuid nvarchar(255) NOT NULL PRIMARY KEY CLUSTERED
,   [count] bigint
);

INSERT INTO
    #LOCAL
SELECT 
    p.package_uuid
,   count(d.external_identification) AS [count]
FROM 
    ServerB.DATABASE.dbo.package p
    INNER JOIN 
        ServerB.DATABASE.dbo.doc2 d
        ON p.package_id = d.package_id
GROUP BY 
    p.package_uuid;

Try running that query locally on ServerB first to get an understanding of the theoretical throughput without factoring in your network. You can then do some quick and dirty estimates based on data sizes (roughly 500 + 8 bytes per row in the temporary table); the rest depends on your network. Hopefully this is all on a local network.
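
A sketch of that local timing run, to be executed directly on ServerB. It assumes the current database there is the one holding package and doc2 (the DATABASE placeholder above); SET STATISTICS TIME reports CPU and elapsed time in the messages output.

```sql
-- Run on ServerB itself, in the database that contains package and doc2.
SET STATISTICS TIME ON;

SELECT
    p.package_uuid
,   count(d.external_identification) AS [count]
FROM
    dbo.package p
    INNER JOIN
        dbo.doc2 d
        ON p.package_id = d.package_id
GROUP BY
    p.package_uuid;

SET STATISTICS TIME OFF;
```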

If the time is significantly different between the run on ServerB and pulling it back over, then you might need to use the OPENQUERY syntax to force the join to run on the remote server. The code would be approximately:

CREATE TABLE #LOCAL
(
    package_uuid nvarchar(255) NOT NULL PRIMARY KEY CLUSTERED
,   [count] bigint
);

INSERT INTO
    #LOCAL
SELECT
    OQ.package_uuid
,   OQ.[count]
FROM
    OPENQUERY(ServerB,
    N'
    SELECT 
        p.package_uuid
    ,   count(d.external_identification) AS [count]
    FROM 
        DATABASE.dbo.package p
        INNER JOIN 
            DATABASE.dbo.doc2 d
            ON p.package_id = d.package_id
    GROUP BY 
        p.package_uuid
    ') AS OQ;
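
Once #LOCAL is populated, the local table can be updated with a plain local join. The sketch below assumes a local table dbo.local_package with a doc_count column; both names are hypothetical, since the original question's table is not shown here.

```sql
-- dbo.local_package and doc_count are placeholder names for the local target table.
UPDATE t
SET t.doc_count = COALESCE(l.[count], 0)   -- packages absent from #LOCAL get 0
FROM dbo.local_package AS t
LEFT JOIN #LOCAL AS l
    ON l.package_uuid = t.package_uuid;
```

The LEFT JOIN is a design choice: the aggregate uses an INNER JOIN, so packages without any doc2 rows never reach #LOCAL and would otherwise keep a stale count.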

MySQL – Updating InnoDB Table Based on Multiple Joins

I think this will do:

UPDATE 
    ( SELECT COUNT(DISTINCT o_id) AS total_oids 
      FROM  oak_relation 
    ) AS d
  CROSS JOIN
    oak_relation AS u
  JOIN
    ( SELECT o_id, COUNT(*) AS matching_o_ids 
      FROM  oak_relation
      GROUP BY o_id 
    ) AS o
      ON o.o_id = u.o_id
  JOIN
    ( SELECT k_id, COUNT(*) AS matching_k_ids 
      FROM oak_relation
      GROUP BY k_id 
    ) AS k
      ON k.k_id = u.k_id
SET
    u.xy = u.initial * d.total_oids / o.matching_o_ids / k.matching_k_ids ;

It would be good to check it out first by running the equivalent SELECT query:

SELECT
    u.o_id, u.k_id, u.initial,
    d.total_oids,
    o.matching_o_ids,
    k.matching_k_ids,
    u.initial * d.total_oids / o.matching_o_ids / k.matching_k_ids AS new_xy
FROM  
    ( SELECT COUNT(DISTINCT o_id) AS total_oids 
      FROM  oak_relation 
    ) AS d
  CROSS JOIN
    oak_relation AS u
  JOIN
    ( SELECT o_id, COUNT(*) AS matching_o_ids 
      FROM  oak_relation
      GROUP BY o_id 
    ) AS o
      ON o.o_id = u.o_id
  JOIN
    ( SELECT k_id, COUNT(*) AS matching_k_ids 
      FROM oak_relation
      GROUP BY k_id 
    ) AS k
      ON k.k_id = u.k_id ;
