Sql-server – Updating a large table in sql server

sql serverupdate

I have a wide table (80 columns) with 70 million rows in it. I can currently trying to set a foreign key across all rows within the table.

Approach one
An update statement over the entire table such as:

BEGIN TRAN
UPDATE Cx
SET Cx.CohortId = Chx.CohortId
FROM Customers Cx
INNER JOIN Cohorts Chx ON Chx.NameField = Cx.NameField
COMMIT TRAN

I was informed this has the potential to fill up the transaction log (please note recovery mode is currently set to simple) and it would be better to batch the updates:

Approach two
Update in batches

DECLARE @Rows INT,
    @BatchSize INT; 

SET @BatchSize = 20000;
SET @Rows = @BatchSize; 

WHILE (@Rows = @BatchSize)
BEGIN
UPDATE TOP(@BatchSize) Cx
SET Cx.CohortId = Chx.CohortId
FROM Customers Cx
INNER JOIN Cohorts Chx ON Chx.NameField = Cx.NameField
WHERE Cx.CohortId IS NULL -- Do not update records that have already been uplifted to include the FK

SET @Rows = @@ROWCOUNT
END

Does the second approach yield any benefit in terms of the transaction log on large table updates?

Best Answer

I was informed this has the potential to fill up the transaction log . . . and it would be better to batch the updates

It would be better if it didn't fill up the transaction log. But the proposed remedy is the wrong one.

You should size your transaction log to support your transactions, not the other way around.

Does the second approach yield any benefit in terms of the transaction log on large table updates?

Yes. The transaction log can be reused between batches, so long as all the batches aren't in a transaction.

increasing the disk space to support that would be troublesome

Yes, sometimes you do have to code around space issues, but coding isn't free. Spending money on infrastructure to reduce coding time is usually a wise move.

Also if you are trying to minimize log space, consider loading an empty table with your query, and then performing an ALTER TABLE … SWITCH. Loading the new table can be minimally logged, and the ALTER TABLE is a metadata operation.

This will also enable you to perform the operation in a transaction, and not commit intermediate results.