SQL Server – How to Fix Simultaneous Parallel Table Updates?

sql server, sql server 2014

I have to update all records in 150 tables, adding GUIDs to two (indexed) columns that are currently empty. Each table has around 50k records and exactly 4 existing columns, and I use a C# script to generate 40k UPDATE statements at once and post them to the server.

On my local machine (16 GB RAM, 500 GB Samsung 850, SQL Server 2014, Core i5), running 10 tables in parallel takes a total of 13 minutes, while running 5 finishes in a mere 1.7 minutes.

I do understand that something is getting saturated at the disk level, but I need some help quantifying this huge difference in timings.

Is there a specific SQL Server system view (DMV) that I can check to explain this discrepancy? Is there a way to figure out, for a given piece of hardware, how many table updates I can run in parallel? (The real test server has more RAM and 10k rpm disks.)

Can anyone point me to something I can improve on the SQL Server side to bring down the timings when running 10 tables in parallel?

I already tried increasing the Auto Growth increment from 10 MB to 100 MB, which improves the disk queue length (from around 5 to 0.1), but it does not actually decrease the total time by much.
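
For reference, the growth change was along these lines; the database and logical file names below are placeholders, not the real ones from the Question:

-- Placeholder database and logical file names; check sys.master_files for the actual values.
ALTER DATABASE [MyDatabase]
    MODIFY FILE (NAME = N'MyDatabase_Data', FILEGROWTH = 100MB);

ALTER DATABASE [MyDatabase]
    MODIFY FILE (NAME = N'MyDatabase_Log', FILEGROWTH = 100MB);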

I have asked the exact same question on Stack Overflow, but have not received any helpful answers so far, so any insight/help would be immensely appreciated. 🙂

Best Answer

Given the code in your answer, you would most likely improve performance by making the following two changes:

  • Start the query batch with BEGIN TRAN and end the batch with COMMIT TRAN.

  • Decrease the number of updates per batch to less than 5000 to avoid lock escalation (which generally occurs at 5000 locks). Try 4500.

Doing those two things should cut down the massive amount of transaction log writes and lock / unlock operations that you are currently generating by running each DML statement individually.

Example:

conn.Open();
using (SqlCommand cmd = new SqlCommand(
      @"BEGIN TRAN;
        UPDATE [TestTable] SET Column5 = 'some unique value' WHERE ID = 1;
        UPDATE [TestTable] SET Column5 = 'some unique value' WHERE ID = 2;
        ...
        UPDATE [TestTable] SET Column5 = 'some unique value' WHERE ID = 4500;
        COMMIT TRAN;
        ", conn))
{
    cmd.ExecuteNonQuery();
}

UPDATE

The Question is a bit sparse on the details. The example code is only shown in an answer.

One area of confusion is that the description mentions updating two columns, yet the example code only shows a single column being updated. My answer above was based on the code, hence it only shows a single column. If there really are two columns to update, then both columns should be updated in the same UPDATE statement:

conn.Open();
using (SqlCommand cmd = new SqlCommand(
      @"BEGIN TRAN;
        UPDATE [TestTable]
        SET    Column5 = 'some unique value',
               ColumnN = 'some unique value'
        WHERE  ID = 1;
        UPDATE [TestTable]
        SET    Column5 = 'some unique value',
               ColumnN = 'some unique value'
        WHERE  ID = 2;
        ...
        UPDATE [TestTable]
        SET    Column5 = 'some unique value',
               ColumnN = 'some unique value'
        WHERE  ID = 4500;
        COMMIT TRAN;
        ", conn))
{
    cmd.ExecuteNonQuery();
}

Another issue that is unclear is where the "unique" data is coming from. The Question mentions that the unique values are GUIDs. Are these being generated in the app layer? Are they coming from another data source that the app layer knows about and the database does not? This is important because, depending on the answers to these questions, it might make sense to ask:

  1. Can the GUIDs be generated in SQL Server instead?
  2. If yes to #1, then is there any reason to generate this code from app code instead of doing a simple batch loop in T-SQL?

If "yes" to #1 but the code, for whatever reason, needs to be generated in .NET, then you can use NEWID() and generate UPDATE statements that work on ranges of rows, in which case you do not need the BEGIN TRAN / COMMIT TRAN since each statement handles all 4500 rows in one shot:

conn.Open();
using (SqlCommand cmd = new SqlCommand(
      @"UPDATE [TestTable]
        SET    Column5 = NEWID(),
               ColumnN = NEWID()
        WHERE  ID BETWEEN 1 AND 4500;
        ", conn))
{
    cmd.ExecuteNonQuery();
}

If "yes" to #1 and there is no real reason for these UPDATEs to be generated in .NET, then you can just do the following:

DECLARE @BatchSize INT = 4500, -- this could be an input param for a stored procedure
        @RowsAffected INT = 1, -- needed to enter loop
        @StartingID INT = 1; -- initial value

WHILE (@RowsAffected > 0)
BEGIN
  UPDATE TOP (@BatchSize) tbl
  SET    tbl.Column5 = NEWID(),
         tbl.ColumnN = NEWID()
  FROM   [TestTable] tbl
  WHERE  tbl.ID BETWEEN @StartingID AND (@StartingID + @BatchSize - 1);

  SET @RowsAffected = @@ROWCOUNT;
  SET @StartingID += @BatchSize;
END;

The code above only works if the ID values are not sparse, or at least if the values do not have gaps larger than @BatchSize, such that there is at least 1 row updated in each iteration. This code also assumes that the ID field is the Clustered Index. These assumptions seem reasonable given the provided example code.

However, if the ID values do have large gaps, or if the ID field is not the Clustered Index, then you can just test for rows that do not already have a value:

DECLARE @BatchSize INT = 4500, -- this could be an input param for a stored procedure
        @RowsAffected INT = 1; -- needed to enter loop

WHILE (@RowsAffected > 0)
BEGIN
  UPDATE TOP (@BatchSize) tbl
  SET    tbl.Column5 = NEWID(),
         tbl.ColumnN = NEWID()
  FROM   [TestTable] tbl
  WHERE  tbl.Column5 IS NULL; -- only rows that have not been updated yet

  SET @RowsAffected = @@ROWCOUNT;
END;

BUT, if "no" to #1 and the values are coming from .NET for a good reason, such as the unique values for each ID already existing in another source, then you can still speed this up (beyond my initial suggestion) by supplying a derived table:

conn.Open();
using (SqlCommand cmd = new SqlCommand(
      @"BEGIN TRAN;

        UPDATE tbl
        SET    tbl.Column5 = tmp.Col1,
               tbl.ColumnN = tmp.Col2
        FROM   [TestTable] tbl
        INNER JOIN (VALUES
          (1, 'some unique value A', 'some unique value B'),
          (2, 'some unique value C', 'some unique value D'),
          ...
          (1000, 'some unique value N1', 'some unique value N2')
                   ) tmp (ID, Col1, Col2)
                ON tmp.ID = tbl.ID;

        UPDATE tbl
        SET    tbl.Column5 = tmp.Col1,
               tbl.ColumnN = tmp.Col2
        FROM   [TestTable] tbl
        INNER JOIN (VALUES
          (1001, 'some unique value A2', 'some unique value B2'),
          (1002, 'some unique value C2', 'some unique value D2'),
          ...
          (2000, 'some unique value N3', 'some unique value N4')
                   ) tmp (ID, Col1, Col2)
                ON tmp.ID = tbl.ID;

        COMMIT TRAN;
        ", conn))
{
    cmd.ExecuteNonQuery();
}

I believe the limit on the number of rows that can be joined in via VALUES is 1000, so I grouped two sets together in an explicit transaction. You could test with up to 4 sets of these UPDATEs to do 4000 rows per transaction and stay below the lock escalation threshold of 5000 locks.
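
If it helps, below is a rough sketch of how the app side could assemble those batches; it is only an illustration of the grouping described above, not code from the Question. It assumes the usual System.Linq / System.Text / System.Data.SqlClient usings, that conn is an open SqlConnection, and that the pre-existing unique values are available as rows, a List<(int Id, Guid Val1, Guid Val2)>; adjust names and types to the real schema.

// Illustrative sketch only: 1000 rows per VALUES list (the table value constructor
// limit) and up to 4 lists (4000 rows) per transaction, staying under the
// ~5000-lock escalation threshold.
for (int t = 0; t < rows.Count; t += 4000)              // one transaction per 4000 rows
{
    StringBuilder sql = new StringBuilder("BEGIN TRAN;\n");

    foreach (var group in rows.Skip(t).Take(4000)
                              .Select((row, i) => (row, i))
                              .GroupBy(x => x.i / 1000, x => x.row))  // 1000 rows per UPDATE
    {
        sql.AppendLine("UPDATE tbl");
        sql.AppendLine("SET    tbl.Column5 = tmp.Col1,");
        sql.AppendLine("       tbl.ColumnN = tmp.Col2");
        sql.AppendLine("FROM   [TestTable] tbl");
        sql.AppendLine("INNER JOIN (VALUES");
        sql.AppendLine(string.Join(",\n",
            group.Select(r => $"  ({r.Id}, '{r.Val1}', '{r.Val2}')")));
        sql.AppendLine("           ) tmp (ID, Col1, Col2)");
        sql.AppendLine("        ON tmp.ID = tbl.ID;");
    }

    sql.AppendLine("COMMIT TRAN;");

    using (SqlCommand cmd = new SqlCommand(sql.ToString(), conn))
    {
        cmd.ExecuteNonQuery();
    }
}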