SQL Server 2012 – Updating Large Table and Indexes

indexsql-server-2012update

I'm trying to update a table with about 6.5 million rows in a reasonable amount of time, but I'm having some issues. We're adding a new column to an existing table and then setting a value for all rows based on the data in a column in another table.

UPDATE  TOP (20000) c
SET     c.NewColumn = ISNULL(p.Col1, p.Col2)
FROM    dbo.Child c
    INNER JOIN dbo.Parent p on c.FKId = p.Id
WHERE   c.NewColumn IS NULL

in a loop, like this article. The update was still running after 2.5 hours. I'm wondering if disabling indexes on dbo.Child would make an impact. NewColumn has no indexes, nor will it, but there are other indexes (about 5) on dbo.Child

Is SQL Server smart enough to see that it doesn't need to update the other indexes (as they are not part of the UPDATE), or would we benefit from temporarily disabling the indexes while we do the update statements?

This is SQL Server 2012 but the DB in question is in 2008 compatibility mode.

Best Answer

You say that the UPDATE is running in a loop and was still running after 2.5 hours. Did that loop update any rows at all? Was it the loop itself that was taking a long time or a single UPDATE statement? That is a somewhat important distinction and is currently ambiguous in the given information. However, here are some things to note based on things said in the question:

  • No, SQL Server should not be updating indexes that do not use the new column

  • Disabling indexes is not simply a matter of turning them off. It actually drops all of the index pages and keeps only the structure so that you don't need to run the full CREATE statement again. But re-enabling an index will have to rebuild it.

And here are some things to look for:

  • Are there any UPDATE triggers on the Child table?

  • What is the auto-growth setting on the transaction log? If set to a very low number it could be that the UPDATEs are impeded by a large number of auto-growth operations.

  • Of those 5 indexes on the Child table, do any of them have FKId as the leading column?

  • Have you tried lower batch sizes, or just 20k? I have seen several people randomly recommend large (10k - 50k) batch sizes for these types of operations without considering that not only does it limit the operation to that number, but it also hints to the Query Optimizer that the number is possible to hit in the first place. Hence, if there are only 5000 rows to find then it might keep looking past those to see if any others meet the c.NewColumn IS NULL criteria. And considering that the field is not indexed, all the Q.O. has to go on are the auto-generated statistics which don't tell it where to find the remaining NULL rows. Assuming that the first several iterations of the loop did succeed, SQL Server still has to find 20,000 rows that are NULL, out of the 6.5 million, by scanning as many as it takes, in no particular order, before it finds them. Hence the first few iterations probably happened quickly, but this type of operation slows down quickly as each successive pass needs to scan more and more of the records before it can find the 20,000 that meet the IS NULL criteria.

    So, two things to consider:

    • Start out with a small batch size, maybe 10 rows, and run that statement by itself (i.e. not in the loop). Does it come back? If not then you need to answer @Aaron's question about what the WAIT_RESOURCE is. If it does come back, increase the batch size to 100, then 500, 1000, 2000, 5000, and then 10000. It will take more loops to complete, but finding the required number of records to UPDATE will be easier to accomplish. Once you find a size that seems responsive, then you can run the loop again (I am guessing that somewhere in the 2000 - 5000 range is gonna work :)
    • Rather than get rid of indexes, you might actually want to create an index, temporarily, to support this particular UPDATE operation. A filtered index (which was introduced in SQL Server 2008 so the compatibility mode shouldn't pose a problem) will allow you to target the remaining rows to update. Something like:

      CREATE NONCLUSTERED INDEX [IX_Child_FKId_temp]
        ON dbo.Child ([FKId] ASC) -- you might need to include [NewColumn] ASC after [FKId]
        WHERE [NewColumn] IS NULL
        WITH ( FILLFACTOR = 100 );