Sql-server – Understanding why a query is slow

performancequery-performancesql serversql-server-2016update

I have a new column in SQL Server 2016, on a table of type bit. (I wanted it to be a persisted calculated column but was overruled and asked to make it a bit that is maintained by program code). Its default value is 0.

I used the following SQL excerpt to attempt to set the initial values for existing rows in a dev environment but had to terminate it after 5 minutes.

update dbo.tableName set newColumn = 

cast(
    (
        case when
            (
            isNull(colA,'') <> ''
            or isNull(colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
            or colC is not null
            or isNull(colD,'') <> ''
            )
        then 1 
        else 0 
        end
    ) 
as bit);

The table has about 92,000 rows and about 3,200 should have the value 1 set. (I understand the above would also be setting the remaining 88,800 row values to 0 even though the default constraint has already set them to 0 but a SELECT using the above logic in a WHERE clause executes within 1 second).

My question is why the above should take so long, and how would I identify the root cause? (Should I obtain a query plan and work from there? What would I look for?)

I re-wrote the update as follows and it completed in 3 seconds.

with cteCommonTableExpression (cteIdColumn)
as
(
    select dbo.tableName.idColumn from dbo.tableName
        where
            case when
                (
                isNull(colA,'') <> ''
                or isNull(colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
                or colC is not null
                or isNull(colD,'') <> ''
                )
            then 1 
            else 0 
            end
        = 1
)

update dbo.tableName set newColumn = 1
where dbo.tableName.idColumn in (select cteIdColumn from cteCommonTableExpression);

The following – which I expect should be logically identical to the last section of code – also completes in 3 seconds.

update dbo.tableName set newColumn = 1
where dbo.tableName.idColumn in 
(select tn2.idColumn from dbo.tableName tn2 where case when
                (
                isNull(tn2.colA,'') <> ''
                or isNull(tn2.colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
                or tn2.colC is not null
                or isNull(tn2.colD,'') <> ''
                )
            then 1 
            else 0 
            end
        = 1)

Shortly after posting here I obtained the estimated query plan which reported 89% of the cost goes to a clustered index update on the primary key. The new column is not involved in the primary key. The question becomes: why does this update require a clustered index update (especially while the alternative updates apparently do not)?

Best Answer

Without seeing the plan...

Updating all 92k rows

  • scans the table start to end
  • updates all rows (whether value is the same or not)
  • clustered index leaf = the actual data

= you are actually doing clustered index update

When you updating 3,200 rows means you are doing 96.5% less data changes work (3200 is 3.5% of 92000) so of course it will run faster

Also the transaction size (for rollback) in the transaction log will be a lot bigger. This needs allocated and may need a log file size increase

Basically, update only the rows you need...