Sql-server – Understanding why a query is slow

I have a new column in SQL Server 2016, on a table of type bit. (I wanted it to be a persisted calculated column but was overruled and asked to make it a bit that is maintained by program code). Its default value is 0.

I used the following SQL excerpt to attempt to set the initial values for existing rows in a dev environment but had to terminate it after 5 minutes.

update dbo.tableName set newColumn = 

cast(
    (
        case when
            (
            isNull(colA,'') <> ''
            or isNull(colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
            or colC is not null
            or isNull(colD,'') <> ''
            )
        then 1 
        else 0 
        end
    ) 
as bit);

The table has about 92,000 rows and about 3,200 should have the value 1 set. (I understand the above would also be setting the remaining 88,800 row values to 0 even though the default constraint has already set them to 0 but a SELECT using the above logic in a WHERE clause executes within 1 second).

My question is why the above should take so long, and how would I identify the root cause? (Should I obtain a query plan and work from there? What would I look for?)

I re-wrote the update as follows and it completed in 3 seconds.

with cteCommonTableExpression (cteIdColumn)
as
(
    select dbo.tableName.idColumn from dbo.tableName
        where
            case when
                (
                isNull(colA,'') <> ''
                or isNull(colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
                or colC is not null
                or isNull(colD,'') <> ''
                )
            then 1 
            else 0 
            end
        = 1
)

update dbo.tableName set newColumn = 1
where dbo.tableName.idColumn in (select cteIdColumn from cteCommonTableExpression);

The following – which I expect should be logically identical to the last section of code – also completes in 3 seconds.

update dbo.tableName set newColumn = 1
where dbo.tableName.idColumn in 
(select tn2.idColumn from dbo.tableName tn2 where case when
                (
                isNull(tn2.colA,'') <> ''
                or isNull(tn2.colB,datefromparts(1901,1,1)) > datefromparts(1901,1,1)
                or tn2.colC is not null
                or isNull(tn2.colD,'') <> ''
                )
            then 1 
            else 0 
            end
        = 1)

Shortly after posting here I obtained the estimated query plan which reported 89% of the cost goes to a clustered index update on the primary key. The new column is not involved in the primary key. The question becomes: why does this update require a clustered index update (especially while the alternative updates apparently do not)?

Best Answer

Without seeing the plan...

Updating all 92k rows

scans the table start to end
updates all rows (whether value is the same or not)
clustered index leaf = the actual data

= you are actually doing clustered index update

When you updating 3,200 rows means you are doing 96.5% less data changes work (3200 is 3.5% of 92000) so of course it will run faster

Also the transaction size (for rollback) in the transaction log will be a lot bigger. This needs allocated and may need a log file size increase

Basically, update only the rows you need...

Best Answer

Related Solutions

Mysql – Why I cannot let EXPLAIN update statements

Sql-server – Slow SQL Server query using a DB2 linked server in virtual environment

Related Question