SQL Server – Why Removing LOWER Calls Changes Execution Plan

sql serversql-server-2012

While troubleshooting a slow query I was misled for a while by the actual execution plan telling me a sort late in the query where there should be few rows was taking >80% of the query time. At the time the query was built dynamically in a stored procedure and the bulk of the slowness was being caused by parameter sniffing, but for a long time I tried to figure out why it was taking so long to sort these rows.

Here's the query:

SELECT Id
 , FirstName
 , LastName
 , FullName
 , DateOfBirth
 , CityStateZip
 , DriversLicenseState
 , DriversLicenseNumber
FROM (SELECT c.EntityId Id
       , p.FirstName
       , p.LastName
       , p.FullName
       , c.DateOfBirth
       , c.UpdateDate
       , CityStateZip = CASE a.EntityAddressId
           WHEN NULL THEN NULL
           ELSE CONCAT(a.City, ', ', s.Code, ' ', a.Zip)
           END
       , ds.Value DriversLicenseState
       , dn.Value DriversLicenseNumber
       , ROW_NUMBER() OVER (
               PARTITION BY p.FirstName, p.LastName, c.DateOfBirth
                    , ds.Value, dn.Value 
               ORDER BY c.UpdateDate DESC) RowNum
  FROM Store.Customer c
  INNER JOIN Entity.Person p ON c.EntityId = p.EntityId
  LEFT JOIN Entity.EntityAddress a ON c.EntityId = a.EntityId
  LEFT JOIN Vendor.StateProvince s ON a.StateProvinceId = s.StateProvinceId
  LEFT JOIN (
         SELECT ca.CustomerId, ca.Value
         FROM Store.CustomerAttribute ca
         INNER JOIN Models.Attribute a ON ca.AttributeId = a.AttributeId
         WHERE a.AttributeCode = 'DriversLicenseState') as ds
             ON c.EntityId = ds.CustomerId
  LEFT JOIN (
         SELECT ca.CustomerId, ca.Value
         FROM Store.CustomerAttribute ca
         INNER JOIN Models.Attribute a ON ca.AttributeId = a.AttributeId
         WHERE a.AttributeCode = 'DriversLicenseNumber') as dn 
             ON c.EntityId = dn.CustomerId
  WHERE LOWER(FirstName) LIKE '%bob%'
    AND LOWER(LastName) LIKE '%smith%'
    AND a.EntityAddressTypeId = 0 ) c
WHERE RowNum = 1
ORDER BY UpdateDate DESC

And here's the relevant part of the execution plan:

Execution plan with LOWER

Notice the large disparity between the estimated and actual number of rows.

If I remove the 2 LOWER calls from the inner WHERE clause and change nothing else, the execution plan steps are identical, but the cost of the steps are different. The same bits of the execution plan:

Execution plan without LOWER

I cleared the plan cache between executions. Is this a bug?

Best Answer

SQL Server maintains statistics on substrings in string columns in the form of tries that are usable by LIKE queries.

See the String Summary Statistics section for more about this.

Wrapping the column in a function call quite likely blocks any attempt to sniff the value of the parameters and use the statistics on the column to estimate the number of matching rows and so it falls back on guesses. The differing cardinality estimates can cause different execution plans.