SQL Server – Execution plan shows an expensive CONVERT_IMPLICIT operation. Can I fix this with indexing, or do I need to change the table?

Tags: sql-server, view

I have a really important, really slow view which includes some really ugly conditions like this in its WHERE clause. I am also aware that the joins are gross and slow because they join on varchar(13) fields instead of integer identity fields, but I would like to improve the simple query below that uses this view:

CREATE VIEW [dbo].[vwReallySlowView]
AS
SELECT     
  I.booking_no_v32 AS bkno, 
  I.trans_type_v41 AS trantype, 
  B.Assigned_to_v61 AS Assignbk, 
  B.order_date AS dateo, B.HourBooked AS HBooked,   
  B.MinBooked AS MBooked, B.SecBooked AS SBooked, 
  I.prep_on AS Pon, I.From_locn AS Flocn, 
  I.Trans_to_locn AS TTlocn,   
  (CASE I.prep_on WHEN 'Y' THEN I.PDate ELSE I.FirstDate END) AS PrDate,
  I.PTimeH AS PrTimeH, I.PTimeM AS PrTimeM,
  (CASE WHEN I.RetnDate < I.FirstDate THEN I.FirstDate ELSE I.RetnDate END) AS RDatev,
  I.bit_field_v41 AS bitField, I.FirstDate AS FDatev, I.BookDate AS DBooked,
  I.TimeBookedH AS TBookH, I.TimeBookedM AS TBookM, I.TimeBookedS AS TBookS,
  I.del_time_hour AS dth, I.del_time_min AS dtm, I.return_to_locn AS rtlocn,
  I.return_time_hour AS rth, I.return_time_min AS rtm,
  (CASE WHEN I.Trans_type_v41 IN (6, 7) AND (I.Trans_qty < I.QtyCheckedOut) THEN 0
        WHEN I.Trans_type_v41 IN (6, 7) AND (I.Trans_qty >= I.QtyCheckedOut) THEN I.Trans_Qty - I.QtyCheckedOut
        ELSE I.trans_qty END) AS trqty,
  (CASE WHEN I.Trans_type_v41 IN (6, 7) THEN 0 ELSE I.QtyCheckedOut END) AS MyQtycheckedout,
  (CASE WHEN I.Trans_type_v41 IN (6, 7) THEN 0 ELSE I.QtyReturned END) AS retqty,
  I.ID, B.BookingProgressStatus AS bkProg, I.product_code_v42, I.return_to_locn,
  I.AssignTo, I.AssignType, I.QtyReserved, B.DeprepOn,
  (CASE B.DeprepOn WHEN 1 THEN B.DeprepDateTime ELSE I.RetnDate END) AS DeprepDateTime,
  I.InRack
FROM         dbo.tblItemtran AS I 

INNER JOIN  -- booking_no = varchar(13)
         dbo.tblbookings AS B ON B.booking_no = I.booking_no_v32  --  string inner-join

INNER JOIN  -- product_code = varchar(13) 
        dbo.tblInvmas AS M ON I.product_code_v42 = M.product_code  --  string inner-join

WHERE (I.trans_type_v41 NOT IN (2, 3, 7, 18, 19, 20, 21, 12, 13, 22)) AND (I.trans_type_v41 NOT IN (6, 7)) AND (I.bit_field_v41 & 4 = 0)
   OR (I.trans_type_v41 NOT IN (6, 7)) AND (I.bit_field_v41 & 4 = 0) AND (B.BookingProgressStatus = 1)
   OR (I.trans_type_v41 IN (6, 7)) AND (I.bit_field_v41 & 4 = 0) AND (I.QtyCheckedOut = 0)
   OR (I.trans_type_v41 IN (6, 7)) AND (I.bit_field_v41 & 4 = 0) AND (I.QtyCheckedOut > 0) AND (I.trans_qty - (I.QtyCheckedOut - I.QtyReturned) > 0)

This view is usually used like this:

select * from vwReallySlowView
where product_code_v42  = 'LIGHTBULB100W'  -- find "100 watt lightbulb" rows

When I run it, I get an execution plan item costing 20 to 80% of the total cost of the batch, with a predicate of the form CONVERT_IMPLICIT( ... & (4)), suggesting that it is very slow at these bitwise boolean tests like (I.bit_field_v41 & 4 = 0).

I am not an expert in MS SQL or DBA-type work in general, as I'm a non-SQL software developer most of the time. But I suspect that such bitwise combinations are a bad idea, and that it would have been better to have discrete boolean fields.

Could I somehow improve the index I have so it better handles this view without changing the schema (which is already in production at thousands of locations), or must I change the underlying table, which has several boolean values packed into an integer bit_field_v41, to fix this problem?

Here's my Clustered Index on tblItemtran which is being scanned in this execution plan:

-- goal: speed up  select * from vwReallySlowView where product_code_v42 = 'X'
CREATE CLUSTERED INDEX [idxtblItemTranProductCodeAndTransType] ON [dbo].[tblItemtran] 
(
    [product_code_v42] ASC,  -- varchar(13)
    [trans_type_v41] ASC     -- int
)WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, 
        IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, 
        ALLOW_PAGE_LOCKS  = ON) 
ON [PRIMARY]

Here is the execution plan for one of the other products, which results in a 27% cost on this CONVERT_IMPLICIT predicate. Update: note that in this case my worst node is now a Hash Match on an inner join, costing 34%. I believe this is a cost I can't avoid unless I could stop joining on strings, which I can't currently get rid of. Both of the INNER JOIN operations in the view above are on varchar(13) fields.

Zoomed in on bottom right corner:

[Image: zoomed execution plan detail]

The entire execution plan is available as a .sqlplan file on SkyDrive. The image below is just a visual overview.

[Image: full execution plan overview]

Update: I posted the entire execution plan. I can't pin down which product_code value was pathologically bad, but one way to reproduce the problem is to select count(*) from the view instead of filtering on a single product. Products that appear in 5% or fewer of the records in the underlying table seem to show much lower costs in the CONVERT_IMPLICIT operation. If I were going to fix the SQL here, I think I'd take the gross WHERE clause in the view, and calculate and store the result of that giant condition as an "IncludeMeInTheView" bit field in the underlying table. Presto, problem solved, right?
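For illustration, here's an untested sketch of the kind of thing I mean (the condition also references tblbookings, so it couldn't be a simple computed column; some process would have to keep the flag current whenever either table changes):

-- Hypothetical pre-computed flag; must be refreshed when either table changes
ALTER TABLE dbo.tblItemtran ADD IncludeMeInTheView bit NOT NULL DEFAULT 0;
GO
UPDATE I
SET IncludeMeInTheView =
  CASE WHEN (I.trans_type_v41 NOT IN (2, 3, 7, 18, 19, 20, 21, 12, 13, 22)
             AND I.trans_type_v41 NOT IN (6, 7) AND I.bit_field_v41 & 4 = 0)
         OR (I.trans_type_v41 NOT IN (6, 7) AND I.bit_field_v41 & 4 = 0
             AND B.BookingProgressStatus = 1)
         OR (I.trans_type_v41 IN (6, 7) AND I.bit_field_v41 & 4 = 0
             AND I.QtyCheckedOut = 0)
         OR (I.trans_type_v41 IN (6, 7) AND I.bit_field_v41 & 4 = 0
             AND I.QtyCheckedOut > 0
             AND I.trans_qty - (I.QtyCheckedOut - I.QtyReturned) > 0)
       THEN 1 ELSE 0 END
FROM dbo.tblItemtran AS I
INNER JOIN dbo.tblbookings AS B ON B.booking_no = I.booking_no_v32;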

Best Answer

You shouldn't rely too much on cost percentages in execution plans. These are always estimated costs, even in post-execution plans with 'actual' numbers for things like row counts. The estimated costs are based on a model that happens to work pretty well for the purpose it is intended for: enabling the optimizer to choose between different candidate execution plans for the same query. The cost information is interesting, and a factor to consider, but it should rarely be a primary metric for query tuning. Interpreting execution plan information requires a broader view of the presented data.

ItemTran Clustered Index Seek Operator

[Image: ItemTran clustered index seek operator details]

This operator is really two operations in one. First an index seek operation finds all rows that match the predicate product_code_v42 = 'M10BOLT', then each row has the residual predicate bit_field_v41 & 4 = 0 applied. There is an implicit conversion of bit_field_v41 from its base type (tinyint or smallint) to integer.

The conversion occurs because the bitwise-AND operator (&) requires both operands to be of the same type. The implicit type of the constant value '4' is integer and the data type precedence rules mean the lower-priority bit_field_v41 field value is converted.

The problem (such as it is) is easily corrected by writing the predicate as bit_field_v41 & CONVERT(tinyint, 4) = 0 - meaning the constant value has the lower priority and is converted (during constant folding) rather than the column value. If the bit_field_v41 is tinyint no conversions occur at all. Likewise, CONVERT(smallint, 4) could be used if bit_field_v41 is smallint. That said, the conversion is not a performance issue in this case, but it is still good practice to match types and avoid implicit conversions where possible.
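For example, assuming bit_field_v41 is tinyint (the same idea applies with smallint), the view's predicate would change like this:

-- Before: the column is implicitly converted to integer on every row
AND (I.bit_field_v41 & 4 = 0)

-- After: the constant is converted once, during constant folding
AND (I.bit_field_v41 & CONVERT(tinyint, 4) = 0)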

The major part of the estimated cost of this seek is down to the size of the base table. While the clustered index key is itself reasonably narrow, the size of each row is large. A definition for the table is not given, but just the columns used in the view add up to a significant row width. Since the clustered index includes all columns, the distance between clustered index keys is the width of the row, not the width of the index keys. The use of version suffixes on some columns suggests the real table has even more columns for previous versions.

Looking at the seek, residual predicate and output columns, the performance of this operator could be checked in isolation by building the equivalent query (the 1 <> 2 is a trick to prevent auto-parameterization, the contradiction is removed by the optimizer and does not appear in the query plan):

SELECT
    it.booking_no_v32,
    it.QtyCheckedOut,
    it.QtyReturned,
    it.Trans_qty,
    it.trans_type_v41
FROM dbo.tblItemTran AS it
WHERE
    1 <> 2
    AND it.product_code_v42 = 'M10BOLT'
    AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0;

The performance of this query with a cold data cache is of interest, since read-ahead would be affected by table (clustered index) fragmentation. The clustering key for this table invites fragmentation, so it could be important to maintain (reorganize or rebuild) this index regularly, and use an appropriate FILLFACTOR to allow space for new rows between index maintenance windows.
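For example (index name from the question; the 90 percent fill factor is only an illustrative starting point):

-- Check fragmentation and average row size for the clustered index (index_id = 1)
SELECT
    index_level,
    avg_fragmentation_in_percent,
    avg_record_size_in_bytes,
    page_count
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID(N'dbo.tblItemtran'), 1, NULL, 'SAMPLED');

-- Rebuild, leaving free space on each page for rows arriving between maintenance windows
ALTER INDEX [idxtblItemTranProductCodeAndTransType]
ON dbo.tblItemtran
REBUILD WITH (FILLFACTOR = 90);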

I performed a test of the effect of fragmentation on read-ahead using sample data generated using SQL Data Generator. Using the same table row counts as shown in the question's query plan, a highly fragmented clustered index resulted in SELECT * FROM view taking 15 seconds after DBCC DROPCLEANBUFFERS. The same test in the same conditions with a freshly-rebuilt clustered index on the ItemTrans table completed in 3 seconds.

If the table data is typically entirely in cache the fragmentation issue is very much less important. But, even with low fragmentation, the wide table rows might mean the number of logical and physical reads is much higher than might be expected. You could also experiment with adding and removing the explicit CONVERT to validate my expectation that the implicit conversion issue is not important here, except as a best practice violation.
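A minimal harness for those experiments (test server only; DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance):

-- Test server only: flush dirty pages, then empty the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Report reads and timings for the statements that follow
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ...run the isolated seek query from above here, with and without the
-- explicit CONVERT, and compare logical/physical reads and elapsed time...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;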

More to the point is the estimated number of rows leaving the seek operator. The optimization-time estimate is 165 rows, but 4,226 were produced at execution time. I will return to this point later, but the main reason for the discrepancy is that the selectivity of the residual predicate (involving the bitwise-AND) is very hard for the optimizer to predict - in fact it resorts to guessing.

Filter Operator

[Image: Filter operator predicate]

I am showing the filter predicate here mostly to illustrate how the two NOT IN lists are combined, simplified and then expanded, and also to provide a reference for the following hash match discussion. The test query from the seek can be expanded to incorporate its effects and determine the effect of the Filter operator on performance:

SELECT
    it.booking_no_v32,
    it.trans_type_v41,
    it.Trans_qty,
    it.QtyReturned,
    it.QtyCheckedOut
FROM dbo.tblItemTran AS it
WHERE
    it.product_code_v42 = 'M10BOLT'
    AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0
    AND
    (
        (
            it.trans_type_v41 NOT IN (2, 3, 6, 7, 18, 19, 20, 21, 12, 13, 22)
            AND it.trans_type_v41 NOT IN (6, 7)
        )
        OR
        (
            it.trans_type_v41 NOT IN (6, 7)
        )
        OR 
        (
            it.trans_type_v41 IN (6, 7)
            AND it.QtyCheckedOut = 0
        )
        OR 
        (
            it.trans_type_v41 IN (6, 7)
            AND it.QtyCheckedOut > 0
            AND it.trans_qty - (it.QtyCheckedOut - it.QtyReturned) > 0
        )
    );

The Compute Scalar operator in the plan defines the following expression (the calculation itself is deferred until the result is required by a later operator):

[Expr1016] = (trans_qty - (QtyCheckedOut - QtyReturned))

The Hash Match Operator

Performing a join on character data types is not the reason for the high estimated cost of this operator. The SSMS tooltip shows only a Hash Keys Probe entry, but the important details are in the SSMS Properties window.

The Hash Match operator builds a hash table using the values of the booking_no_v32 column (Hash Keys Build) from the ItemTran table, and then probes for matches using the booking_no column (Hash Keys Probe) from the Bookings table. The SSMS tooltip would also normally show a Probe Residual, but the text is far too long for a tooltip and is simply omitted.

A Probe Residual is similar to the Residual seen after the index seek earlier; the residual predicate is evaluated on all rows that hash match to determine if the row should be passed to the parent operator. Finding hash matches in a well-balanced hash table is extremely fast, but applying a complex residual predicate to each row that matches is quite slow by comparison. The Hash Match tooltip in Plan Explorer shows the details, including the Probe Residual expression:

[Image: Hash Match operator details in Plan Explorer, including the Probe Residual]

The residual predicate is complex and includes the booking progress status check now that column is available from the bookings table. The tooltip also shows the same discrepancy between estimated and actual row counts seen earlier in the index seek.

It may seem odd that much of the filtering is performed twice, but this is just the optimizer being optimistic. It does not expect the parts of the filter which can be pushed down the plan from the probe residual to eliminate any rows (the row count estimates are the same before and after the filter), but the optimizer knows it might be wrong about that. The chance of filtering rows early (reducing the cost of the hash join) is worth the small cost of the extra filter. The whole filter cannot be pushed down because it includes a test on a column from the bookings table, but most of it can be. The full test is still required in the hash residual test for correctness.

The row count underestimate is an issue for the Hash Match operator because the amount of memory reserved for the hash table is based on the estimated number of rows. Where the memory is too small for the size of hash table required at run time (due to the larger number of rows) the hash table recursively spills to physical tempdb storage, often resulting in very poor performance. In the worst case, the execution engine stops recursively spilling hash buckets and resorts to a very slow bailout algorithm. Hash spilling (recursive or bailout) is the most likely cause of the performance problems outlined in the question (not character-type join columns or implicit conversions). The root cause would be the server reserving too little memory for the query based on incorrect row count (cardinality) estimation.
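While the query is running, the size of the grant can be checked from a second session with the memory grants DMV (this shows the grant itself, not the spill directly):

-- Run from a second session while the slow query executes
SELECT
    session_id,
    requested_memory_kb,
    granted_memory_kb,
    used_memory_kb,
    max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;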

Sadly, before SQL Server 2012, there is no indication in the execution plan that a hashing operation exceeded its memory allocation (which cannot dynamically grow after being reserved before execution starts, even if the server has masses of free memory) and had to spill to tempdb. It is possible to monitor the Hash Warning Event Class using Profiler, but it can be difficult to correlate the warnings with a particular query.
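As a sketch of an alternative, an Extended Events session can capture the same warnings together with the statement text, which makes the correlation easier (session name arbitrary; the hash_warning event is available from SQL Server 2008 onward):

-- Capture hash recursion/bailout warnings with the offending statement text
CREATE EVENT SESSION HashSpills ON SERVER
ADD EVENT sqlserver.hash_warning
    (ACTION (sqlserver.session_id, sqlserver.sql_text))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION HashSpills ON SERVER STATE = START;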

Correcting the problems

The three issues are fragmentation, the complex probe residual in the hash match operator, and the incorrect cardinality estimate resulting from guessing at the index seek.

Recommended solution

Check the fragmentation and correct it if necessary, scheduling maintenance to ensure the index stays acceptably organized. The usual way to correct the cardinality estimate is to provide statistics. In this case, the optimizer needs statistics for the combination (product_code_v42, bit_field_v41 & 4 = 0). We cannot create statistics on an expression directly, so we must first create a computed column for the bit field expression, and then create the manual multi-column statistics:

ALTER TABLE dbo.tblItemTran
ADD Bit3 AS bit_field_v41 & CONVERT(tinyint, 4);

CREATE STATISTICS [stats dbo.ItemTran (product_code_v42, Bit3)]
ON dbo.tblItemTran (product_code_v42, Bit3);

The computed column text definition must match the text in the view definition pretty much exactly for the statistics to be used, so correcting the view to eliminate the implicit conversion should be done at the same time, and care taken to ensure a textual match.

The multi-column statistics ought to result in much better estimates, greatly reducing the chance that the hash match operator will use recursive spilling or the bailout algorithm. Adding the computed column (which is a metadata-only operation, and takes no space in the table since it is not marked PERSISTED) and the multi-column statistics is my best guess at a first solution.
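Once in place, the statistics object can be inspected and kept up to date in the usual ways:

-- Inspect the new multi-column statistics object
DBCC SHOW_STATISTICS ('dbo.tblItemTran', 'stats dbo.ItemTran (product_code_v42, Bit3)');

-- Refresh after significant data changes
UPDATE STATISTICS dbo.tblItemTran ([stats dbo.ItemTran (product_code_v42, Bit3)])
WITH FULLSCAN;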

When solving query performance problems, it is important to measure things like elapsed time, CPU usage, logical reads, physical reads, wait types and durations...and so on. It can also be useful to run parts of the query separately to validate suspected causes as shown above.

In some environments, where an up-to-the-second view of the data is not important, it can be useful to run a background process that materializes the whole view into a snapshot table every so often. This table is just a normal base table and can be indexed for read queries without worrying about impacting update performance.
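A minimal sketch of that approach (table and index names hypothetical; the refresh would typically run from a SQL Server Agent job):

-- Rebuild the snapshot from scratch (simplest form; a staging-table swap
-- would avoid the brief window where the table does not exist)
IF OBJECT_ID(N'dbo.tblViewSnapshot', N'U') IS NOT NULL
    DROP TABLE dbo.tblViewSnapshot;

SELECT *
INTO dbo.tblViewSnapshot
FROM dbo.vwReallySlowView;

-- Index the copy for the common read pattern
CREATE CLUSTERED INDEX cx_tblViewSnapshot
ON dbo.tblViewSnapshot (product_code_v42);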

View indexing

Do not be tempted to index the original view directly. Read performance will be amazingly fast (a single seek on a view index) but (in this case) all the performance problems in the existing query plans will be transferred to queries that modify any of the table columns referenced in the view. Queries that change base table rows will be impacted very badly indeed.

Advanced solution with a partial indexed view

There is a partial indexed-view solution for this particular query that corrects cardinality estimates and removes the filter and probe residual, but it is based on some assumptions about the data (mostly my guess at the schema) and requires expert implementation, particularly regarding suitable indexes to support the indexed view maintenance plans. I share the code below for interest; I do not propose you implement it without very careful analysis and testing.

-- Indexed view to optimize the main view
CREATE VIEW dbo.V1
WITH SCHEMABINDING
AS
SELECT
    it.ID,
    it.product_code_v42,
    it.trans_type_v41,
    it.booking_no_v32,
    it.Trans_qty,
    it.QtyReturned,
    it.QtyCheckedOut,
    it.QtyReserved,
    it.bit_field_v41,
    it.prep_on,
    it.From_locn,
    it.Trans_to_locn,
    it.PDate,
    it.FirstDate,
    it.PTimeH,
    it.PTimeM,
    it.RetnDate,
    it.BookDate,
    it.TimeBookedH,
    it.TimeBookedM,
    it.TimeBookedS,
    it.del_time_hour,
    it.del_time_min,
    it.return_to_locn,
    it.return_time_hour,
    it.return_time_min,
    it.AssignTo,
    it.AssignType,
    it.InRack
FROM dbo.tblItemTran AS it
JOIN dbo.tblBookings AS tb ON
    tb.booking_no = it.booking_no_v32
WHERE
    (
        it.trans_type_v41 NOT IN (2, 3, 7, 18, 19, 20, 21, 12, 13, 22)
        AND it.trans_type_v41 NOT IN (6, 7)
        AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0
    )
    OR
    (
        it.trans_type_v41 NOT IN (6, 7)
        AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0
        AND tb.BookingProgressStatus = 1
    )
    OR 
    (
        it.trans_type_v41 IN (6, 7)
        AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0
        AND it.QtyCheckedOut = 0
    )
    OR 
    (
        it.trans_type_v41 IN (6, 7)
        AND it.bit_field_v41 & CONVERT(tinyint, 4) = 0
        AND it.QtyCheckedOut > 0
        AND it.trans_qty - (it.QtyCheckedOut - it.QtyReturned) > 0
    );
GO
CREATE UNIQUE CLUSTERED INDEX cuq ON dbo.V1 (product_code_v42, ID);
GO

The existing view tweaked to use the indexed view above:

CREATE VIEW [dbo].[vwReallySlowView2]
AS
SELECT
    I.booking_no_v32 AS bkno,
    I.trans_type_v41 AS trantype,
    B.Assigned_to_v61 AS Assignbk,
    B.order_date AS dateo,
    B.HourBooked AS HBooked,
    B.MinBooked AS MBooked,
    B.SecBooked AS SBooked,
    I.prep_on AS Pon,
    I.From_locn AS Flocn,
    I.Trans_to_locn AS TTlocn,
    CASE I.prep_on 
        WHEN 'Y' THEN I.PDate
        ELSE I.FirstDate
    END AS PrDate,
    I.PTimeH AS PrTimeH,
    I.PTimeM AS PrTimeM,
    CASE
        WHEN I.RetnDate < I.FirstDate 
        THEN I.FirstDate 
        ELSE I.RetnDate
    END AS RDatev,
    I.bit_field_v41 AS bitField,
    I.FirstDate AS FDatev,
    I.BookDate AS DBooked,
    I.TimeBookedH AS TBookH,
    I.TimeBookedM AS TBookM,
    I.TimeBookedS AS TBookS,
    I.del_time_hour AS dth,
    I.del_time_min AS dtm,
    I.return_to_locn AS rtlocn,
    I.return_time_hour AS rth,
    I.return_time_min AS rtm,
    CASE
        WHEN
            I.Trans_type_v41 IN (6, 7) 
            AND I.Trans_qty < I.QtyCheckedOut
            THEN 0 
        WHEN
            I.Trans_type_v41 IN (6, 7)
            AND I.Trans_qty >= I.QtyCheckedOut
            THEN I.Trans_Qty - I.QtyCheckedOut
        ELSE
            I.trans_qty
    END AS trqty,
    CASE
        WHEN I.Trans_type_v41 IN (6, 7)
        THEN 0
        ELSE I.QtyCheckedOut
    END AS MyQtycheckedout,
    CASE
        WHEN I.Trans_type_v41 IN (6, 7)
        THEN 0
        ELSE I.QtyReturned
    END AS retqty,
    I.ID,
    B.BookingProgressStatus AS bkProg,
    I.product_code_v42,
    I.return_to_locn,
    I.AssignTo,
    I.AssignType,
    I.QtyReserved,
    B.DeprepOn,
    CASE B.DeprepOn
        WHEN 1 THEN B.DeprepDateTime
        ELSE I.RetnDate
    END AS DeprepDateTime,
    I.InRack
FROM dbo.V1 AS I WITH (NOEXPAND)
JOIN dbo.tblbookings AS B ON
    B.booking_no = I.booking_no_v32
JOIN dbo.tblInvmas AS M ON
    I.product_code_v42 = M.product_code;

Example query and execution plan:

SELECT
    vrsv.*
FROM dbo.vwReallySlowView2 AS vrsv
WHERE vrsv.product_code_v42 = 'M10BOLT';

[Image: new execution plan]

In the new plan, the hash match has no residual predicate, there is no complex filter, no residual predicate on the indexed view seek, and the cardinality estimates are exactly correct.

As an example of how insert/update/delete plans would be affected, this is the plan for an insert to the ItemTrans table:

[Image: insert execution plan showing indexed view maintenance]

The highlighted section is new and required for indexed view maintenance. The table spool replays inserted base table rows for indexed view maintenance. Each row is joined to the bookings table using a clustered index seek, then a filter applies the complex WHERE clause predicates to see if the row needs to be added to the view. If so, an insert is performed to the view's clustered index.

The same SELECT * FROM view test performed earlier completed in 150ms with the indexed view in place.

Final thing: I notice your 2008 R2 server is still at RTM. It won't fix your performance problems, but Service Pack 2 for 2008 R2 has been available since July 2012, and there are many good reasons to keep as current as possible with service packs.