SQL Server – How to Filter Query on Left Joined Table Without Exact Value

sql server

I had a hard time coming up with an appropriate title and it might be hard for me to explain the issue, but I'll try to be as clear as possible.

I have a dynamic query that is pulling assets based on their asset title or one of many "identifiers" that are attached to that asset. Every asset as a "snapshot" ID that is tied to a asset identifier snapshot table, that is then tied to a version table that holds the actual value. That relationship looks like this:

SELECT
    *
FROM
    vwAsset AS Asset
        INNER JOIN dbo.tblAssetIdentifierSnapshot ON tblAssetIdentifierSnapshot.AssetSnapshotId = Asset.AssetSnapshotId
            INNER JOIN dbo.tblAssetIdentifierVersion AS AssetIdentifierVersion ON AssetIdentifierVersion.AssetIdentifierVersionId = tblAssetIdentifierSnapshot.AssetIdentifierVersionId

The value in the tblAssetIdentifierVersion table is what I'm trying to get at. Here is what the full query looks like:

And here's a pastebin.

If I run this query without the two highlighted joins and remove the OR, it runs instantly. But adding these joins makes the query run for 25+ seconds.

When I run this through Sentry Plan Explorer, it shows that some 274K+ rows are being searched for each join, before it finally filters down to the 137 it returns.

This is the query columns that sentry plan explorer shows:

I'm not an expert, so take this with a grain of salt, but every looks mostly okay. I don't think anything can be done about index scans, right?

Is there any way to filter down the left joins more without knowing the snapshot ID or the versions that are tied to the asset?


Here's the execution plan: https://www.brentozar.com/pastetheplan/?id=HJtsYtGLe

Best Answer

It looks like this query is an example of a problematic use of OR according to SQLCAT:

However, when the conditions on each side of the OR operator are evaluating different tables, problems can arise:

WHERE a.col1 = @val1 OR b.col2 = @val2

The pattern represented by this query predicate is problematic. Breaking down the query into two steps produces a significantly “cheaper” plan. Note that in this pattern, there are two tables involved; on each side of the OR operator is a condition that applies to different tables.

In the query plan with the additional tables filtered out, SQL Server is able to filter out irrelevant rows quickly. Also the filter condition can be applied directly to the table.

In the query plan with the OR clause, SQL Server filters out unnecessary rows only at the very end of the plan, after doing millions of unnecessary index seeks. This is the query that you reported as being slow in your question.

In the query plan with the query that I suggested (UNION instead of OR), SQL Server is more able to directly apply the filters and less overall work is done. There is not a filtering step at the end that eliminates hundreds of thousands of rows.

Note that this approach may not work as well as it usually does because you're querying a view and the logic behind that view may be expensive. It is also true that rewriting the OR as a UNION will result in duplicated code. The best way around this is to use a derived table. Unfortunately, this may not be a good option for you depending on the possible filter conditions that end users can add. If you can use a derived table, one rewrite is as follows:

SELECT DISTINCT TOP 250
     Asset.AssetVersionId
    ,Asset.AssetSnapshotId
    ,Asset.AssetId
    ,Asset.Title
    ,Product.Title AS 'Model'
    ,Brand.Title AS 'Brand'
    ,Customer.Title AS 'Customer'
    ,Customer.CustomerId
    ,CustomerLocation.ServiceLocationId
    ,Asset.AssetSnapshotId
    ,ISNULL(AssetIdentifier_TagNumber.Value, '') AS 'TagNumber'
    ,CustomerLocation.Title AS 'CustomerLocation'
    ,Asset.TagLocationId
    ,ISNULL(TagLocation.TagLocationNumber, '') AS 'TagLocationTitle'
    ,TagLocation.RBIRequired
    ,Asset.AssetApprovalStatusId
    ,AssetManufacturerTag.SerialNumber
    ,Asset.FullModelNumber
    ,Customer.LocalTitle
FROM
    (
    SELECT 
        Asset.AssetVersionId
        ,Asset.AssetSnapshotId
        ,Asset.AssetId
        ,Asset.Title    
        ,Asset.AssetSnapshotId
        ,Asset.TagLocationId
        ,Asset.AssetApprovalStatusId
        ,Asset.FullModelNumber
        ,Asset.ProductId
        ,Asset.CustomerId
        ,Asset.CustomerLocationId
    FROM dbo.vwAsset AS Asset
    WHERE Asset.Title LIKE '%E29%'

    UNION ALL

    SELECT 
        Asset.AssetVersionId
        ,Asset.AssetSnapshotId
        ,Asset.AssetId
        ,Asset.Title    
        ,Asset.AssetSnapshotId
        ,Asset.TagLocationId
        ,Asset.AssetApprovalStatusId
        ,Asset.FullModelNumber
        ,Asset.ProductId
        ,Asset.CustomerId
        ,Asset.CustomerLocationId
        FROM dbo.vwAsset AS Asset
        INNER JOIN dbo.tblAssetIdentifierSnapshot AS AssetIdentifierSnapshot ON AssetIdentifierSnapshot.AssetSnapshotId = Asset.AssetSnapshotId
        INNER JOIN dbo.tblAssetIdentifierVersion AS AssetIdentifierVersion ON AssetIdentifierVersion.AssetIdentifierVersionId = AssetIdentifierSnapshot.AssetIdentifierVersionId
        WHERE Asset.Title NOT LIKE '%E29%' AND AssetIdentifierVersion.Value LIKE '%E29%'
    ) AS Asset    
    INNER JOIN dbo.tblProduct AS Product ON Product.ProductId = Asset.ProductId
    INNER JOIN dbo.tblBrand AS Brand ON Brand.BrandId = Product.BrandId
    INNER JOIN dbo.tblCustomer AS Customer ON Customer.CustomerId = Asset.CustomerId
    INNER JOIN dbo.tblCustomerLocation AS CustomerLocation ON CustomerLocation.CustomerLocationId = Asset.CustomerLocationId
    INNER JOIN dbo.vwUserServiceLocation AS UserServiceLocation ON UserServiceLocation.ServiceLocationId = CustomerLocation.ServiceLocationId
    LEFT JOIN dbo.vwAssetIdentifier AS AssetIdentifier_TagNumber ON AssetIdentifier_TagNumber.AssetSnapshotId = Asset.AssetSnapshotId AND AssetIdentifier_TagNumber.Title = 'Tag Number'
    LEFT JOIN dbo.tblTagLocation AS TagLocation ON TagLocation.TagLocationId = Asset.TagLocationId
    INNER JOIN dbo.tblAssetManufacturerTag AS AssetManufacturerTag ON AssetManufacturerTag.AssetId = Asset.AssetId
    LEFT JOIN dbo.tblStatementOfWork AS SOW ON SOW.AssetId = Asset.AssetId
    LEFT JOIN dbo.tblWorkOrder AS WorkOrder ON WorkOrder.WorkOrderId = SOW.WorkOrderId
    LEFT JOIN dbo.tblProject AS Project ON Project.ProjectId = WorkOrder.ProjectId      
WHERE
    1 = 1
    AND UserServiceLocation.OwnerId = '1A07E201-D832-465D-A1B1-10500A20E16A';

If you can't use a derived table and need to duplicate the full query logic that's okay. To get the TOP behavior that you're looking for you mean need to define the whole query as a CTE and do the TOP filtering as a final step.

As a final note, carefully consider if DISTINCT TOP 250 without an ORDER BY clause is what you want. With that query you will just get an arbitrary 250 rows. If you are returning data to an end user there may be a more meaningful way to order or limit the results.