SQL Server – TOP clause slows down query

optimization, performance, sql-server, t-sql

I already received an answer to a similar question, but it's not applicable in the case below.

Last time, I simply rewrote my query as two distinct queries: the first fetched the first N rows from one table, and the result was then joined (if and only if it contained a sufficient amount of data). But here there is enough data, yet the query is still very slow. Here is a simplified example of the slow query:

DECLARE @Top int = 1000

SELECT TOP (@Top) *
FROM (
    SELECT [t0].[DateTimeUtc] AS [value], [t2].[SystemName], [t2].[Name], [t0].[Id], [t1].[Discriminator], [t1].[ParentActionTemplateId], [t0].[DateTimeUtc], [t1].[Id] AS [Id2]
    FROM [directcrm].[CustomerActions] AS [t0]
    INNER JOIN [directcrm].[ActionTemplates] AS [t1] ON [t1].[Id] = [t0].[ActionTemplateId]
    LEFT OUTER JOIN [directcrm].[ActionTemplates] AS [t2] ON [t2].[Id] = [t1].[ParentActionTemplateId]
    ) AS [t7]
WHERE (([t7].[Discriminator] = @p9) OR ([t7].[Discriminator] = @p10) OR ([t7].[Discriminator] = @p11) OR ([t7].[Discriminator] = @p12)) AND ([t7].[ParentActionTemplateId] IS NOT NULL) AND (([t7].[DateTimeUtc]) <= @p13) AND (([t7].[DateTimeUtc]) > @p14) AND ([t7].[Id] > @p15)
ORDER BY [t7].[Id]

[Execution plan screenshot: nested loop join]

But if we hack the optimizer:

DECLARE @Top int = 1000

SELECT TOP (@Top) *
FROM (
    SELECT [t0].[DateTimeUtc] AS [value], [t2].[SystemName], [t2].[Name], [t0].[Id], [t1].[Discriminator], [t1].[ParentActionTemplateId], [t0].[DateTimeUtc], [t1].[Id] AS [Id2]
    FROM [directcrm].[CustomerActions] AS [t0]
    INNER JOIN [directcrm].[ActionTemplates] AS [t1] ON [t1].[Id] = [t0].[ActionTemplateId]
    LEFT OUTER JOIN [directcrm].[ActionTemplates] AS [t2] ON [t2].[Id] = [t1].[ParentActionTemplateId]
    ) AS [t7]
WHERE (([t7].[Discriminator] = @p9) OR ([t7].[Discriminator] = @p10) OR ([t7].[Discriminator] = @p11) OR ([t7].[Discriminator] = @p12)) AND ([t7].[ParentActionTemplateId] IS NOT NULL) AND (([t7].[DateTimeUtc]) <= @p13) AND (([t7].[DateTimeUtc]) > @p14) AND ([t7].[Id] > @p15)
ORDER BY [t7].[Id]

OPTION (OPTIMIZE FOR (@TOP = 100000000)) -- < here

[Execution plan screenshot: hash join]

We get an instant answer: in the first case the optimizer chooses a nested loop join, which is extremely slow here; in the second, a hash join.

Of course, I have rebuilt all statistics and all indexes for every table in the database.

I don't know what I can do with this query, because it's just a simple join without any magic. I tried building some indexes, rewriting the query, and so on, but the optimizer always chooses the same plan and runs very slowly. I'm unable to use hints because of the ORM, and because they seem really hacky; I only use them to locate the bottleneck and then fix it with DDL modifications. That usually works, but today it failed.

Best Answer

When possible you should upload actual execution plans to Paste The Plan. Without the XML we have to make guesses about the operators shown in the images. For the rest of this answer I'm going to assume that the clustered key of all of the tables is id.

As far as I can tell the problem isn't with the nested loop join but with the table access method on the [directcrm].[CustomerActions] table. It looks like SQL Server does an ordered clustered index seek starting at [t7].[Id] > @p15. However, it's possible that the storage engine needs to read many rows before it finds enough that match the filter requirements against the [DateTimeUtc] column. If you're on SQL Server 2016 you can check this by looking at "Number of Rows Read" for the clustered index seek in the actual query plan. It's theoretically possible for the clustered index seek on [directcrm].[ActionTemplates] AS [t1] to also be a problem. However, it looks like the filtering against that table only removes 30% of the rows, so I doubt that's an issue.

Based on just how the query is written, the biggest problem is that SQL Server is not aware of the values of the local variables. As a result the query optimizer makes cardinality estimates based on hardcoded defaults. For example, whenever you have an unknown expression in TOP, the query optimizer uses a hardcoded guess of 100 for that value. I understand that you are constrained by your ORM, but you're essentially fighting with the query optimizer.

I have a few ideas on how to improve performance listed below in rough order of preference. It's likely that you won't be able to implement some of them due to ORM restrictions but I thought I should include them all for completeness.

A good first test is the RECOMPILE hint. That may be enough to get SQL Server to pick a better plan. I understand that you can't use that one, but perhaps you could replace some of the variables with hardcoded values? TOP with a variable is going to be particularly bad because the query plan will never change as you change the value of the variable. The optimizer will always set a row goal of 100 when creating the plan (based on testing I did tonight).
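As a sketch, the hint just goes at the end of the statement (the derived table is abbreviated here for brevity):

```sql
DECLARE @Top int = 1000

SELECT TOP (@Top) *
FROM ( /* same derived table [t7] as in the question */ ) AS [t7]
WHERE /* same predicates as in the question */
ORDER BY [t7].[Id]
-- With RECOMPILE the optimizer compiles a fresh plan per execution and
-- sees the actual value of @Top, instead of using the row goal guess of 100.
OPTION (RECOMPILE)
```

The trade-off is a compilation on every execution, which may or may not be acceptable for a frequently run query.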

Since you got better performance when not using the clustered index on [directcrm].[CustomerActions] as the driving index you could try to make the clustered seek more expensive or unavailable. One way to make it more expensive is to introduce some doubt about the order of the rows. Right now you order by t0.id which means that plans that use that clustered index can avoid a sort. If you did something like the following that might be enough:

ORDER BY CASE WHEN @sort = 1 THEN [Discriminator] ELSE [Id] END

Just always set @sort equal to 0, so the CASE resolves to [Id] and you get the sort order you want when the query is executed.
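Putting it together, a sketch might look like this (@sort is an assumed extra parameter you would need to pass in; the derived table is abbreviated):

```sql
DECLARE @Top int = 1000
DECLARE @sort int = 0  -- always pass 0 in practice

SELECT TOP (@Top) *
FROM ( /* same derived table [t7] as in the question */ ) AS [t7]
WHERE /* same predicates as in the question */
-- The CASE hides the sort column from the optimizer, so it can no longer
-- assume that the clustered index on [Id] delivers rows in the required
-- order and avoid a sort.
ORDER BY CASE WHEN @sort = 1 THEN [t7].[Discriminator] ELSE [t7].[Id] END
```

Because the optimizer must now plan for a sort either way, the ordered clustered index seek loses much of its cost advantage.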

Another trick is to add superfluous operators to parts of the WHERE clause to prevent index use. In some cases just adding 0 won't be optimized away and an index seek will be unavailable. You can strategically add things like that to the WHERE clause to encourage the index usage that you want. For example, if you don't want a clustered index seek on [directcrm].[CustomerActions] you can start with this and make it more complicated if that isn't sufficient:

[t7].[Id] + 0 > @p15

You may be able to create a covering index against [directcrm].[ActionTemplates] to encourage that table to be used as the driving table. Of course you could always drop the clustered index on [directcrm].[CustomerActions] as well but I imagine that will have side effects that you can't ignore.
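A sketch of such a covering index follows; the key and included columns are assumptions read off the query's predicates and select list, not a tested recommendation:

```sql
-- Covers the WHERE predicates against [t1] ([Discriminator],
-- [ParentActionTemplateId]); the clustered key [Id] is carried in the
-- nonclustered index automatically, so the joins remain covered.
CREATE NONCLUSTERED INDEX IX_ActionTemplates_Discriminator_Parent  -- hypothetical name
ON [directcrm].[ActionTemplates] ([Discriminator], [ParentActionTemplateId]);
```

If the optimizer picks this index as the driving table, the expensive ordered seek on [directcrm].[CustomerActions] is no longer the natural starting point of the plan.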

The nuclear option is to use a plan guide. There are many ways to create plan guides. It's possible to force a query to use a certain plan without changing the query's text. For example, if you can get a good query plan into the cache (such as with your row goal trick), then you can use sp_create_plan_guide_from_handle to attach a plan guide to the query. Keep in mind that your query text needs to exactly match what is stored in the plan guide. If you're on SQL Server 2016 you could also try freezing a query plan using the Query Store.
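A rough sketch of that last step, assuming the good plan is already cached (the LIKE filter and the guide name are placeholders you'd adjust to your query):

```sql
DECLARE @plan_handle varbinary(64);

-- Find the cached plan for the query; narrow the filter until exactly
-- one statement matches.
SELECT TOP (1) @plan_handle = qs.plan_handle
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%CustomerActions%';  -- placeholder filter

-- Freeze that plan with a guide; the name is hypothetical.
EXEC sp_create_plan_guide_from_handle
    @name = N'Guide_CustomerActions_Top',
    @plan_handle = @plan_handle;
```

From then on, any query whose text exactly matches the guided statement will use the frozen plan, regardless of what the optimizer would otherwise choose.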