SQL Server – Query pauses after returning a fixed number of rows

sql-server-2008

I have a view which runs quickly (a few seconds) for up to 41 records (e.g., TOP 41) but takes several minutes for 44 or more records, with run times in between for TOP 42 or TOP 43. Specifically, it returns the first 39 records in a few seconds, then halts for almost three minutes before returning the remaining records. This pattern is the same whether querying TOP 44 or TOP 100.

This view originally derived from a base view, adding to the base just one filter, the last one in the code below. There seems to be no difference if I chain the child view from the base or if I write the child view with the code from the base in-lined. The base view returns 100 records in just a few seconds. I'd like to think that I can get the child view to run about as quickly as the base, not 50 times slower. Has anyone seen this kind of behavior? Any guesses as to cause or resolution?

This behavior has been consistent for the last few hours as I've tested the queries involved, though the number of rows returned before things start to slow down has bumped up and down slightly. This is not new; I'm looking at it now because the total run time had been acceptable (<2 minutes), but I've seen this pause in related log files for months, at least.

Blocking

I've never seen the query blocked, and the problem exists even when there's no other activity on the database (as validated by sp_WhoIsActive). The base view includes NOLOCK throughout, for what that's worth.
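For the record, the blocking check is just a call to Adam Machanic's procedure while the query is paused; the blocking_session_id column stays NULL for the session running the view:

EXEC dbo.sp_WhoIsActive;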

Queries

Here's a cut-down version of the child view, with the base view in-lined for simplicity. It still exhibits the jump in run time at about 40 records.

SELECT TOP 100 PERCENT
    Map.SalesforceAccountID AS Id,
    CAST(C.CustomerID AS NVARCHAR(255)) AS Name,
    CASE WHEN C.StreetAddress = 'Unknown' THEN '' ELSE C.StreetAddress                 END AS BillingStreet,
    CASE WHEN C.City          = 'Unknown' THEN '' ELSE SUBSTRING(C.City,        1, 40) END AS BillingCity,
                                                       SUBSTRING(C.Region,      1, 20)     AS BillingState,
    CASE WHEN C.PostalCode    = 'Unknown' THEN '' ELSE SUBSTRING(C.PostalCode,  1, 20) END AS BillingPostalCode,
    CASE WHEN C.Country       = 'Unknown' THEN '' ELSE SUBSTRING(C.Country,     1, 40) END AS BillingCountry,
    CASE WHEN C.PhoneNumber   = 'Unknown' THEN '' ELSE C.PhoneNumber                   END AS Phone,
    CASE WHEN C.FaxNumber     = 'Unknown' THEN '' ELSE C.FaxNumber                     END AS Fax,
    TransC.WebsiteAddress AS Website,
    C.AccessKey AS AccessKey__c,
    CASE WHEN dbo.ValidateEMail(C.EMailAddress) = 1 THEN C.EMailAddress END AS EMail,  -- Removing this UDF does not speed things up
    TransC.EmailSubscriber
    -- A couple dozen additional TransC fields
FROM
    WarehouseCustomers AS C WITH (NOLOCK)
    INNER JOIN TransactionalCustomers AS TransC WITH (NOLOCK) ON C.CustomerID = TransC.CustomerID
    LEFT JOIN  Salesforce.AccountsMap AS Map WITH (NOLOCK) ON C.CustomerID = Map.CustomerID
WHERE
        C.DateMadeObsolete IS NULL
    AND C.EmailAddress NOT LIKE '%@volusion.%'
    AND C.AccessKey IN ('C', 'R')
    AND C.CustomerID NOT IN (243566)  -- Exclude specific test records
    AND EXISTS (SELECT * FROM Orders AS O WHERE C.CustomerID = O.CustomerID AND O.OrderDate >= '2010-06-28')  -- Only count customers who've placed a recent order
    AND Map.SalesforceAccountID IS NULL  -- Only count customers not already uploaded to Salesforce
-- Removing the ORDER BY clause does not speed things up
ORDER BY
    C.CustomerID DESC

That Id IS NULL filter (i.e., Map.SalesforceAccountID IS NULL) discards most of the records returned by the base view; without a TOP clause, the child and base views return 1,100 and 267K records, respectively.

Statistics
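
These figures were captured with the standard session statistics enabled before running the query:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;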

When running TOP 40:

SQL Server parse and compile time:    CPU time = 234 ms, elapsed time = 247 ms.
SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 0 ms.

(40 row(s) affected)
Table 'CustomersHistory'. Scan count 2, logical reads 39112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 752, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'AccountsMap'. Scan count 1, logical reads 458, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:   CPU time = 2199 ms,  elapsed time = 7644 ms.

When running TOP 45:

(45 row(s) affected)
Table 'CustomersHistory'. Scan count 2, logical reads 98268, physical reads 1, read-ahead reads 3, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Orders'. Scan count 1, logical reads 1788, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'AccountsMap'. Scan count 1, logical reads 2152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 41980 ms,  elapsed time = 177231 ms.

I'm surprised to see the number of reads jump ~3x for this modest difference in actual output.

Comparing the execution plans, they're the same other than the number of rows returned. As with the stats above, the actual row counts for the early steps are dramatically higher in the TOP 45 query, not just 12.5% higher.

In outline, it's scanning a covering index from Orders, seeking corresponding records from WarehouseCustomers; loop-joining this to TransactionalCustomers (remote query, exact plan unknown); and merging this with a table scan of AccountsMap. The remote query is 94% of the estimated cost.

Miscellaneous Notes

Earlier, when I executed the expanded content of the view as a stand-alone query, it ran pretty fast: 13 seconds for 100 records. I'm now testing a cut-down version of the query, without subqueries, and this much simpler query takes three minutes when asked to return more than 40 rows, even when run as a stand-alone query.

The child view incurs a substantial number of reads (~1M per sp_WhoIsActive), but on this machine (eight cores, 32 GB RAM, 95% dedicated SQL box) that's not normally a problem.

I've dropped and re-created both views several times, with no changes.

The data does not include any TEXT or BLOB fields. One field involves a UDF; removing it does not prevent the pausing.

Times are similar whether querying on the server itself, or on my workstation 1,400 miles away, so the delay seems to be inherent in the query itself rather than sending the results to the client.

Notes Re: the Solution

The fix ended up being simple: replacing the LEFT JOIN to Map with a NOT EXISTS clause. This causes only one tiny difference in the query plan: the join to the TransactionalCustomers table (a remote query) now happens after the join to the Map table instead of before. This may mean that it's requesting only the needed records from the remote server, which would cut the volume transmitted ~100-fold.

Ordinarily I'm the first to cheer for NOT EXISTS; it's often faster than a LEFT JOIN...WHERE ID IS NULL construct, and slightly more compact. In this case, it's awkward because the problem query is built on an existing view, and while the field needed for the anti-join is exposed by the base view, it's first cast from integer to text. So for decent performance, I have to drop the two-layer pattern and instead have two nearly-identical views, with the second including the NOT EXISTS clause.
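For concreteness, here's how the tail of the cut-down query reads after the fix (a sketch; note that with the anti-join, the SELECT list's Map.SalesforceAccountID column becomes a constant NULL, since only unmatched customers survive):

FROM
    WarehouseCustomers AS C WITH (NOLOCK)
    INNER JOIN TransactionalCustomers AS TransC WITH (NOLOCK) ON C.CustomerID = TransC.CustomerID
WHERE
        C.DateMadeObsolete IS NULL
    AND C.EmailAddress NOT LIKE '%@volusion.%'
    AND C.AccessKey IN ('C', 'R')
    AND C.CustomerID NOT IN (243566)
    AND EXISTS (SELECT * FROM Orders AS O WHERE C.CustomerID = O.CustomerID AND O.OrderDate >= '2010-06-28')
    -- Replaces the LEFT JOIN to Salesforce.AccountsMap plus the Map.SalesforceAccountID IS NULL test:
    AND NOT EXISTS (SELECT * FROM Salesforce.AccountsMap AS Map WHERE Map.CustomerID = C.CustomerID)
ORDER BY
    C.CustomerID DESC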

Thank you all for your help in troubleshooting this problem! It may be too specific to my circumstances to be of help to anyone else, but hopefully not. If nothing else, it's an example of NOT EXISTS being more than marginally faster than LEFT JOIN...WHERE ID IS NULL. But the real lesson is probably to ensure that remote queries are being joined as efficiently as possible; the query plan claims that it represents 2% of the cost, but it does not always estimate accurately.

Best Answer

Some things to try:

  1. Check your indexes

    • Are all the JOIN key fields indexed? If you use this view a lot, I would go so far as to add a filtered index for the criteria in the view. For instance...

    • CREATE INDEX ix_CustomerId ON WarehouseCustomers (CustomerId, EmailAddress) WHERE DateMadeObsolete IS NULL AND AccessKey IN ('C', 'R') AND CustomerId <> 243566 (note: filtered-index predicates don't support NOT IN, so the test-record exclusion is written with <>)

  2. Update statistics

    • There could be issues with out-of-date stats. If you can swing it, I would do a FULLSCAN. If there are a large number of rows, it's possible that data has changed significantly without triggering an automatic recalc.
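
    • For instance (a sketch, with schema names assumed to match the question's queries):

      UPDATE STATISTICS dbo.WarehouseCustomers WITH FULLSCAN;
      UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
      UPDATE STATISTICS Salesforce.AccountsMap WITH FULLSCAN;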
  3. Clean up the query

    • Make the Map JOIN a NOT EXISTS - You don't need any data from that table, since you only want non-matching records
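
      A sketch against the question's tables, replacing the LEFT JOIN and its IS NULL test:

      AND NOT EXISTS (SELECT * FROM Salesforce.AccountsMap AS Map WHERE Map.CustomerID = C.CustomerID)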

    • Remove the ORDER BY. I know the comments say that it doesn't matter, but I find that very hard to believe. It may not have mattered for your smaller result sets because the data pages were already cached.