SELECT COUNT() on a view is orders of magnitude slower than SELECT on the same view

azure-sql-databaseperformancequery-performanceview

The View

CREATE VIEW [dbo].[vProductList]
WITH SCHEMABINDING
AS 

SELECT
     p.[Id]
    ,p.[Name]
    ,price.[Value] as CalculatedPrice
    ,orders.[Value] as OrdersWithThisProduct
FROM 
    products as p 
    INNER JOIN productMetadata as price ON p.Id = price.ProductId AND price.MetaId = 1
    INNER JOIN productMetadata as orders ON p.Id = orders.ProductId AND orders.MetaId = 2

For the sake of simplicity, assume that productMetadata has columns ProductId, MetaId, Value has ~87m rows, and there are about 400k rows in the products table.

General queries against this view work perfectly:

SELECT * FROM vProductList WHERE CalculatedPrice > 500

Query results in 2-4 seconds (over a vpn and remote, so I'm good with that).

Changing the above to a count is equally fast:

SELECT COUNT(*) from vProductList WHERE CalculatedPrice > 500

runs in about the same time as the raw select, which again I'm OK with. There are roughly 10k products meeting this criteria.

I've run into two separate instances where things get really odd and take FOREVER.

First

Doing a query against one of the columns from the base table in the view:

SELECT * FROM vProductList WHERE Name = 'Hammer'

This query takes a little while to run (20-30 seconds) and returns ~30k results; however, a slight change to said query:

SELECT COUNT(*) FROM vProductList WHERE Name = 'Hammer'

takes thirteen MINUTES to return a count stating ~30k .

Second

Doing a WHERE IN subquery

SELECT * FROM vProductList WHERE Id IN (SELECT ProductId FROM TableThatHasFKToProductId and ColumnInTable = 'Yes')

This query returns ~300k rows and takes two minutes to return (much of that time is simply spent downloading the data into SSMS I believe); however, changing that to a SELECT COUNT(*) results in a query that takes twenty minutes.

SELECT COUNT(*) FROM vProductList WHERE Id IN (SELECT ProductId FROM TableThatHasFKToProductId and ColumnInTable = 'Yes')

Why is it that SELECT * is faster than SELECT COUNT?

I am using the Total Execution Time provided by SSMS for all timings listed here.

Execution plans

Plan for SELECT 1 FROM v WHERE IN (...)

Plan for SELECT COUNT(0) FROM v WHERE IN (...)

Note: I tried using PasteThePlan but it kept telling my the plan was invalid xml.

Best Answer

From the execution plans provided, for the COUNT case the optimizer chooses a local/global aggregation strategy around the final join:

Unfortunately, the optimizer overestimates the effectiveness of the local aggregation. It estimates 136 rows driving the Nested Loops join, whereas 366,115 are encountered at runtime.

The 366,115 index lookups might not be too much of an issue for a local SQL Server instance, but the wait stats included in the plan show the I/O (and possibly memory) limitations of your current Azure SQL Database configuration:

The plan for SELECT 1 shows an exclusively hash and merge join strategy, which produces better results in this case with the very limited memory and/or I/O capabilities.

You might well see better performance for the first query with an OPTION (HASH JOIN, MERGE JOIN) hint, but the fundamental issue is the poor cardinality/data distribution estimate driven by the large number of joins.

Don't be misled by the cost percentages shown against each plan operator - these numbers are currently derived from the optimizer's estimate of cost (using an abstract model). The numbers do not reflect runtime conditions or costs.

Large deviations between estimated and actual row counts can often lead to problems. This is especially true for an underestimate that causes the optimizer to choose a strategy that does not scale up well on a particular hardware setup.

Unfortunately

None of the above is functional with an IF EXISTS conditional statement. It only applies to regular DML. It will work with the alternate SELECT TOP (1) formulation you tried. That may well be better than using COUNT(*), which has to count all qualified rows, as previously mentioned.

That said, there are any number of ways to express this requirement that will allow you to avoid or control the row goal, while terminating the search early. One last example:

DECLARE @Exists bit;

SELECT @Exists =
    CASE
        WHEN EXISTS
        (
            SELECT [dlc].[ID]
            FROM TableDLC [dlc]
            JOIN TableD [d]
            ON [d].[ID] = [dlc].[ID]
            JOIN TableC [c]
            ON [c].[ID] = [d].[ID2]
            WHERE [c].[Name] <> [dlc].[Name]
        )
        THEN CONVERT(bit, 1)
        ELSE CONVERT(bit, 0)
    END
OPTION (QUERYTRACEON 4138);

IF @Exists = 1
BEGIN
    ...
END;

First

Second

Execution plans

Best Answer

Related Solutions

Mysql – Fourth inner join to the same table increases query duration significantly

Sql-server – IF EXISTS taking longer than embedded select statement

Unfortunately

Related Question