Sql-server – Simple view query takes a very long time

performancequery-performancesql serverview

Windows Server 2012, Microsoft SQL Server.

I have a stored procedure (see below) that creates a view that I need to query.
The stored procedure part works great and takes like 5 seconds to finish, and the view is created.

The view has about 30-35k rows.

My problem is that running a simple query against the created view takes about 20 minutes! A simple query such as:

SELECT COUNT(*) FROM MY_VIEW

The above query takes about 20 minutes to finish until it returns me the number of rows. Running the same query against the actual table (that the view contains) returns the results instantly!

I am not sure if the stored procedure is even related since the views are created instantly and querying them is what I am having issue with, but I am posting it just in case.

I'd like to mention that other views created by the same stored procedures, that contains small amount of rows (several hundreds) are responding to queries fairly fast… so the amount of rows is surely a factor here.

What I don't get is why querying a 30k rows table returns results in 2 seconds, and the same query when executed against a 30k rows view, takes 20 minutes.

Stored procedure

USE [QUARTERLY_SEC_REPORT]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[DynamicView_QR_VisitsDistSummary] 
AS
BEGIN
DECLARE @CurrentView nvarchar(MAX) = null
DECLARE @SchemaName nvarchar(400)
DECLARE @TableName nvarchar(400)
DECLARE @DynSQL nvarchar(MAX)
DECLARE @DateModifier nvarchar(400)
DECLARE @DynDROP nvarchar(MAX) = 'DROP VIEW Unified_QR_VisitsDistSummary'
DECLARE @InclusionTable nvarchar(MAX) = '[dbo].[QUARTERLY_VIEW]'

Set @DynSQL = 'CREATE VIEW Unified_QR_VisitsDistSummary AS '

set @CurrentView = (select VIEW_DEFINITION from INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA='dbo' and TABLE_NAME='Unified_QR_VisitsDistSummary')

DECLARE cursor1 CURSOR FOR 
    select TABLE_SCHEMA,TABLE_NAME
    from INFORMATION_SCHEMA.TABLES
    where 
    TABLE_SCHEMA='dbo' AND
    TABLE_NAME like 'visits_dist_summary_ACC_%'

OPEN cursor1

FETCH NEXT FROM cursor1 INTO @SchemaName, @TableName

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Add the select code.
    Set @DateModifier = '( SELECT MAX([retrieved_at]) FROM '+ @SchemaName +'.' + @TableName + ')'
    Set @DynSQL = @DynSQL + 'Select * from ' + @SchemaName +'.' + @TableName +' INNER JOIN '+ @InclusionTable+ ' ON '+ @InclusionTable +'.AccountID  = ' + @SchemaName +'.' + @TableName+ '.Account_ID WHERE ' + @InclusionTable +'.Appear_In_View =''True'' AND (retrieved_at =' + @DateModifier +' OR retrieved_at = DATEADD (MINUTE, -1, '+@DateModifier+ ')'+' OR retrieved_at = DATEADD (MINUTE, -2, '+@DateModifier+ ')' +' OR retrieved_at = DATEADD (MINUTE, -3, '+@DateModifier+ '))'
    FETCH NEXT FROM cursor1
    INTO @SchemaName, @TableName

    -- If the loop continues, add the UNION ALL statement.
    If @@FETCH_STATUS = 0
    BEGIN
        Set @DynSQL = @DynSQL + ' UNION ALL '
    END

END
IF @CurrentView = @DynSQL 
    PRINT 'VIEW IS THE SAME, NEW VIEW WASN''T CREATED'
ELSE
    BEGIN
        if @CurrentView is not null
        BEGIN
        print @DynDROP
            exec sp_executesql @DynDROP
        END
        PRINT @DynSQL
        exec sp_executesql @DynSQL
    END
END

View definition

SELECT        *
FROM            dbo.visits_dist_summary_ACC_12345 INNER JOIN
                         [dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_12345.Account_ID
WHERE        [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_12345) OR
                         retrieved_at = DATEADD(MINUTE, - 1,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_12345)) OR
                         retrieved_at = DATEADD(MINUTE, - 2,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_12345)) OR
                         retrieved_at = DATEADD(MINUTE, - 3,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_12345)))
UNION ALL
SELECT        *
FROM            dbo.visits_dist_summary_ACC_22222 INNER JOIN
                         [dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_22222.Account_ID
WHERE        [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_22222) OR
                         retrieved_at = DATEADD(MINUTE, - 1,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_22222)) OR
                         retrieved_at = DATEADD(MINUTE, - 2,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_22222)) OR
                         retrieved_at = DATEADD(MINUTE, - 3,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_22222)))
UNION ALL
SELECT        *
FROM            dbo.visits_dist_summary_ACC_77777 INNER JOIN
                         [dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_77777.Account_ID
WHERE        [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_77777) OR
                         retrieved_at = DATEADD(MINUTE, - 1,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_77777)) OR
                         retrieved_at = DATEADD(MINUTE, - 2,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_77777)) OR
                         retrieved_at = DATEADD(MINUTE, - 3,
                             (SELECT        MAX([retrieved_at])
                               FROM            dbo.visits_dist_summary_ACC_77777)))

Table Structure (sp_help output)

(The view "aggregates" the same table , for multiple accounts)

Name
visits_dist_summary_ACC_12345

account_id,varchar,no,12,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
siteid,varchar,no,12,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCode,varchar,no,50,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCount,float,no,8,53   ,NULL,yes,(n/a),(n/a),NULL
agentCode,varchar,no,100,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
agentCount,float,no,8,53   ,NULL,yes,(n/a),(n/a),NULL
retrieved_at,varchar,no,100,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
relevant_month,varchar,no,100,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
domain_name,varchar,no,100,     ,     ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS

Identity    Seed    Increment   Not For Replication
No identity column defined. NULL    NULL    NULL

RowGuidCol
No rowguidcol column defined.

Data_located_on_filegroup
PRIMARY

Here is the execution plan.

Best Answer

The view has about 30-35k rows.

A view (without a clustered index) is simply a stored query definition. It does not contain any rows directly.

My problem is that running a simple query against the created view takes about 20 minutes

This requires executing the stored query definition. The base tables (and view query) suffer from some data type issues and a lack of useful indexing, which result in an extraordinary amount of work being performed each time the view is accessed (explained below).

Data type and correctness issues

The column retrieved_at is currently typed as varchar(100). You should use a proper date/time type instead. Aside from the performance considerations, you are almost certainly getting incorrect results right now:

MAX(retrieved_at) will find the string that sorts highest, not the most recent value in datetime terms. The comparisons that involve a DATEADD end up converting to a datetime, but only after the MAX has been found (as a string).

Ideally, you would convert the base tables so that correct data types are used, for example using:

ALTER TABLE dbo.visits_dist_summary_ACC_12345
ALTER COLUMN retrieved_at datetime NOT NULL;

Overall strategy

It's not clear from the question, but it is possible that you are intending to store a snapshot of the data from time to time. If that is the case, it would be much more efficient to write the result of your dynamic query to a permanent table than a view.

Current execution plan

The plan you provided highlights several issues. Some of this may be irrelevant given the points already mentioned, so this is provided for interest.

Plan

The hash join on account ID produces 36,222 rows, where only 1 was expected. This indicates that statistics are out of date on one or both of the tables involved in that join.

Updating statistics may improve that estimate, though you might need to go further, e.g. creating a filtered index (or statistics) on the [QUARTERLY_VIEW] table with an Appear_In_View = 'true' predicate. As a side note, if that column is true/false, a better data type choice than varchar would be bit.

The rest of the plan is driven by a nested loops left semi join. For each of the 36,222 rows coming from the hash join, SQL Server:

  • Reads every row from the base table (Table Scan)
  • Finds the MAX()retrieved_at` (Stream Aggregate)
  • Tests to see if the result matches the outer retrieved_at value.

Note that this process (full scan, aggregate, filter) occurs for every single one of the 36,222 rows produced by the initial hash join.

Worse, if the first scan-aggregate-filter branch does not find a match (satisfying the semi join), SQL Server goes on to run the same process again in full, for the -1, -2, and -3 minute cases.

The numbers shown in the execution plan above (using SQL Sentry Plan Explorer) indicate the total number of rows produced by each operator over all iterations of the nested loops join. In SSMS, you would need to look at the Actual Number of Rows property for each operator.

For the uppermost scan-aggregate-filter branch, this total is 1,312,033,284 rows. The second branch contributes an additional 436,185,324 rows. It would be worse if the -2 and -3 minute cases were ever needed to find a matching row. Hopefully, you can see why the 'simple query' runs for 20 minutes.

Actions

  • Correct the data types
  • Update statistics
  • Use a table to store a static snapshot, if that meets your needs
  • Create an index on retrieved_at e.g.

    CREATE NONCLUSTERED INDEX i2 
    ON dbo.visits_dist_summary_ACC_12345 (retrieved_at);
    
  • Evaluate a clustered index on account_id e.g.

    CREATE CLUSTERED INDEX i1 
    ON dbo.visits_dist_summary_ACC_12345 (account_id);
    

The above steps should improve performance very significantly, especially the index on retrieved_at (typed as datetime).

A query rewrite may be needed to avoid computing the MAX four times, due to optimizer limitations/priorities, but the index should make that operation trivial (reading one row from the end of the index), so ought not to be necessary in practical terms.


In case it is useful, one query rewrite approach is:

SELECT
    QV.AccountID,
    QV.Appear_In_View,
    QV.Report_Name,
    VDSA.account_id,
    VDSA.siteid,
    VDSA.countryCode,
    VDSA.countryCount,
    VDSA.agentCode,
    VDSA.agentCount,
    VDSA.retrieved_at,
    VDSA.relevant_month,
    VDSA.domain_name
FROM dbo.QUARTERLY_VIEW AS QV
JOIN  dbo.visits_dist_summary_ACC_12345 AS VDSA
    ON VDSA.account_id = QV.AccountID
WHERE
    QV.Appear_In_View = 'True' 
    AND VDSA.retrieved_at IN
    (
        SELECT 
            V.max_date_candidates
        FROM 
        (
            -- Compute maximum date once
            SELECT TOP (1)
                VDSA2.retrieved_at
            FROM dbo.visits_dist_summary_ACC_12345 AS VDSA2
            ORDER BY
                VDSA2.retrieved_at DESC
        ) AS Q (max_retrieved_at)
        CROSS APPLY
        (
            -- Generate four rows based on the maximum date
            VALUES 
                (DATEADD(MINUTE, -0, Q.max_retrieved_at)),
                (DATEADD(MINUTE, -1, Q.max_retrieved_at)),
                (DATEADD(MINUTE, -2, Q.max_retrieved_at)),
                (DATEADD(MINUTE, -3, Q.max_retrieved_at))
        ) AS V (max_date_candidates)
    );

The estimated plan for this query with the data type and index changes above:

Improved plan