Windows Server 2012, Microsoft SQL Server.
I have a stored procedure (see below) that creates a view that I need to query.
The stored procedure part works great and takes like 5 seconds to finish, and the view is created.
The view has about 30-35k rows.
My problem is that running a simple query against the created view takes about 20 minutes! A simple query such as:
SELECT COUNT(*) FROM MY_VIEW
The above query takes about 20 minutes to finish until it returns me the number of rows. Running the same query against the actual table (that the view contains) returns the results instantly!
I am not sure if the stored procedure is even related since the views are created instantly and querying them is what I am having issue with, but I am posting it just in case.
I'd like to mention that other views created by the same stored procedures, that contains small amount of rows (several hundreds) are responding to queries fairly fast… so the amount of rows is surely a factor here.
What I don't get is why querying a 30k rows table returns results in 2 seconds, and the same query when executed against a 30k rows view, takes 20 minutes.
Stored procedure
USE [QUARTERLY_SEC_REPORT]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[DynamicView_QR_VisitsDistSummary]
AS
BEGIN
DECLARE @CurrentView nvarchar(MAX) = null
DECLARE @SchemaName nvarchar(400)
DECLARE @TableName nvarchar(400)
DECLARE @DynSQL nvarchar(MAX)
DECLARE @DateModifier nvarchar(400)
DECLARE @DynDROP nvarchar(MAX) = 'DROP VIEW Unified_QR_VisitsDistSummary'
DECLARE @InclusionTable nvarchar(MAX) = '[dbo].[QUARTERLY_VIEW]'
Set @DynSQL = 'CREATE VIEW Unified_QR_VisitsDistSummary AS '
set @CurrentView = (select VIEW_DEFINITION from INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA='dbo' and TABLE_NAME='Unified_QR_VisitsDistSummary')
DECLARE cursor1 CURSOR FOR
select TABLE_SCHEMA,TABLE_NAME
from INFORMATION_SCHEMA.TABLES
where
TABLE_SCHEMA='dbo' AND
TABLE_NAME like 'visits_dist_summary_ACC_%'
OPEN cursor1
FETCH NEXT FROM cursor1 INTO @SchemaName, @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
-- Add the select code.
Set @DateModifier = '( SELECT MAX([retrieved_at]) FROM '+ @SchemaName +'.' + @TableName + ')'
Set @DynSQL = @DynSQL + 'Select * from ' + @SchemaName +'.' + @TableName +' INNER JOIN '+ @InclusionTable+ ' ON '+ @InclusionTable +'.AccountID = ' + @SchemaName +'.' + @TableName+ '.Account_ID WHERE ' + @InclusionTable +'.Appear_In_View =''True'' AND (retrieved_at =' + @DateModifier +' OR retrieved_at = DATEADD (MINUTE, -1, '+@DateModifier+ ')'+' OR retrieved_at = DATEADD (MINUTE, -2, '+@DateModifier+ ')' +' OR retrieved_at = DATEADD (MINUTE, -3, '+@DateModifier+ '))'
FETCH NEXT FROM cursor1
INTO @SchemaName, @TableName
-- If the loop continues, add the UNION ALL statement.
If @@FETCH_STATUS = 0
BEGIN
Set @DynSQL = @DynSQL + ' UNION ALL '
END
END
IF @CurrentView = @DynSQL
PRINT 'VIEW IS THE SAME, NEW VIEW WASN''T CREATED'
ELSE
BEGIN
if @CurrentView is not null
BEGIN
print @DynDROP
exec sp_executesql @DynDROP
END
PRINT @DynSQL
exec sp_executesql @DynSQL
END
END
View definition
SELECT *
FROM dbo.visits_dist_summary_ACC_12345 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_12345.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_12345)))
UNION ALL
SELECT *
FROM dbo.visits_dist_summary_ACC_22222 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_22222.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_22222)))
UNION ALL
SELECT *
FROM dbo.visits_dist_summary_ACC_77777 INNER JOIN
[dbo].[QUARTERLY_VIEW] ON [dbo].[QUARTERLY_VIEW].AccountID = dbo.visits_dist_summary_ACC_77777.Account_ID
WHERE [dbo].[QUARTERLY_VIEW].Appear_In_View = 'True' AND (retrieved_at =
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777) OR
retrieved_at = DATEADD(MINUTE, - 1,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)) OR
retrieved_at = DATEADD(MINUTE, - 2,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)) OR
retrieved_at = DATEADD(MINUTE, - 3,
(SELECT MAX([retrieved_at])
FROM dbo.visits_dist_summary_ACC_77777)))
Table Structure (sp_help output)
(The view "aggregates" the same table , for multiple accounts)
Name
visits_dist_summary_ACC_12345
account_id,varchar,no,12, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
siteid,varchar,no,12, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCode,varchar,no,50, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
countryCount,float,no,8,53 ,NULL,yes,(n/a),(n/a),NULL
agentCode,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
agentCount,float,no,8,53 ,NULL,yes,(n/a),(n/a),NULL
retrieved_at,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
relevant_month,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
domain_name,varchar,no,100, , ,yes,no,yes,SQL_Latin1_General_CP1_CI_AS
Identity Seed Increment Not For Replication
No identity column defined. NULL NULL NULL
RowGuidCol
No rowguidcol column defined.
Data_located_on_filegroup
PRIMARY
Here is the execution plan.
Best Answer
A view (without a clustered index) is simply a stored query definition. It does not contain any rows directly.
This requires executing the stored query definition. The base tables (and view query) suffer from some data type issues and a lack of useful indexing, which result in an extraordinary amount of work being performed each time the view is accessed (explained below).
Data type and correctness issues
The column
retrieved_at
is currently typed asvarchar(100)
. You should use a proper date/time type instead. Aside from the performance considerations, you are almost certainly getting incorrect results right now:MAX(retrieved_at)
will find the string that sorts highest, not the most recent value indatetime
terms. The comparisons that involve aDATEADD
end up converting to adatetime
, but only after theMAX
has been found (as a string).Ideally, you would convert the base tables so that correct data types are used, for example using:
Overall strategy
It's not clear from the question, but it is possible that you are intending to store a snapshot of the data from time to time. If that is the case, it would be much more efficient to write the result of your dynamic query to a permanent table than a view.
Current execution plan
The plan you provided highlights several issues. Some of this may be irrelevant given the points already mentioned, so this is provided for interest.
The hash join on account ID produces 36,222 rows, where only 1 was expected. This indicates that statistics are out of date on one or both of the tables involved in that join.
Updating statistics may improve that estimate, though you might need to go further, e.g. creating a filtered index (or statistics) on the
[QUARTERLY_VIEW]
table with anAppear_In_View = 'true'
predicate. As a side note, if that column is true/false, a better data type choice thanvarchar
would bebit
.The rest of the plan is driven by a nested loops left semi join. For each of the 36,222 rows coming from the hash join, SQL Server:
MAX()
retrieved_at` (Stream Aggregate)retrieved_at
value.Note that this process (full scan, aggregate, filter) occurs for every single one of the 36,222 rows produced by the initial hash join.
Worse, if the first scan-aggregate-filter branch does not find a match (satisfying the semi join), SQL Server goes on to run the same process again in full, for the -1, -2, and -3 minute cases.
The numbers shown in the execution plan above (using SQL Sentry Plan Explorer) indicate the total number of rows produced by each operator over all iterations of the nested loops join. In SSMS, you would need to look at the Actual Number of Rows property for each operator.
For the uppermost scan-aggregate-filter branch, this total is 1,312,033,284 rows. The second branch contributes an additional 436,185,324 rows. It would be worse if the -2 and -3 minute cases were ever needed to find a matching row. Hopefully, you can see why the 'simple query' runs for 20 minutes.
Actions
Create an index on
retrieved_at
e.g.Evaluate a clustered index on
account_id
e.g.The above steps should improve performance very significantly, especially the index on
retrieved_at
(typed as datetime).A query rewrite may be needed to avoid computing the
MAX
four times, due to optimizer limitations/priorities, but the index should make that operation trivial (reading one row from the end of the index), so ought not to be necessary in practical terms.In case it is useful, one query rewrite approach is:
The estimated plan for this query with the data type and index changes above: