SQL Server 2012 – Optimize Select Distinct from Large Table

optimizationsql-server-2012t-sql

I have a table that contains 320,071,712 number of records, the report that I am currently working on can filter the records by date, after which the number of records comes down to 145,878,852. The number of distinct records are 107,311,357.

select count(*) 
from [BroadcastOpens] with (nolock) 
where [OpenTimeUtc] >= @StartDate 
and  [OpenTimeUtc] <@EndDate

The most time consuming process is SELECT DISTINCT..., which took about 40 minutes to run.

SELECT DISTINCT [SubscriberId],[OpenTimeUtc]  
FROM [BroadcastOpens] WITH (nolock) 
WHERE [OpenTimeUtc] >= @StartDate 
AND [OpenTimeUtc] <@EndDate

I have created two indexes on this table for both columns BroadcastId and OpenTimeUtc. They helped to speed up the process, but seem to be less significant.

CREATE NONCLUSTERED INDEX [IX_BroadcastOpens_BroadcastId_Temp] 
ON [dbo].[BroadcastOpens]([BroadcastId])

CREATE NONCLUSTERED INDEX [IX_BroadcastOpens_OpenTimeUtc_Temp] 
ON [dbo].[BroadcastOpens]([OpenTimeUtc])

As suggested by Martin Smith, I also have created an indexed view on Table BroadcastOpens grouped by [SubscriberId],[OpenTimeUtc],[BroadcastId].

CREATE VIEW dbo.vwBroadcastOpensRecords
WITH SCHEMABINDING
AS

SELECT [SubscriberId],[OpenTimeUtc],[BroadcastId], COUNT_BIG(*) as tmp    
from [dbo].[BroadcastOpens] group by [SubscriberId],[OpenTimeUtc],   [BroadcastId]  


CREATE UNIQUE CLUSTERED INDEX CIX_vwBroadcastOpensRecords_Temp ON    
vwBroadcastOpensRecords(SubscriberId, OpenTimeUtc,BroadcastId);

This is the step in the query that causes problem:

Image of Execution Plan

I am now testing both approaches to see which one yields better performances.

  1. No changes to the existing tsql query, no non clustered indexed created. Running on production server.
    enter image description here

  2. No changes to the existing tsql query, created two non clustered indexes on the table. Running on staging server.
    enter image description here

  3. Create an indexed view, and modified existing tsql query to use this new indexed view to replace table … testing… (I stopped the execution of the query as it was running for 28 minutes already… I suspect there is something wrong with my indexed view…)
    enter image description here

Just wondering: The table BroadcastOpens gets constantly updated, does this have an impact on the indexed view that I created?

Any suggestions on how to improve the performance of this query are appreciated.

Thank you HABO and Martin Smith for your invaluable help!

Best Answer

If you want this query gangsta fast create an indexed view that groups on [OpenTimeUtc], [SubscriberId]. That view will serve the date filter perfectly and it will pre-calculate the distinct as well. The query will result in a plan that does an index range scan on the view. Not possible to do any better than that.