Sql-server – Poor performance on inner join using dates and between

join;performancequery-performancesql serversql-server-2008t-sql

I am having trouble optimizing a query that does an inner join using a date range. The purposes of the query is take daily data and summarize by week.

Select  pcw.EndDate WeekEndDate, h.Store, SUM(h.DeliveryChargesTotal) DeliveryChargesTotal 
from Daily_GC_Headers h
inner join PeriodCalendar_Weeks pcw
on h.SalesDate between pcw.StartDate and pcw.EndDate
where SalesDate between @StartDate and @EndDate and isCanceled = 0 
group by pcw.EndDate, h.Store

Simplified schema of Daily_GC_Headers table (13.8 million rows; about 5.4 million match criteria in WHERE clause):

Store - Varchar(10) (PK)
SalesDate - Date (PK)
TicketNumber - SmallInt (PK; starts over 1 each day at each store.)
IsCanceled - Bit
DeliveryChargesTotal - Decimal(9,2)

Simplified schema of PeriodCalendar_Weeks Table (570 rows; 53 match the criteria):

Year - smallint (PK)
Period - tinyint (PK)
Week - tinyint (PK)
StartDate - Date    
EndDate - Date

This query takes about 15 seconds in SSMS. Querying Daily_GC_Headers by itself (and just grouping by Store) takes 2 seconds. A query against PeriodCalendar_Weeks is "instant".

DBCC SHOW_STATISTICS indicates that the stats are both tables are current (we run a weekly job to update them). I've tried clearing the plan caches.

The execution plan is strange. For example, it is doing an Eager Spool on PeriodCalendar_Weeks. The estimated rows is 156.6 but the actual rows is 153,971. It then filters the results of that first spool and does a Lazy Spool. The estimated/actual rows of that 2nd spool is 5.4 million, even though the underlying table has less than 600 rows in it.

What should I be looking for or doing to optimize this?

Additional Information

For sake of clarity, I initially described an oversimplified PK on the Weeks table. I have update the schema above to show the full key. The PK described for the Headers is (and was) the full key.

Screen shot of some rows from the Weeks table:
enter image description here

Stats from the Weeks table:
enter image description here

Some stats from Headers table. There seems to be an histogram record for about every 5-10 days for the entire history in the table (3 years).
enter image description here

Best Answer

Much more efficient to do this without having to go back and join to the periods table.

DECLARE @StartDate DATE, @EndDate DATE;

Select @StartDate = Min(StartDate), @EndDate = MAX(EndDate) 
from dbo.PeriodCalendar_Weeks pcw
where (pcw.Year = @Year and pcw.Period < @Period) 
  or  (pcw.Year = @Year and pcw.Period = @Period and pcw.Week <= @Week) 
  or (pcw.Year = @Year -1 and pcw.Period >= @Period);

SELECT 
  WeekEndDate = DATEADD(DAY, 6, DATEADD(WEEK, SalesWeek, @StartDate)), 
  Store, 
  DeliveryChargesTotal = dct
FROM 
(
  SELECT DATEDIFF(DAY, @StartDate, SalesDate)/7, Store, SUM(DeliveryChargesTotal)
  FROM dbo.Daily_GC_Headers
  WHERE SalesDate BETWEEN @StartDate AND @EndDate AND isCanceled = 0
  GROUP BY DATEDIFF(DAY, @StartDate, SalesDate)/7, Store
) AS x (SalesWeek, Store, dct)
ORDER BY WeekEndDate, Store;

A filtered index may help, if many rows exist where isCanceled = 1 (these are just possible suggestions, depending on cardinality of Store, and may not be the most optimal):

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate) INCLUDE (Store, DeliveryChargesTotal)
  WHERE isCanceled = 0;

If there are very few rows where isCanceled = 1, this may be better:

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate, IsCanceled) INCLUDE (Store, DeliveryChargesTotal);

Both are worth trying on a test system, as well as moving Store into the key in either case, or moving IsCanceled to the INCLUDE list in the latter case. On my system, I found the best results with everything but the date in the INCLUDE list:

CREATE INDEX x ON dbo.Daily_GC_Headers
  (SalesDate) INCLUDE (Store, IsCanceled, DeliveryChargesTotal);

Again, you will need to test if any of these work out, or if the query above gives a different/better recommendation directly from SQL Server.

Related Solutions

Sql-server – TSQL performance – JOIN on value BETWEEN min and max

You need an additional index. In your Fiddle example I added:

CREATE UNIQUE INDEX ix_IP ON Request(CategoryID, IP)

Which covers you for the request table and gets an index seek instead of a clustered index scan.

See how that improves it and let me know. I'm guessing it'll help quite a bit since the scan on that index is I'm sure not cheap.

Sql-server – Will Ola’s default IndexOptimizer job update the stats

By default, the script does not perform statistics maintenance. The documentation for the script's parameters can be found here.

Value   Description
================================
ALL     Update index and column statistics.
INDEX   Update index statistics.
COLUMNS Update column statistics.
NULL    Do not perform statistics maintenance. This is the default.

So if you let the main script create your maintenance jobs for you, and you wanted to update index and column statistics, then you'd need to edit the IndexOptimize job and add , @UpdateStatistics='ALL' into the procedure call.

Best Answer

Related Solutions

Sql-server – TSQL performance – JOIN on value BETWEEN min and max

Sql-server – Will Ola’s default IndexOptimizer job update the stats

Related Question