I am having trouble optimizing a query that does an inner join using a date range. The purposes of the query is take daily data and summarize by week.
Select pcw.EndDate WeekEndDate, h.Store, SUM(h.DeliveryChargesTotal) DeliveryChargesTotal
from Daily_GC_Headers h
inner join PeriodCalendar_Weeks pcw
on h.SalesDate between pcw.StartDate and pcw.EndDate
where SalesDate between @StartDate and @EndDate and isCanceled = 0
group by pcw.EndDate, h.Store
Simplified schema of Daily_GC_Headers
table (13.8 million rows; about 5.4 million match criteria in WHERE clause):
Store - Varchar(10) (PK)
SalesDate - Date (PK)
TicketNumber - SmallInt (PK; starts over 1 each day at each store.)
IsCanceled - Bit
DeliveryChargesTotal - Decimal(9,2)
Simplified schema of PeriodCalendar_Weeks
Table (570 rows; 53 match the criteria):
Year - smallint (PK)
Period - tinyint (PK)
Week - tinyint (PK)
StartDate - Date
EndDate - Date
This query takes about 15 seconds in SSMS. Querying Daily_GC_Headers
by itself (and just grouping by Store
) takes 2 seconds. A query against PeriodCalendar_Weeks
is "instant".
DBCC SHOW_STATISTICS
indicates that the stats are both tables are current (we run a weekly job to update them). I've tried clearing the plan caches.
The execution plan is strange. For example, it is doing an Eager Spool on PeriodCalendar_Weeks
. The estimated rows is 156.6 but the actual rows is 153,971. It then filters the results of that first spool and does a Lazy Spool. The estimated/actual rows of that 2nd spool is 5.4 million, even though the underlying table has less than 600 rows in it.
What should I be looking for or doing to optimize this?
Additional Information
For sake of clarity, I initially described an oversimplified PK on the Weeks table. I have update the schema above to show the full key. The PK described for the Headers is (and was) the full key.
Screen shot of some rows from the Weeks table:
Stats from the Weeks table:
Some stats from Headers table. There seems to be an histogram record for about every 5-10 days for the entire history in the table (3 years).
Best Answer
Much more efficient to do this without having to go back and join to the periods table.
A filtered index may help, if many rows exist where
isCanceled = 1
(these are just possible suggestions, depending on cardinality ofStore
, and may not be the most optimal):If there are very few rows where
isCanceled = 1
, this may be better:Both are worth trying on a test system, as well as moving
Store
into the key in either case, or movingIsCanceled
to theINCLUDE
list in the latter case. On my system, I found the best results with everything but the date in theINCLUDE
list:Again, you will need to test if any of these work out, or if the query above gives a different/better recommendation directly from SQL Server.