I have a table with around 32 million rows and a unique clustered index on (CountryID, RetailerID, ProductID, DateID, EventID, TypeID).
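The index definition is roughly this (a sketch only: the index name is illustrative, and I'm taking the 32-million-row table to be dbo.EventPL, the one the plan points at):

CREATE UNIQUE CLUSTERED INDEX CIX_EventPL   -- index name is illustrative
ON dbo.EventPL (CountryID, RetailerID, ProductID, DateID, EventID, TypeID);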
The query is:

SELECT f.RetailerID
     , TypeID AS TypeID
     , c.CalendarMonth AS CalendarValue
     , SUM(ISNULL([Volume], 0)) AS Volume
     , SUM([VolumeBox]) AS VolumeBox
     , SUM([VolumeKG]) AS VolumeKG
     , SUM([VolumeUnit]) AS VolumeUnit
     , SUM(ISNULL([R_Turnover], 0.0)) AS R_Turnover
     , SUM(ISNULL([R_VAT], 0.0)) AS R_VAT
     , SUM(ISNULL([R_TurnoverExVAT], 0.0)) AS R_TurnoverExVAT
     , SUM([SupplierRealisation_Amt]) AS [SupplierRealisation_Amt]
     , SUM([SupplierDiscount1_Amt]) AS [SupplierDiscount1_Amt]
     , SUM([Supplier_1NetSales_Amt]) AS [Supplier_1NetSales_Amt]
     , SUM([SupplierDiscount2_Amt]) AS [SupplierDiscount2_Amt]
     , SUM([Supplier_2NetSales_Amt]) AS [Supplier_2NetSales_Amt]
     , SUM([SupplierDiscount3_Amt]) AS [SupplierDiscount3_Amt]
     , SUM([Supplier_3NetSales_Amt]) AS [Supplier_3NetSales_Amt]
     , SUM(ISNULL(S_CostofGoodsSold, 0)) AS [S_CostofGoodsSold]
     , SUM(ISNULL(S_Profit, 0)) AS S_Profit
     , SUM(0.0) AS AdditionalCostofGoodsSold
     , SUM(ISNULL([R_DistributionCost], 0.0)) AS R_DistributionCost
     , SUM(ISNULL([R_Profit], 0.0)) AS R_Profit
FROM [dbo].[EventScenarios] es
JOIN dbo.[Event] e ON es.[EventID] = e.ID
JOIN dbo.EventProduct ep ON es.EventID = ep.EventID
JOIN [dbo].Product p ON p.ID = ep.ProductID
JOIN dbo.EventPL f ON e.CountryID = f.CountryID
                  AND f.RetailerID = e.RetailerID
                  AND f.EventID = e.ID
                  AND ep.ProductID = f.ProductID
JOIN Calendar c ON c.DateID = f.DateID
WHERE f.CountryID = 14
  AND c.CalendarMonth BETWEEN 201301 AND 201312
GROUP BY f.RetailerID, c.CalendarMonth, TypeID;
The query plan shows 88% of the time on a clustered index seek on the EventPL table, but the query still takes around 15 seconds to complete. Is there any way I can optimise it down to around 1-2 seconds?
Best Answer
(Moved my comment to an answer, and expanded a bit.)
From the plan you provided, the clustered index scan is overestimated by about 3.5x, and the Sort is underestimated by an order of magnitude, which caused a spill to tempdb. Performance may improve after updating statistics.

However, based on your reply, if this schema is servicing "random" queries against that amount of data, it's going to be extremely challenging and/or expensive to get good performance for all the possible combinations.
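For example, something like this (just a sketch: it assumes the misestimates come from stale statistics on dbo.EventPL, and note that FULLSCAN can take a while on 32 million rows):

-- Refresh all statistics on the fact table with a full scan
UPDATE STATISTICS dbo.EventPL WITH FULLSCAN;
-- Or, more broadly, refresh out-of-date statistics across the database (default sampling)
EXEC sys.sp_updatestats;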
For this type of workload, I think the only realistic way to meet the performance goal would be to abandon the search for a database engine solution and use Analysis Services instead.