I have a query that is used to populate an aggregate table for reporting purposes. The query comes from another developer here at the company I work for but is my job to make it run fast. So far all of my early attempts failed. I've tried several things with this query and this is where I am at the moment. I have shaved about a half an hour of it's load time already but am stuck and thinking I will probably just have to re-do the whole thing. I'm hoping someone on here can see what if I'm missing anything and give me some pointers on how to fix this query.
SELECT P.CompanyID,
P.CompanyName,
P.StoreID,
P.StoreName,
P.ReportDate,
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimBusinessDateID = P.DimBusinessDateID
AND FT.ModStatusFlg <> 'D'), 0) AS StoreCash,
SR.CashDeposit AS StoreResp,
SN.StoreNet,
P.DimEmployeeID AS EmpID,
P.EmpName,
P.RegisterID,
P.PullNumber,
Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
FROM FactSalesTransaction AS FT
LEFT JOIN (SELECT TransactionID,
DimStoreID,
DimBusinessDateID,
Sum(PaymentAmount) AS PaymentAmount
FROM FactSalesPayment
WHERE DimPaymentTypeID <> 2
AND ModStatusFlg <> 'D'
GROUP BY TransactionID,
DimStoreID,
DimBusinessDateID) AS FP
ON FP.TransactionID = FT.TransactionID
AND FP.DimStoreID = FT.DimStoreID
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID
AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE FT.DimStoreID = P.DimStoreID
AND FT.DimRegisterID = P.DimRegisterID
AND FT.TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime
AND FT.ModStatusFlg <> 'D'), 0) AS PullCash,
P.PullResp + Isnull((SELECT Sum(SkimAmount)
FROM FactSkims
WHERE DimStoreID = P.DimStoreID
AND DimRegisterID = P.DimRegisterID
AND SkimDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime), 0) AS PullResp,
Isnull((SELECT Sum(NetSales) AS PullNet
FROM FactSalesTransaction AS FT
INNER JOIN DimCalendar AS C
ON FT.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
WHERE DimStoreID = P.DimStoreID
AND DimRegisterID = P.DimRegisterID
AND TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime
AND ModStatusFlg <> 'D'), 0) AS PullNet
FROM (SELECT C.CompanyID,
C.CompanyName,
S.StoreID,
S.StoreName,
F.DimEmployeeID,
E.FirstName + ' ' + E.LastName AS EmpName,
CASE
WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime < F.PullDrawerEndTime
AND DimStoreID = F.DimStoreID
AND DimRegisterID = F.DimRegisterID
AND DimBusinessDateID = F.DimBusinessDateID
ORDER BY PullDrawerEndTime DESC) AS DATETIME), BD.CalendarDate + Isnull(Cast(Cast(ST.SiteSettingValue AS TIME) AS DATETIME), Cast('4:00:00 AM' AS DATETIME)))
END AS PullDrawerStartTime,
F.PullDrawerEndTime,
BD.CalendarDate AS ReportDate,
R.RegisterID,
R.DimRegisterID,
(SELECT Count(PullDrawerEndTime)
FROM FactPullDrawer
WHERE PullDrawerEndTime < F.PullDrawerEndTime
AND DimStoreID = F.DimStoreID
AND DimRegisterID = F.DimRegisterID
AND DimBusinessDateID = F.DimBusinessDateID) + 1 AS PullNumber,
Isnull(F.Amount, 0) AS PullResp,
F.DimStoreID,
F.DimBusinessDateID
FROM FactPullDrawer AS F
INNER JOIN DimCompany AS C
ON C.DimCompanyID = F.DimCompanyID
INNER JOIN DimStore AS S
ON S.DimStoreID = F.DimStoreID
INNER JOIN DimCalendar AS BD
ON BD.DimCalendarID = F.DimBusinessDateID
AND BD.CalendarDate >= '12/4/2012'
INNER JOIN DimEmployee AS E
ON F.DimEmployeeID = E.DimEmployeeID
INNER JOIN DimRegister AS R
ON R.DimRegisterID = F.DimRegisterID
LEFT JOIN DimSiteSettings AS ST
ON S.StoreID = ST.StoreID
AND C.CompanyID = ST.CompanyID
AND ST.SiteSettingFieldID = 1412) AS P
INNER JOIN (SELECT DimStoreID,
DimBusinessDateID,
Sum(NetSales) AS StoreNet
FROM FactSalesTransaction
WHERE ModStatusFlg <> 'D'
GROUP BY DimStoreID,
DimBusinessDateID) AS SN
ON SN.DimStoreID = P.DimStoreID
AND SN.DimBusinessDateID = P.DimBusinessDateID
INNER JOIN (SELECT CompanyID,
StoreID,
ReportDate,
Sum(ValTotal) AS CashDeposit
FROM AgtAccountingReport
WHERE ReportCatOrder = 7
AND ReportElementOrder < 100
AND ReportElementOrder NOT IN ( 7, 9, 10, 16,17, 18, 19, 20, 21 )
AND ReportDate >= '10/28/2012'
GROUP BY CompanyID,
StoreID,
ReportDate) AS SR
ON SR.CompanyID = P.CompanyID
AND SR.StoreID = P.StoreID
AND SR.ReportDate = P.ReportDate
I'm thinking all of the nested SELECT
's which is why I was thinking I would just start from scratch. Any help would be appreciated.
Best Answer
Okay so this is what I actually did to make this query that used to run in about an hour and a half run in under a minute. First I did a little more digging to see exactly what this was doing. There are a few major sub selects in the outer most portion of the query. They look like this.
These select statements are joining two of my largest tables. FactSalesTransaction(105 Million Records) to FactSalesPayment(102 Million records) and it is doing so in a select within a select. Which essentially means that for every row returned it is executing this query. Well this query is usually run for about seven days worth of data and therefore returns about 19,000 Records. That Mean's the 3 sub selects to these massive tables need to be executed 19,000 times. Bingo I think I've found where my performance loss was. So I switched those queries to a left join. Nothing complicated just made it so they only had to join once. The left join to replace looks like this.
As you can see I really didn't change much to the select itself just altered it to not run 19,000 times. The next thing I did was altered the query to a stored procedure where a take the date range the user or in this case the ETL process gives(usually 7 days back) Select from DimCalendar the DimCalendarID for that day so the query is using integers instead of date times and overall has less records to join. Making the final query look like this.
After doing a test for the speed increase to see if it was worth the index space. I added two Non-Clustered indexes on ModStatusFlag and DimBusinessDateID that included the other columns requested by this query. On both FactSalesTransaction and FactSalesPayment. I could probably do a little more to clean this up and make it run faster however the performance gain would be minimal and there are much bigger fish to fry at the moment. Long story short be careful with your sub-select Statements.