Sql-server – Inline Select Causing Incredibly slow query performance

sql serversql-server-2008

I have a query that is used to populate an aggregate table for reporting purposes. The query comes from another developer here at the company I work for but is my job to make it run fast. So far all of my early attempts failed. I've tried several things with this query and this is where I am at the moment. I have shaved about a half an hour of it's load time already but am stuck and thinking I will probably just have to re-do the whole thing. I'm hoping someone on here can see what if I'm missing anything and give me some pointers on how to fix this query.

SELECT P.CompanyID,
       P.CompanyName,
       P.StoreID,
       P.StoreName,
       P.ReportDate,
       Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        DimBusinessDateID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                 GROUP  BY TransactionID,
                                           DimStoreID,
                                           DimBusinessDateID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
                      INNER JOIN DimCalendar AS C
                        ON FT.DimBusinessDateID = C.DimCalendarID
                           AND FP.DimBusinessDateID = C.DimCalendarID  AND C.CalendarDate >= '12/4/2012'
               WHERE  FT.DimStoreID = P.DimStoreID
                      AND FT.DimBusinessDateID = P.DimBusinessDateID

                      AND FT.ModStatusFlg <> 'D'), 0) AS StoreCash,
       SR.CashDeposit AS StoreResp,
       SN.StoreNet,
       P.DimEmployeeID AS EmpID,
       P.EmpName,
       P.RegisterID,
       P.PullNumber,
       Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        DimBusinessDateID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                 GROUP  BY TransactionID,
                                           DimStoreID,
                                           DimBusinessDateID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
                      INNER JOIN DimCalendar AS C
                        ON FT.DimBusinessDateID = C.DimCalendarID
                           AND FP.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
               WHERE  FT.DimStoreID = P.DimStoreID
                      AND FT.DimRegisterID = P.DimRegisterID

                      AND FT.TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime
                      AND FT.ModStatusFlg <> 'D'), 0) AS PullCash,
       P.PullResp + Isnull((SELECT Sum(SkimAmount)
                            FROM   FactSkims
                            WHERE  DimStoreID = P.DimStoreID
                                   AND DimRegisterID = P.DimRegisterID
                                   AND SkimDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime), 0) AS PullResp,
       Isnull((SELECT Sum(NetSales) AS PullNet
               FROM   FactSalesTransaction AS FT
                      INNER JOIN DimCalendar AS C
                        ON FT.DimBusinessDateID = C.DimCalendarID AND C.CalendarDate >= '12/4/2012'
               WHERE  DimStoreID = P.DimStoreID
                      AND DimRegisterID = P.DimRegisterID
                      AND TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime

                      AND ModStatusFlg <> 'D'), 0)                                                             AS PullNet
FROM   (SELECT C.CompanyID,
               C.CompanyName,
               S.StoreID,
               S.StoreName,
               F.DimEmployeeID,
               E.FirstName + ' ' + E.LastName                           AS EmpName,
               CASE
                 WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
                 ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
                                   FROM   FactPullDrawer
                                   WHERE  PullDrawerEndTime < F.PullDrawerEndTime
                                          AND DimStoreID = F.DimStoreID
                                          AND DimRegisterID = F.DimRegisterID
                                          AND DimBusinessDateID = F.DimBusinessDateID
                                   ORDER  BY PullDrawerEndTime DESC) AS DATETIME), BD.CalendarDate + Isnull(Cast(Cast(ST.SiteSettingValue AS TIME) AS DATETIME), Cast('4:00:00 AM' AS DATETIME)))
               END AS PullDrawerStartTime,
               F.PullDrawerEndTime,
               BD.CalendarDate AS ReportDate,
               R.RegisterID,
               R.DimRegisterID,
               (SELECT Count(PullDrawerEndTime)
                FROM   FactPullDrawer
                WHERE  PullDrawerEndTime < F.PullDrawerEndTime
                       AND DimStoreID = F.DimStoreID
                       AND DimRegisterID = F.DimRegisterID
                       AND DimBusinessDateID = F.DimBusinessDateID) + 1 AS PullNumber,
               Isnull(F.Amount, 0)                                      AS PullResp,
               F.DimStoreID,
               F.DimBusinessDateID
        FROM   FactPullDrawer AS F
               INNER JOIN DimCompany AS C
                 ON C.DimCompanyID = F.DimCompanyID
               INNER JOIN DimStore AS S
                 ON S.DimStoreID = F.DimStoreID
               INNER JOIN DimCalendar AS BD
                 ON BD.DimCalendarID = F.DimBusinessDateID
                    AND BD.CalendarDate >= '12/4/2012'
               INNER JOIN DimEmployee AS E
                 ON F.DimEmployeeID = E.DimEmployeeID
               INNER JOIN DimRegister AS R
                 ON R.DimRegisterID = F.DimRegisterID
               LEFT JOIN DimSiteSettings AS ST
                 ON S.StoreID = ST.StoreID
                    AND C.CompanyID = ST.CompanyID
                    AND ST.SiteSettingFieldID = 1412) AS P
       INNER JOIN (SELECT DimStoreID,
                          DimBusinessDateID,
                          Sum(NetSales) AS StoreNet
                   FROM   FactSalesTransaction
                   WHERE  ModStatusFlg <> 'D'
                   GROUP  BY DimStoreID,
                             DimBusinessDateID) AS SN
         ON SN.DimStoreID = P.DimStoreID
            AND SN.DimBusinessDateID = P.DimBusinessDateID
       INNER JOIN (SELECT CompanyID,
                          StoreID,
                          ReportDate,
                          Sum(ValTotal) AS CashDeposit
                   FROM   AgtAccountingReport
                   WHERE  ReportCatOrder = 7
                          AND ReportElementOrder < 100
                          AND ReportElementOrder NOT IN ( 7, 9, 10, 16,17, 18, 19, 20, 21 )
                          AND ReportDate >= '10/28/2012'
                   GROUP  BY CompanyID,
                             StoreID,
                             ReportDate) AS SR
         ON SR.CompanyID = P.CompanyID
            AND SR.StoreID = P.StoreID
            AND SR.ReportDate = P.ReportDate 

I'm thinking all of the nested SELECT's which is why I was thinking I would just start from scratch. Any help would be appreciated.

Best Answer

Okay so this is what I actually did to make this query that used to run in about an hour and a half run in under a minute. First I did a little more digging to see exactly what this was doing. There are a few major sub selects in the outer most portion of the query. They look like this.

Isnull((SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullNet
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        DimBusinessDateID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                 GROUP  BY TransactionID,
                                           DimStoreID,
                                           DimBusinessDateID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
                      INNER JOIN DimCalendar AS C
                        ON FT.DimBusinessDateID = C.DimCalendarID
                           AND FP.DimBusinessDateID = C.DimCalendarID  AND C.CalendarDate >= '12/4/2012'
               WHERE  FT.DimStoreID = P.DimStoreID
                      AND FT.DimBusinessDateID = P.DimBusinessDateID

                      AND FT.ModStatusFlg <> 'D'), 0) AS StoreCash

These select statements are joining two of my largest tables. FactSalesTransaction(105 Million Records) to FactSalesPayment(102 Million records) and it is doing so in a select within a select. Which essentially means that for every row returned it is executing this query. Well this query is usually run for about seven days worth of data and therefore returns about 19,000 Records. That Mean's the 3 sub selects to these massive tables need to be executed 19,000 times. Bingo I think I've found where my performance loss was. So I switched those queries to a left join. Nothing complicated just made it so they only had to join once. The left join to replace looks like this.

LEFT JOIN (SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS StoreCash, FT.DimStoreID, FT.DimBusinessDateID
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                 GROUP  BY TransactionID,
                                           DimStoreID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
               WHERE   FT.ModStatusFlg <> 'D'
               GROUP BY FT.DimStoreID, FT.DimBusinessDateID)  AS FP
                       ON FP.DimStoreID = P.DimStoreID
                      AND FP.DimBusinessDateID = P.DimBusinessDateID

As you can see I really didn't change much to the select itself just altered it to not run 19,000 times. The next thing I did was altered the query to a stored procedure where a take the date range the user or in this case the ETL process gives(usually 7 days back) Select from DimCalendar the DimCalendarID for that day so the query is using integers instead of date times and overall has less records to join. Making the final query look like this.

CREATE PROCEDURE [dbo].[NoneOfYourBusinessWhatINamedIt] 
@ETLLoadDate DATETIME 
AS
BEGIN
DECLARE @DimCal Int = (SELECT DimCalendarID  FROM DimCalendar WHERE CalendarDate = @ETLLoadDate)
END
SELECT P.CompanyID,
       P.CompanyName,
       P.StoreID,
       P.StoreName,
       P.ReportDate,
       FP.StoreCash,
       SR.CashDeposit AS StoreResp,
       SN.StoreNet,
       P.DimEmployeeID AS EmpID,
       P.EmpName,
       P.RegisterID,
       P.PullNumber,
       ISNULL(SUM(PC.PullCash), 0) AS PullCash,
       P.PullResp  AS PullResp,
       ISNULL(SUM(PN.PullNet),0) AS PullNet
FROM   (SELECT C.CompanyID,
               C.CompanyName,
               S.StoreID,
               S.StoreName,
               F.DimEmployeeID,
               E.FirstName + ' ' + E.LastName                           AS EmpName,
               CASE
                 WHEN F.PullDrawerStartTime <> '1900-01-01' THEN F.PullDrawerStartTime
                 ELSE Isnull(Cast((SELECT TOP 1 Dateadd(SECOND, 1, PullDrawerEndTime)
                                   FROM   FactPullDrawer
                                   WHERE  PullDrawerEndTime < F.PullDrawerEndTime
                                          AND DimStoreID = F.DimStoreID
                                          AND DimRegisterID = F.DimRegisterID
                                          AND DimBusinessDateID = F.DimBusinessDateID
                                          --AND DimBusinessDateID = @DimCal
                                   ORDER  BY PullDrawerEndTime DESC) AS DATETIME), BD.CalendarDate + Isnull(Cast(Cast(ST.SiteSettingValue AS TIME) AS DATETIME), Cast('4:00:00 AM' AS DATETIME)))
               END                                                      AS PullDrawerStartTime,
               F.PullDrawerEndTime,
               BD.CalendarDate                                          AS ReportDate,
               R.RegisterID,
               R.DimRegisterID,
               (SELECT Count(PullDrawerEndTime)
                FROM   FactPullDrawer
                WHERE  PullDrawerEndTime < F.PullDrawerEndTime
                       AND DimStoreID = F.DimStoreID
                       AND DimRegisterID = F.DimRegisterID
                       AND DimBusinessDateID = F.DimBusinessDateID) + 1 AS PullNumber,
               Isnull(F.Amount, 0)                                      AS PullResp,
               F.DimStoreID,
               F.DimBusinessDateID
        FROM   FactPullDrawer AS F
               INNER JOIN DimCompany AS C
                 ON C.DimCompanyID = F.DimCompanyID
               INNER JOIN DimStore AS S
                 ON S.DimStoreID = F.DimStoreID
               INNER JOIN DimCalendar AS BD
                 ON BD.DimCalendarID = F.DimBusinessDateID
               INNER JOIN DimEmployee AS E
                 ON F.DimEmployeeID = E.DimEmployeeID
               INNER JOIN DimRegister AS R
                 ON R.DimRegisterID = F.DimRegisterID
               LEFT JOIN DimSiteSettings AS ST
                 ON S.StoreID = ST.StoreID
                    AND C.CompanyID = ST.CompanyID
                    AND ST.SiteSettingFieldID = 1412
          WHERE F.DimBusinessDateID = @DimCal) AS P
       INNER JOIN (SELECT DimStoreID,
                          DimBusinessDateID,
                          Sum(NetSales) AS StoreNet
                   FROM   FactSalesTransaction
                   WHERE  ModStatusFlg <> 'D' AND DimBusinessDateID = @DimCal
                   GROUP  BY DimStoreID,
                             DimBusinessDateID) AS SN
         ON SN.DimStoreID = P.DimStoreID
            AND SN.DimBusinessDateID = P.DimBusinessDateID
       INNER JOIN (SELECT CompanyID,
                          StoreID,
                          ReportDate,
                          Sum(ValTotal) AS CashDeposit
                   FROM   AgtAccountingReport
                   WHERE  ReportCatOrder = 7
                          AND ReportElementOrder < 100
                          AND ReportElementOrder NOT IN ( 7, 9, 10, 16,
                                                          17, 18, 19, 20, 21 )
                   GROUP  BY CompanyID,
                             StoreID,
                             ReportDate) AS SR
         ON SR.CompanyID = P.CompanyID
            AND SR.StoreID = P.StoreID
            AND SR.ReportDate = P.ReportDate 
        LEFT JOIN (SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS StoreCash, FT.DimStoreID, FT.DimBusinessDateID
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                        AND DimBusinessDateID = @DimCal
                                 GROUP  BY TransactionID,
                                           DimStoreID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
               WHERE   FT.ModStatusFlg <> 'D' AND DimBusinessDateID = @DimCal
               GROUP BY FT.DimStoreID, FT.DimBusinessDateID)  AS FP
                       ON FP.DimStoreID = P.DimStoreID
                      AND FP.DimBusinessDateID = P.DimBusinessDateID


         LEFT JOIN(SELECT Sum(FT.GrossSales - Isnull(FP.PaymentAmount, 0)) AS PullCash, FT.DimStoreID, FT.DimRegisterID, FT.TransactionDateTime
               FROM   FactSalesTransaction AS FT
                      LEFT JOIN (SELECT TransactionID,
                                        DimStoreID,
                                        Sum(PaymentAmount) AS PaymentAmount
                                 FROM   FactSalesPayment
                                 WHERE  DimPaymentTypeID <> 2
                                        AND ModStatusFlg <> 'D'
                                        AND DimBusinessDateID = @DimCal
                                 GROUP  BY TransactionID,
                                           DimStoreID) AS FP
                        ON FP.TransactionID = FT.TransactionID
                           AND FP.DimStoreID = FT.DimStoreID
               WHERE  FT.ModStatusFlg <> 'D' AND DimBusinessDateID = @DimCal
               GROUP BY FT.DimStoreID, FT.DimRegisterID, FT.TransactionDateTime) AS PC
               ON PC.DimStoreID = P.DimStoreID
                      AND PC.DimRegisterID = P.DimRegisterID
                      AND PC.TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime     
         LEFT JOIN (SELECT Sum(SkimAmount) as SkimAmount, DimStoreID, DimRegisterID, SkimDateTime
                            FROM   FactSkims
                            GROUP BY DimStoreID, DimRegisterID, SkimDateTime) AS FS
                            ON  FS.DimStoreID = P.DimStoreID
                                   AND FS.DimRegisterID = P.DimRegisterID
                                   AND FS.SkimDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime     
         LEFT JOIN (SELECT Sum(NetSales) AS PullNet, DimStoreID, DimRegisterID, TransactionDateTime
               FROM   FactSalesTransaction
               WHERE  ModStatusFlg <> 'D' AND DimBusinessDateID = @DimCal
               GROUP BY DimStoreID, DimRegisterID, TransactionDateTime) AS PN                              
               ON PN.DimStoreID = P.DimStoreID
                      AND PN.DimRegisterID = P.DimRegisterID
                      AND PN.TransactionDateTime BETWEEN P.PullDrawerStartTime AND P.PullDrawerEndTime
                      AND PC.TransactionDateTime = PN.TransactionDateTime
       group by P.CompanyID, P.CompanyName, P.StoreID,P.StoreName,P.ReportDate,FP.StoreCash,SR.CashDeposit,SN.StoreNet,P.DimEmployeeID,P.EmpName,P.RegisterID,P.PullNumber,P.PullResp

After doing a test for the speed increase to see if it was worth the index space. I added two Non-Clustered indexes on ModStatusFlag and DimBusinessDateID that included the other columns requested by this query. On both FactSalesTransaction and FactSalesPayment. I could probably do a little more to clean this up and make it run faster however the performance gain would be minimal and there are much bigger fish to fry at the moment. Long story short be careful with your sub-select Statements.