Sql-server – SQL Server index / performance help needed (index scan and a sort taking 40 minutes)

sql serversql-server-2008sql-server-2008-r2

I finally have the logic figured out for a query, now I need to speed it up… dramatically, if possible (it runs for 40 minutes +/-). It's on two tables, one with just a couple hundred rows (tblTradingDays) and the other with over a million (tblDailySMA). It returns 48 million rows.

My basic logic is: I'm returning a row for each combination of symbol, trade date, and period values, creating a "slow v fast" pair of periods. So in the original table, there is a symbol, a trade date, a period (5,10,20,…) and the actual value for that combination. I want to compare the values for all combinations of periods (except period p1 = period p2). I'm numbering the records to be used for further processing. Hope that makes sense.

I've tried various indexes and combinations of the fields involved in the joins and predicates. In the estimated execution plan, it looks like a sort is the most expensive operation. The only index on tblDailySMA that appears in the plan, despite my adding others is nonclustered, unique, on: Symbol, TradeDate, Period, and includes Value. It is scanned. The query is below, hopefully someone can help me out. Thanks in advance..

SELECT 
   ROW_NUMBER() OVER 
   (
        ORDER BY t1.Symbol, t1.Period, t2.Period, t.TradingDate DESC
   ) RowNum,
   t1.Symbol, t.TradingDate, t1.Period, t2.Period, t1.Value FastValue, 
   t2.Value SlowValue, (t1.Value - t2.Value) SlowFastDiff,
   ChgSign = CASE WHEN t1.Value < t2.Value THEN 0 
                  WHEN t1.Value >= t2.Value THEN 1 
                  WHEN t1.Value IS NULL OR t2.Value IS NULL THEN NULL 
             END
FROM 
    tblTradingDays t 
RIGHT JOIN 
    tblDailySMA t1 ON t.TradingDate = t1.TradeDate
INNER JOIN 
    tblDailySMA t2 ON t1.Symbol = t2.Symbol AND t1.TradeDate  = t2.TradeDate
WHERE 
    t1.Period < t2.Period

Here's the execution plan:

  |--Compute Scalar(DEFINE:([Expr1007]=[Market].[dbo].[tblDailySMA].[Value] as [t1].[Value]-[Market].[dbo].[tblDailySMA].[Value] as [t2].[Value], [Expr1008]=CASE WHEN [Market].[dbo].[tblDailySMA].[Value] as [t1].[Value]<[Market].[dbo].[tblDailySMA].[Value] as [t2].[Value] THEN (0) ELSE CASE WHEN [Market].[dbo].[tblDailySMA].[Value] as [t1].[Value]>=[Market].[dbo].[tblDailySMA].[Value] as [t2].[Value] THEN (1) ELSE NULL END END))
       |--Sequence Project(DEFINE:([Expr1006]=row_number))
            |--Segment
                 |--Parallelism(Gather Streams, ORDER BY:([t1].[Symbol] ASC, [t1].[Period] ASC, [t2].[Period] ASC, [t].[TradingDate] DESC))
                      |--Sort(ORDER BY:([t1].[Symbol] ASC, [t1].[Period] ASC, [t2].[Period] ASC, [t].[TradingDate] DESC))
                           |--Hash Match(Right Outer Join, HASH:([t].[TradingDate])=([t1].[TradeDate]), RESIDUAL:([Market].[dbo].[tblTradingDays].[TradingDate] as [t].[TradingDate]=[Market].[dbo].[tblDailySMA].[TradeDate] as [t1].[TradeDate]))
                                |--Parallelism(Distribute Streams, Hash Partitioning, PARTITION COLUMNS:([t].[TradingDate]))
                                |    |--Index Scan(OBJECT:([Market].[dbo].[tblTradingDays].[PK_tblTradingDays] AS [t]))
                                |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([t1].[TradeDate]))
                                     |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([t1].[Symbol], [t1].[TradeDate])=([t2].[Symbol], [t2].[TradeDate]), RESIDUAL:([Market].[dbo].[tblDailySMA].[Symbol] as [t1].[Symbol]=[Market].[dbo].[tblDailySMA].[Symbol] as [t2].[Symbol] AND [Market].[dbo].[tblDailySMA].[TradeDate] as [t1].[TradeDate]=[Market].[dbo].[tblDailySMA].[TradeDate] as [t2].[TradeDate] AND [Market].[dbo].[tblDailySMA].[Period] as [t1].[Period]<[Market].[dbo].[tblDailySMA].[Period] as [t2].[Period]))
                                          |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([t1].[Symbol], [t1].[TradeDate]), ORDER BY:([t1].[Symbol] ASC, [t1].[TradeDate] ASC))
                                          |    |--Index Scan(OBJECT:([Market].[dbo].[tblDailySMA].[IX_tblDailySMA_Noncl_SymbTrDatePer] AS [t1]), ORDERED FORWARD)
                                          |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([t2].[Symbol], [t2].[TradeDate]), ORDER BY:([t2].[Symbol] ASC, [t2].[TradeDate] ASC))
                                               |--Index Scan(OBJECT:([Market].[dbo].[tblDailySMA].[IX_tblDailySMA_Noncl_SymbTrDatePer] AS [t2]), ORDERED FORWARD)

Best Answer

When I have this kind of performance problems in a big query I split it into small queries with temporary tables. For me it is a solution and performance ratio may be 10 to 1 or more.

First step:

with t1 as (
SELECT 
  t1.Symbol, 
  t1.Period, 
  t.TradingDate,
  t1.Value as FastValue
FROM      tblDailySMA t1 
LEFT JOIN tblTradingDays t 
        ON t.TradingDate = t1.TradeDate
) 
select * 
into #t1
from t1;

--I include period into idex to avoid table access on next query
create index t1_idx on ( Symbol, TradeDate, Period ) 

Secon step:

with t2 as (
SELECT
  t1.Symbol, 
  t1.Period as period_t1, 
  t.TradingDate,
  t1.Value as FastValue
  t2.Period as period_t2,
  t2.Value as SlowValue,
  t2.TradeDate
FROM       #t1 as t1
INNER JOIN tblDailySMA t2 
   ON t1.Symbol = t2.Symbol AND t1.TradeDate  = t2.TradeDate
WHERE t1.Period < t2.Period
)
select 
  *
into #t2
from t2;

--Here create indexes for t2
--Here next and final query 

And so on. One benefit of this system is that you can improve queries step by step.