SQL Server Performance – Most Efficient Way to Join Huge Tables

Tags: performance, query-performance, sql-server, transaction-log, update

I have a table with 20M rows, and each row has three columns: time, id, and value. For each id and time, there is a status value. For a given time and id, I want the values from the previous and the next periods, and I use the following query to get them:

update a1
set  a1.value_last = b1.value,   
     a1.value_next = c1.value
from tab1 a1
left join tab1 b1
on a1.id = b1.id
and a1.period = b1.period + 1
left join tab1 c1
on a1.id = c1.id
and a1.period = c1.period - 1

The query seems to take forever, and the log file grew by more than 10 GB. What is the most efficient way to write this query? I know that an index will speed up the joins, but how can I reduce the logging?

I'm using SQL Server 2016 on Windows 10 64-bit.

Best Answer

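with a1 as (  -- window functions are not allowed directly in SET, so compute them in a CTE
  select value_last, value_next
       , LAG(value, 1, 0)  OVER (partition by id ORDER BY period) as new_last
       , LEAD(value, 1, 0) OVER (partition by id ORDER BY period) as new_next
  from tab1
)
update a1
set value_last = new_last, value_next = new_next;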
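LAG and LEAD pick the previous and next row in period order, which is not always the same as the period - 1 and period + 1 rows that the self-join in the question matches. The following is a small sketch on a made-up table variable (@t and its three rows are assumptions for illustration only) showing the difference when a period is missing:

```sql
-- Hypothetical sample data: id 1 has periods 1, 2 and 4 (period 3 is missing).
DECLARE @t TABLE (id int, period int, value int);
INSERT INTO @t (id, period, value) VALUES (1, 1, 10), (1, 2, 20), (1, 4, 40);

SELECT id, period, value
     , LAG(value, 1, 0)  OVER (PARTITION BY id ORDER BY period) AS value_last
     , LEAD(value, 1, 0) OVER (PARTITION BY id ORDER BY period) AS value_next
FROM @t
ORDER BY id, period;

-- id  period  value  value_last  value_next
-- 1   1       10     0           20
-- 1   2       20     10          40
-- 1   4       40     20          0
```

For period 4 the window version returns 20 (the value from period 2), while the self-join on period + 1 would return NULL because period 3 does not exist. The third argument (0) also replaces the NULL that the join produces for the first and last period of each id. If the periods have gaps, or if NULL is the wanted result at the edges, the two approaches are not interchangeable.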

Create an index on (id, period).
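As a sketch, such an index could be created as follows. The index name IX_tab1_id_period is made up, and including value is an assumption that aims to let the window functions read everything they need from the index:

```sql
-- Hypothetical index name; key order matches PARTITION BY id ORDER BY period.
CREATE NONCLUSTERED INDEX IX_tab1_id_period
    ON tab1 (id, period)
    INCLUDE (value);
```

Building the index on 20M rows has a cost of its own, so it is worth comparing the execution plans before and after.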

Or just use a view and skip the update entirely; the performance might surprise you.

CREATE VIEW tab1LastNext  
AS  
select a1.id, a1.period, a1.value
     , LAG(value, 1,0)  OVER (partition by id ORDER BY period) as value_last
     , LEAD(value, 1,0) OVER (partition by id ORDER BY period) as value_next
from tab1 a1;
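With the view in place, nothing is written to the table, so there is no update to log. A possible way to query it, assuming an id value of 42 purely for illustration:

```sql
-- The id value 42 is a made-up example.
SELECT id, period, value, value_last, value_next
FROM tab1LastNext
WHERE id = 42
ORDER BY period;
```

Filtering on id matches the PARTITION BY column, so only that partition needs to be evaluated. A filter on period alone has to be applied after the window functions are computed, so it is likely to be slower.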

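To limit log growth, update in batches so that each transaction stays small and the log space can be reused between batches (in SIMPLE recovery, or after log backups in FULL recovery).
This assumes value is never null; otherwise the comparisons get messy.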

select 1;
while @@rowcount > 0
begin 
  -- window functions are not allowed in WHERE or SET, so compute them in a CTE
  with a1 as (
    select value_last
         , LAG(value, 1, 0) OVER (partition by id ORDER BY period) as new_last
    from tab1
  )
  update top (10000) a1
  set value_last = new_last
  where new_last is not null 
    and (value_last is null or new_last != value_last);
end 
select 1;
while @@rowcount > 0
begin 
  with a1 as (
    select value_next
         , LEAD(value, 1, 0) OVER (partition by id ORDER BY period) as new_next
    from tab1
  )
  update top (10000) a1
  set value_next = new_next
  where new_next is not null 
    and (value_next is null or new_next != value_next);
end
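Whether batching actually keeps the log small depends on the recovery model of the database. A quick way to check it, and to watch log usage while the loops run, might look like this (run in the database that holds tab1):

```sql
-- Show the recovery model of the current database (SIMPLE, FULL, or BULK_LOGGED).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

-- Report log size and percentage of log space used for every database on the instance.
DBCC SQLPERF(LOGSPACE);
```

In FULL recovery the log space is only reused after a log backup, so small batches alone will not stop the file from growing.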

Related Solutions

Sql-server – SQL Server index / performance help needed (index scan and a sort taking 40 minutes)

When I have this kind of performance problem in a big query, I split it into smaller queries that write to temporary tables. For me this works well, and the speedup can be 10 to 1 or more.

First step:

with t1 as (
SELECT 
  t1.Symbol, 
  t1.Period, 
  t1.TradeDate,   -- kept because the index and the next step join on it
  t.TradingDate,
  t1.Value as FastValue
FROM      tblDailySMA t1 
LEFT JOIN tblTradingDays t 
        ON t.TradingDate = t1.TradeDate
) 
select * 
into #t1
from t1;

-- Period is included in the index to avoid a table lookup in the next query
create index t1_idx on #t1 ( Symbol, TradeDate, Period );

Second step:

with t2 as (
SELECT
  t1.Symbol, 
  t1.Period as period_t1, 
  t1.TradingDate,
  t1.FastValue,   -- already renamed from Value in the first step
  t2.Period as period_t2,
  t2.Value as SlowValue,
  t2.TradeDate
FROM       #t1 as t1
INNER JOIN tblDailySMA t2 
   ON t1.Symbol = t2.Symbol AND t1.TradeDate  = t2.TradeDate
WHERE t1.Period < t2.Period
)
select 
  *
into #t2
from t2;

--Here create indexes for t2
--Here next and final query

And so on. One benefit of this system is that you can improve queries step by step.

Mysql – What’s the most efficient way to batch UPDATE queries in MySQL

Since you're using InnoDB tables, the most obvious optimization would be to group multiple UPDATEs into a transaction.

With InnoDB, being a transactional engine, you pay not just for the UPDATE itself, but also for all the transactional overhead: managing the transaction buffer, transaction log, flushing the log to disk.

If you are logically comfortable with the idea, try and group 100-1000 UPDATEs at a time, each time wrapped like this:

START TRANSACTION;
UPDATE ...
UPDATE ...
UPDATE ...
UPDATE ...
COMMIT;

Possible downsides:

One error will collapse the entire transaction (but would be easily fixed in code)
You might wait for a long time to accumulate your 1000 UPDATEs, so you might also want to have some timeout
More complexity in your application code.
