SQL Server Performance – Most Efficient Way to Join Huge Tables

performancequery-performancesql servertransaction-logupdate

I have a table with 20M rows, and each row has 3 columns: time, id, and value. For each id and time, there is a value for the status. I want to know the values of the last and the next periods for a specific time and id, and have the following query to get the values:

update a1
set  a1.value_last = b1.value,   
     a1.value_next = c1.value
from tab1 a1
left join tab1 b1
on a1.id = b1.id
and a1.period = b1.period + 1
left join tab1 c1
on a1.id = c1.id
and a1.period = c1.period - 1

It seems that the query takes forever and the log file increased by more than 10 GB. I'm wondering what's the most efficient way to write this query? I know using index will speed up the joining process, but how can I reduce the logging?

I'm using SQL Server 2016 on Win10 64bit.

Best Answer

update a1
set  a1.value_last = LAG(value, 1,0)  OVER (partition by id ORDER BY period)
  ,  a1.value_next = LEAD(value, 1,0) OVER (partition by id ORDER BY period)
from tab1 a1

index on id, period

or just use a view - that might surprise you

CREATE VIEW tab1LastNext  
AS  
select a1.id, a1.period, a1.value
     , LAG(value, 1,0)  OVER (partition by id ORDER BY period) as value_last
     , LEAD(value, 1,0) OVER (partition by id ORDER BY period) as value_next
from tab1 a1;

for logging
need to hope value is not null or this gets messy

select 1;
while @@rowcount > 0
begin 
  update top (10000) a1
  set a1.value_last = LAG(value, 1,0) OVER (partition by id ORDER BY period)
  from tab1 a1 
  where LAG(value, 1,0) OVER (partition by id ORDER BY period) is not null 
    and LAG(value, 1,0) OVER (partition by id ORDER BY period) != a1.value_last
end 
select 1;
while @@rowcount > 0
begin 
  update top (10000) a1
  set a1.value_next = LEAD(value, 1,0) OVER (partition by id ORDER BY period)
  from tab1 a1 
  where LEAD(value, 1,0) OVER (partition by id ORDER BY period) is not null 
    and LEAD(value, 1,0) OVER (partition by id ORDER BY period) != a1.value_next
end