SQL Server – Slow trigger performance with big batches

sql-server, t-sql, trigger

I have an update trigger that inserts into auditing tables. We had no problem until someone decided to update over 1 million records… (That's my bad; I didn't think it would be a problem when I developed it.)
Now, facing reality, I need to find a solution…

I've been doing a lot of testing and research to try to figure out how to fix the trigger's poor performance…
I've come to the conclusion that, to minimize the cost of the "Table Insert" operator in the execution plan, I need to insert in smaller batches.

The question is: since I'm not sure where all the different updates can come from, how can I insert the auditing records in batches within the trigger?

For example, an update of 1 million records on the main table would fire the trigger, which would then insert 100 thousand records at a time in some kind of loop.

Is this possible? If so, how would you suggest doing it?
If not, how else can I improve the table insert in the execution plan?

Adding test scripts to reproduce the issue:

This is a simplified version of the real thing

-- drop trigger PriceHist_trig_U 
-- drop table MyPriceTable
-- drop table price_history
Create Table MyPriceTable (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100),
CONSTRAINT [PRICE_TAB_P01] PRIMARY KEY CLUSTERED 
(
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Declare @Id int
Set @Id = 1

While @Id <= 1000000
Begin 
   insert into MyPriceTable values (right('000000000000' + CAST(@Id as nvarchar(10)),13),'Grp ' + CAST(@Id%10 as nvarchar(10)), @id%3, RAND()*(25-10)+10, 1, 'there are many other columns')
   Print @Id
   Set @Id = @Id + 1
End

-- Drop table   price_history 
create table price_history (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100), historyDate datetime, ChangedColumns varchar(Max))
CREATE NONCLUSTERED INDEX price_history_nc1 ON price_history
(
    HistoryDate ASC,
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
)

go
Create TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE 
AS 
INSERT INTO price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
SELECT INS.SKU, INS.PriceGroup, INS.PriceLevel, INS.Price, INS.Qty, INS.ManyOtherColumns, getdate(),
       CASE WHEN update(Price) and INS.Price<>DEL.Price THEN 'Price-' ELSE '' END +
       CASE WHEN update(Qty) and INS.Qty<>DEL.Qty THEN 'Qty-' ELSE '' END +
       CASE WHEN update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns THEN 'other-' ELSE '' END
FROM INSERTED INS
JOIN DELETED DEL ON DEL.SKU=INS.SKU AND DEL.PriceGroup=INS.PriceGroup AND DEL.PriceLevel=INS.PriceLevel
WHERE (update(Price) and INS.Price<>DEL.Price)
   OR (update(Qty) and INS.Qty<>DEL.Qty)
   OR (update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns)

/* tests */ 
update MyPriceTable set price = price-1

When I run this with the trigger disabled, it completes in 2 seconds.
With the trigger enabled, it takes 32 seconds.
The execution plan shows 98% of the cost on the "Table Insert" operator.

I've been trying to figure out how to improve the table insert, but can't find anything concrete…

I've tried with a clustered index, and the performance was worse.

Any help would be appreciated

Best Answer

I'm putting this here since it's a bit long, but I don't think it should qualify as an answer. There are no answers here, just observations and advice.

Short version: there isn't anything that can be done to make the query go faster and achieve the same results. You need to change the process that feeds data into the main table if you want batching to solve the issue; otherwise, you have to change the history process.

First, it's not the table insert that's slowing you down; it's the query that pulls the INSERTED and DELETED tables together.

Why Not?

The INSERTED and DELETED tables are heaps with no indexes. Joining them together as you are doing requires two table scans and a sort. The larger the operation, the more expensive this gets.

Batching Inside the Trigger

This won't help you here because the source tables are heaps. You can't walk them without creating some sort of key to work with, and adding one will at worst increase the cost and at best just add complexity, without improving anything.

Batching Outside the Trigger

If you can rearrange things so that you issue smaller UPDATE statements outside the trigger, the INSERTED/DELETED tables will be smaller, making each operation faster and less blocking, although the total cost will be about the same.
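
As a rough sketch of what that could look like against the test schema above (the 100k batch size and the key-range walk over SKU are my assumptions, not your real update logic):

-- Rough sketch: batch the update by walking the leading clustered key (SKU),
-- so each trigger firing only sees a limited slice of rows in INSERTED/DELETED.
Declare @BatchSize int = 100000
Declare @LastSKU varchar(13) = ''
Declare @MaxSKU varchar(13)

While 1 = 1
Begin
    -- Upper bound of the next batch, taken from the clustered key
    Select @MaxSKU = MAX(SKU)
    From (Select top (@BatchSize) SKU
          From MyPriceTable
          Where SKU > @LastSKU
          Order by SKU) as batch

    If @MaxSKU is null Break   -- nothing left to update

    Update MyPriceTable
    Set price = price - 1
    Where SKU > @LastSKU and SKU <= @MaxSKU

    Set @LastSKU = @MaxSKU
End

Each iteration fires the trigger with roughly @BatchSize rows (a few more if SKUs repeat at the boundary), and every row still gets updated exactly once.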

Solution(s)?

Any solution that addresses this will require a change in some fashion. You don't mention your version of SQL Server, but if you are on 2016 or later, you could look into temporal tables: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
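
For reference, a minimal sketch of what a system-versioned temporal table looks like on 2016+ (table and column names here are illustrative, not your real schema):

-- Minimal temporal table sketch (SQL Server 2016+); old row versions are
-- written to the history table automatically on every UPDATE/DELETE.
Create Table dbo.MyPriceTable_Temporal (
    SKU varchar(13) not null,
    PriceGroup varchar(5) not null,
    PriceLevel int not null,
    Price float,
    Qty float,
    ManyOtherColumns varchar(100),
    ValidFrom datetime2 generated always as row start not null,
    ValidTo datetime2 generated always as row end not null,
    period for system_time (ValidFrom, ValidTo),
    Constraint PK_MyPriceTable_Temporal primary key clustered (SKU, PriceGroup, PriceLevel)
)
With (system_versioning = on (history_table = dbo.MyPriceTable_Temporal_History))

You then read old versions with FOR SYSTEM_TIME clauses (AS OF, ALL, etc.) instead of maintaining your own trigger.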

Alternatively, for this type of history table, where you only want to catch the UPDATEs, I would do a straight insert of the DELETED table contents: no additional comparisons or joins with the INSERTED table. The cost should be roughly the same as the insert itself, so a minimal increase (I mean, double the I/O, but that's as minimal as you can get).
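
A sketch of what that trimmed-down trigger could look like against your test schema (note it also logs rows where the update didn't actually change any values, and it leaves ChangedColumns NULL):

-- Sketch: just copy the pre-update row versions; no join, no per-column comparison.
-- (Drop or alter the original trigger first.)
Create TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE
AS
INSERT INTO price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
SELECT DEL.SKU, DEL.PriceGroup, DEL.PriceLevel, DEL.Price, DEL.Qty, DEL.ManyOtherColumns, getdate(), NULL
FROM DELETED DEL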

Then, to look at the history, you just grab all the history records plus the live record, and you can see what changed and when. It won't have the "ChangedColumns" list that your current version has, but you could put something like that together if you want.
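
For example, something along these lines (assuming the simplified trigger above; the key values are just sample data):

-- Sketch: all logged versions of one price row plus the live row
-- (HistoryDate is null for the live row, so it sorts first here).
Declare @SKU varchar(13) = '0000000000042', @PriceGroup varchar(5) = 'Grp 2', @PriceLevel int = 0

SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate
FROM price_history
WHERE SKU = @SKU AND PriceGroup = @PriceGroup AND PriceLevel = @PriceLevel
UNION ALL
SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, NULL
FROM MyPriceTable
WHERE SKU = @SKU AND PriceGroup = @PriceGroup AND PriceLevel = @PriceLevel
ORDER BY HistoryDate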

Good luck.