SQL Server – Slow trigger performance with big batches

sql-server, t-sql, trigger

I have an update trigger that inserts into auditing tables. We had no problem until someone decided to update over 1 million records… (That's my bad; I didn't think it would be a problem when I developed it.)
Now, facing reality, I need to find a solution…

I've been doing a lot of testing and research to try to figure out how to fix the trigger's poor performance…
I've come to the conclusion that, to minimize the cost of the "Table Insert" operator in the execution plan, I need to insert in smaller batches.

The question is: since I'm not sure where all the different updates can come from, how can I insert the auditing records in batches within the trigger?

For example, an update of 1 million records on the main table would fire the trigger, which would then insert 100 thousand records at a time in some kind of loop.

Is this possible? If so, how would you suggest doing it?
If not, how else can I improve the table insert in the execution plan?

Adding test scripts to reproduce the issue:

This is a simplified version of the real thing

-- drop trigger PriceHist_trig_U 
-- drop table MyPriceTable
-- drop table price_history
Create Table MyPriceTable (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100),
CONSTRAINT [PRICE_TAB_P01] PRIMARY KEY CLUSTERED 
(
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Declare @Id int
Set @Id = 1

While @Id <= 1000000
Begin 
   insert into MyPriceTable values (right('000000000000' + CAST(@Id as nvarchar(10)),13),'Grp ' + CAST(@Id%10 as nvarchar(10)), @id%3, RAND()*(25-10)+10, 1, 'there are many other columns')
   Print @Id
   Set @Id = @Id + 1
End

-- Drop table   price_history 
create table price_history (SKU varchar(13), PriceGroup varchar(5), PriceLevel int, Price float, Qty float, ManyOtherColumns Varchar(100), historyDate datetime, ChangedColumns varchar(Max))
CREATE NONCLUSTERED INDEX price_history_nc1 ON price_history
(
    HistoryDate ASC,
    SKU ASC,
    PriceGroup ASC,
    PriceLevel ASC
)

go
Create TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE 
AS 
INSERT INTO price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
SELECT INS.SKU, INS.PriceGroup, INS.PriceLevel, INS.Price, INS.Qty, INS.ManyOtherColumns, getdate(),
       CASE WHEN update(Price) and INS.Price<>DEL.Price THEN 'Price-' ELSE '' END +
       CASE WHEN update(Qty) and INS.Qty<>DEL.Qty THEN 'Qty-' ELSE '' END +
       CASE WHEN update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns THEN 'other-' ELSE '' END
FROM INSERTED INS
JOIN DELETED DEL ON DEL.SKU=INS.SKU AND DEL.PriceGroup=INS.PriceGroup AND DEL.PriceLevel=INS.PriceLevel
WHERE (update(Price) and INS.Price<>DEL.Price)
   OR (update(Qty) and INS.Qty<>DEL.Qty)
   OR (update(ManyOtherColumns) and INS.ManyOtherColumns<>DEL.ManyOtherColumns)

/* tests */ 
update MyPriceTable set price = price-1

When I run this with the trigger disabled, it completes in 2 seconds.
With the trigger enabled, it takes 32 seconds.
The execution plan shows 98% of the cost on the "Table Insert" operator.

I've been trying to figure out how to improve the table insert, but can't find anything concrete…

I've tried with a clustered index, and the performance was worse.

Any help would be appreciated

Best Answer

I'm putting this here since it's a bit long, but I don't think it should qualify as an answer. There are no answers here, just observations and advice.

Short version: there isn't anything that can be done to make the query go faster and achieve the same results. You need to change the process that feeds data into the main table if you want batching to solve the issue; otherwise, you have to change the history process.

First, it's not the table insert that's slowing you down; it's the query that pulls the INSERTED and DELETED tables together.

Why Not?

The INSERTED and DELETED tables are heaps with no indexes. Joining them together as you are doing requires two table scans and a sort. The larger the operation, the more expensive this gets.

Batching Inside the Trigger

This won't help you here because the source tables are heaps. You can't walk them without creating some sort of key to work with, and adding one will at worst increase the cost and at best just add complexity, without improving anything.

Batching Outside the Trigger

If you can rearrange things so that you issue smaller UPDATE statements outside the trigger, the INSERTED/DELETED tables will be smaller, making each operation faster and less blocking, although the total cost will be about the same.
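
As a rough sketch of what that could look like against the test schema above (the 100k batch size and the key-range walk over SKU are my assumptions, not your real update logic):

-- Rough sketch: batch the update by walking the leading clustered key (SKU),
-- so each trigger firing only sees a limited slice of rows in INSERTED/DELETED.
Declare @BatchSize int = 100000
Declare @LastSKU varchar(13) = ''
Declare @MaxSKU varchar(13)

While 1 = 1
Begin
    -- Upper bound of the next batch, taken from the clustered key
    Select @MaxSKU = MAX(SKU)
    From (Select top (@BatchSize) SKU
          From MyPriceTable
          Where SKU > @LastSKU
          Order by SKU) as batch

    If @MaxSKU is null Break   -- nothing left to update

    Update MyPriceTable
    Set price = price - 1
    Where SKU > @LastSKU and SKU <= @MaxSKU

    Set @LastSKU = @MaxSKU
End

Each iteration fires the trigger with roughly @BatchSize rows (a few more if SKUs repeat at the boundary), and every row still gets updated exactly once.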

Solution(s)?

Any solution that addresses this will require a change in some fashion. You don't mention your version of SQL Server, but if you are on 2016 or later, you could look into temporal tables: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
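
For reference, a minimal sketch of what a system-versioned temporal table looks like on 2016+ (table and column names here are illustrative, not your real schema):

-- Minimal temporal table sketch (SQL Server 2016+); old row versions are
-- written to the history table automatically on every UPDATE/DELETE.
Create Table dbo.MyPriceTable_Temporal (
    SKU varchar(13) not null,
    PriceGroup varchar(5) not null,
    PriceLevel int not null,
    Price float,
    Qty float,
    ManyOtherColumns varchar(100),
    ValidFrom datetime2 generated always as row start not null,
    ValidTo datetime2 generated always as row end not null,
    period for system_time (ValidFrom, ValidTo),
    Constraint PK_MyPriceTable_Temporal primary key clustered (SKU, PriceGroup, PriceLevel)
)
With (system_versioning = on (history_table = dbo.MyPriceTable_Temporal_History))

You then read old versions with FOR SYSTEM_TIME clauses (AS OF, ALL, etc.) instead of maintaining your own trigger.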

Alternatively, for this type of history table, where you only want to catch the UPDATEs, I would do a straight insert of the DELETED table contents: no additional comparisons or joins with the INSERTED table. The cost should be roughly the same as the insert itself, so a minimal increase (I mean, double the I/O, but that's as minimal as you can get).
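
A sketch of what that trimmed-down trigger could look like against your test schema (note it also logs rows where the update didn't actually change any values, and it leaves ChangedColumns NULL):

-- Sketch: just copy the pre-update row versions; no join, no per-column comparison.
-- (Drop or alter the original trigger first.)
Create TRIGGER PriceHist_trig_U ON MyPriceTable FOR UPDATE
AS
INSERT INTO price_history (SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate, ChangedColumns)
SELECT DEL.SKU, DEL.PriceGroup, DEL.PriceLevel, DEL.Price, DEL.Qty, DEL.ManyOtherColumns, getdate(), NULL
FROM DELETED DEL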

Then, to look at the history, you just grab all the history records plus the live record, and you can see what changed and when. It won't have the "ChangedColumns" list that your current version has, but you could put something like that together if you want.
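
For example, something along these lines (assuming the simplified trigger above; the key values are just sample data):

-- Sketch: all logged versions of one price row plus the live row
-- (HistoryDate is null for the live row, so it sorts first here).
Declare @SKU varchar(13) = '0000000000042', @PriceGroup varchar(5) = 'Grp 2', @PriceLevel int = 0

SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, HistoryDate
FROM price_history
WHERE SKU = @SKU AND PriceGroup = @PriceGroup AND PriceLevel = @PriceLevel
UNION ALL
SELECT SKU, PriceGroup, PriceLevel, Price, Qty, ManyOtherColumns, NULL
FROM MyPriceTable
WHERE SKU = @SKU AND PriceGroup = @PriceGroup AND PriceLevel = @PriceLevel
ORDER BY HistoryDate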

Good luck.