SQL Server – How does the Log Reader process bulk updates?

replication, sql-server

Key Problem

The latency we are experiencing isn't in applying the changes from the Distributor to the Subscribers (in Replication Monitor that latency is usually < 1 second); it is the time a transaction committed on the publisher database takes to reach the distributor.

Part of the issue is that we have several filters applied to one of our most volatile tables, so for every update/insert/delete the Log Reader has to evaluate the filters to work out which publication(s) each record belongs to.

From tests we've done, each additional filter appears to increase the time the Log Reader takes to process each transaction, and the growth looks closer to exponential than linear.
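
To put numbers on that publisher-to-distributor leg we can post tracer tokens and look at the distributor_latency column, rather than relying on Replication Monitor alone. A rough sketch, assuming one of the publications described below (NoFilter1), run in the publication database on the publisher; the tracer_id is a placeholder value returned by sp_helptracertokens:

-- Post a token into the publication's log stream
EXEC sys.sp_posttracertoken @publication = N'NoFilter1';

-- List posted tokens to pick up a tracer_id
EXEC sys.sp_helptracertokens @publication = N'NoFilter1';

-- distributor_latency in the output is the publisher -> distributor time
EXEC sys.sp_helptracertokenhistory
     @publication = N'NoFilter1',
     @tracer_id   = -2147483641;   -- placeholder id from sp_helptracertokens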

(edit 23/06/2011: added more detail about the filters)

Background

In our replication setup we have several publications with filters on a highly volatile table (an average of 1.5 million transactions in a 2-hour period). During heavy periods this can push the Log Reader to a latency of 20 seconds (it is usually < 1 second).

We have identified several areas of improvement (reducing the number of filters, reducing the number of updates, farming out the processing, etc.); one potential area is changing how the updates are applied.

An example table (to aid explanation)

myTable
----------------
myID         int
myGroupID1   int
myGroupID2   int
Suspended    bit
FilterFlag1  int
FilterFlag2  int
FilterFlag3  int
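
As T-SQL, a minimal sketch of that table (types, nullability and the key are assumed purely for illustration; transactional replication needs a primary key on the published table):

CREATE TABLE dbo.myTable
(
    myID         int NOT NULL PRIMARY KEY,
    myGroupID1   int NOT NULL,
    myGroupID2   int NOT NULL,
    Suspended    bit NOT NULL,
    FilterFlag1  int NOT NULL,
    FilterFlag2  int NOT NULL,
    FilterFlag3  int NOT NULL
);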

To illustrate this, our replication setup would have 5 publications of [myTable]:

Publication  Filter
-----------  ------
NoFilter1    [all records]*  
NoFilter2    [all records]*  
Filter1      FilterFlag1 = 1
Filter2      FilterFlag2 = 1
Filter3      FilterFlag1 = 1 AND FilterFlag2 = 1  
* Other tables are combined into these publications, which is why the same table with identical filters (none) appears in more than one publication.
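
For the filtered publications the articles carry row filters along these lines. This is only a sketch of how such articles could be defined; the real calls specify many more sp_addarticle parameters and our own object names:

-- Publication Filter1: only rows with FilterFlag1 = 1 are replicated
EXEC sp_addarticle
     @publication   = N'Filter1',
     @article       = N'myTable',
     @source_owner  = N'dbo',
     @source_object = N'myTable',
     @filter_clause = N'FilterFlag1 = 1';

-- Publication Filter3: rows must satisfy both flags
EXEC sp_addarticle
     @publication   = N'Filter3',
     @article       = N'myTable',
     @source_owner  = N'dbo',
     @source_object = N'myTable',
     @filter_clause = N'FilterFlag1 = 1 AND FilterFlag2 = 1';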

Current update process

The majority of applications that update this table do so by looping through their collection of objects, applying the change to a single object, then committing that change to the database before moving on to the next object.

From a DB trace perspective this means we see up to 120 individual update statements when a change occurs.

Example

update [myTable] set Suspended = @Suspended 
where myID = @pID
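
In the trace this shows up as the same parameterised statement repeated once per object, each with its own round trip and commit (the IDs and values here are made up):

-- Repeated up to ~120 times, one call per object in the collection
exec sp_executesql N'update [myTable] set Suspended = @Suspended where myID = @pID',
                   N'@Suspended bit, @pID int', @Suspended = 1, @pID = 10001;
exec sp_executesql N'update [myTable] set Suspended = @Suspended where myID = @pID',
                   N'@Suspended bit, @pID int', @Suspended = 1, @pID = 10002;
-- ...and so on for every remaining object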

Proposed update process

As the object collections are actually based around the group IDs, one potential improvement is to do a single bulk update (rather than individual updates) and then refresh the object collection, reducing the number of update statements to 1 or 2 (depending on the business scenario).

Example

update [myTable] set Suspended = @Suspended 
where myGroupID1 = @groupID1 
      and myGroupID2 = @groupID2

Impact on LogReader?

From an app-processing perspective a single update makes sense to me (fewer round trips between the app and the database = quicker); however, I'm not sure how the Log Reader will treat the two scenarios, since it still needs to process each record updated by the transaction.

Will the LogReader process those records faster or slower on a bulk update?

Best Answer

Those updates are written to the transaction log as separate records, one for each row modified. The Log Reader then batches them up, and they are applied as a single transaction against the subscriber database.
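
You can see this in the distribution database: even a single set-based update generates one replicated command per modified row. A rough sketch (run in the distribution database; the sequence number is a placeholder you would take from MSrepl_transactions for the transaction you want to inspect):

EXEC sp_browsereplcmds
     @xact_seqno_start = N'0x0000001A000000F1000A',
     @xact_seqno_end   = N'0x0000001A000000F1000A';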

If your database uses stored procedures, you can replicate the stored procedures themselves, which allows the stored procedure call to be replicated instead of the individual data updates.
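
As a sketch of that approach (object and publication names are hypothetical): wrap the set-based update in a procedure and publish the procedure with execution replication, so the subscriber receives the one call instead of a command per row:

-- Wrap the bulk update in a procedure
CREATE PROCEDURE dbo.SuspendGroup
    @groupID1  int,
    @groupID2  int,
    @Suspended bit
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.myTable
    SET    Suspended = @Suspended
    WHERE  myGroupID1 = @groupID1
      AND  myGroupID2 = @groupID2;
END
GO

-- Publish the procedure's execution rather than the resulting row changes
EXEC sp_addarticle
     @publication   = N'NoFilter1',
     @article       = N'SuspendGroup',
     @source_owner  = N'dbo',
     @source_object = N'SuspendGroup',
     @type          = N'proc exec';   -- or 'serializable proc exec'

One caveat: the procedure is simply executed as-is at the subscriber, so it does not go through the row filters the way individual replicated data changes do.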