My project is to audit 5 to 10 existing tables in our system without prolonging transactions. Whatever method is used, it has to work on SQL Server Express 2005 to (eventually) 2016.
I've done research on Change Data Capture (CDC) and Change Tracking. Change Tracking doesn't capture the specific changes, and CDC is only available in Enterprise Edition.
Then I stumbled upon Service Broker. I was intrigued, so I started building a prototype. Service Broker is working fine, but answers received in two of my other posts have led me to believe that this might not be the right way to go. Too complicated for nothing. I'm still in the analysis stage and trying different things as part of that analysis.
Right now the results of Service Broker are not convincing: a bulk update of 105,000 items in a price table takes 38 seconds, while the processing of the queue (the auditing part) takes 17 seconds. But the 38 seconds includes the double processing of inserting into two #temp tables that are then used to insert into TMPins and TMPdel, so I figure I can cut that roughly in half. I'm now questioning the use of Service Broker: logically, the trigger would probably take the same amount of time just by inserting the info straight into the audit table.
To clarify, when I say bulk insert, I don't mean the `BULK INSERT` statement; I'm talking about a big chunk of data that is inserted or updated at once. When updating 105,000 items in the price table, I want to audit the changes that happened. By "changes", I mean inserting the new values into the audit table (for an insert or update), or inserting the primary key with all other fields null (for records that have been deleted). So yes, it can happen after the data is loaded in, but I don't want to lose any audit rows, and I don't want transactions to be recorded out of order.
The two other posts will help get the context of what I'm trying to do and what I've tried:
- Trigger to create a variable Table to be sent to service broker
- triggers – using the inserted / deleted tables in Dynamic SQL
I value every idea.
Best Answer
I am guessing that the following post is the basis of what you are currently using: Centralized Asynchronous Auditing with Service Broker.
While I really like Service Broker, I don't feel that it is the best fit to address this particular situation. The main concern that I have with Service Broker, at least in this particular scenario, is that the `INSERTED` and `DELETED` tables are only available inside the DML trigger, so their contents have to be packaged into a message before the trigger ends, which adds work to every DML operation.

My preference is for dumping the changes into a queue table, and then, in a separate process that is scheduled to run every X minutes, reading Y number of rows and processing them.
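To make that concrete, here is a rough sketch of what the queue table and trigger described in the steps below could look like. Everything here is a hypothetical example (a `dbo.Price` table with a `PriceID` key and an `ItemPrice` column, plus invented audit object names); adjust it to your actual schema.

```sql
-- Queue table: one per audited table, not one per DML operation.
CREATE TABLE dbo.PriceAuditQueue
(
    AuditID   INT IDENTITY(-2147483648, 1) NOT NULL
        CONSTRAINT PK_PriceAuditQueue PRIMARY KEY CLUSTERED,
    PriceID   INT NOT NULL,          -- PK of the audited table
    ItemPrice DECIMAL(18, 4) NULL,   -- NULL for deleted rows
    AuditDate DATETIME NOT NULL
        CONSTRAINT DF_PriceAuditQueue_AuditDate DEFAULT (GETDATE())
);

-- SQL Server 2008 and newer only; skip this on 2005 instances.
ALTER TABLE dbo.PriceAuditQueue SET (LOCK_ESCALATION = DISABLE);
GO

-- Trigger: does nothing but dump the changes into the queue table.
CREATE TRIGGER dbo.tr_Price_Audit
ON dbo.Price
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- New / changed values.
    INSERT INTO dbo.PriceAuditQueue (PriceID, ItemPrice)
    SELECT ins.PriceID, ins.ItemPrice
    FROM   INSERTED ins;

    -- Deleted rows: primary key only, all other fields NULL.
    INSERT INTO dbo.PriceAuditQueue (PriceID, ItemPrice)
    SELECT del.PriceID, NULL
    FROM   DELETED del
    WHERE  NOT EXISTS (SELECT 1 FROM INSERTED ins
                       WHERE ins.PriceID = del.PriceID);
END;
```

The trigger stays as thin as possible; all filtering, de-duplication, and movement into the real audit table happens later in the scheduled processing step.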
Create a queue table to hold the audit data for a particular table (not one for each individual DML operation). The table should have:

- The `LOCK_ESCALATION` option set to `DISABLE` (via `ALTER TABLE {name} SET (LOCK_ESCALATION = DISABLE)`) to avoid conflict between the logging of new data by the trigger and the removal of data by the audit processing. This option was introduced in SQL Server 2008, so it cannot be used on 2005 instances, but there is no reason not to use it on 2008 and newer instances since it doesn't alter the functionality in either case.
- An `AuditID` PK that is one of the following: an `INT IDENTITY` starting at -2147483648, a `BIGINT IDENTITY`, or an `INT` with its value coming from a `SEQUENCE` that is set to `CYCLE`.
- Columns for the audited values, which generally only need to come from the `INSERTED` table, since the values in the `DELETED` table should already be in your prior audit data.

Create a trigger that simply inserts into the queue table. If you need to pass in both "old" and "new" values, JOIN the
`INSERTED` and `DELETED` tables here rather than trying to maintain two separate queue tables, one for each of those pseudo-tables. JOINing them at this point and inserting both the "old" and "new" values into a single row is a slight performance hit, but it guarantees that each operation stays together and in chronological order (via the incrementing PK).

If you are not tracking changes on all fields, then use either `UPDATE()` or `COLUMNS_UPDATED()` to determine whether the columns being audited have indeed been updated (for
`UPDATE` operations; these functions return true for all columns in `INSERT` operations). Please keep in mind that the `UPDATE()` and `COLUMNS_UPDATED()` functions do not determine whether a change has been made to the value of the column(s)! They only report whether the columns were present in the `SET` clause of the `UPDATE` statement. The only way to determine whether the value(s) actually changed is to JOIN the `INSERTED` and `DELETED` tables. But if you are not tracking all columns, those functions are great for exiting the trigger without doing any work when no tracked columns have changed. Meaning, at the beginning of the trigger, you would do something along the lines of `IF NOT (UPDATE(Col1) OR UPDATE(Col2)) RETURN;` (with whichever columns you are auditing).

If you are not capturing both "old" and "new" values into the queue table, then this is the only opportunity to eliminate "updated" records where no columns actually changed. When inserting into the queue table, you just filter on
`WHERE IntColOld <> IntColNew OR StringFieldOld <> StringFieldNew COLLATE Latin1_General_BIN2 OR ISNULL(DateFieldOld, '1900-01-01') <> ISNULL(DateFieldNew, '1900-01-01') OR ...`. It is important to use a `_BIN2` collation on string fields to ensure that the two fields are in fact identical. And you need to use `ISNULL` on just the nullable fields so that they can equate if both are `NULL`.
Create a Stored Procedure that will run for X seconds, and in that time will process sets of `TOP (@BatchSize)` rows using `ORDER BY AuditID ASC`. You do this via a `WHILE` loop that checks `GETDATE()` against `@StartTime`, which was set at the beginning of the stored procedure. Depending on what processing you need to do, it is sometimes easier to create a local temporary table and either `INSERT INTO #TempTable SELECT ...` the working set (in which case you will have to `DELETE` those rows at the end of each loop) or `DELETE FROM` the queue table using `OUTPUT INTO #TempTable`.

Here you can remove duplicate modifications. If you are tracking both "old" and "new" values for each row, you should also eliminate rows where none of the columns actually changed. You can do that by testing for
`WHERE IntColOld = IntColNew AND StringFieldOld = StringFieldNew COLLATE Latin1_General_BIN2 AND ...`. It is important to use a `_BIN2` collation on string fields to ensure that the two fields are in fact identical. Non-binary collations, even case-sensitive and accent-sensitive ones, still allow for linguistic normalizations in which different underlying values compare as equal; that is fine for regular comparisons, but not when auditing. Don't use the `_BIN` collations, as they have been deprecated since SQL Server 2005, which is when the `_BIN2` collations came out.

Everything done inside of the loop needs to be within an explicit
`BEGIN TRAN` / `COMMIT` / `ROLLBACK`. Use a TRY / CATCH block to manage that. This will prevent you from either losing some records or processing some twice.

Schedule the stored procedure to run every X minutes. Typically this is done via a SQL Server Agent job, but SQL Server Agent is not available in the Express editions. In that case, you can either use the Windows Task Scheduler or get something like Quartz.NET.
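Putting the loop, batching, and TRY / CATCH pieces above together, a sketch of the processing procedure might look like the following. The object and column names (`dbo.PriceAuditQueue`, `dbo.PriceAudit`, `PriceID`, `ItemPrice`) are invented placeholders, and `THROW` requires SQL Server 2012 or newer (use `RAISERROR` on older versions).

```sql
CREATE PROCEDURE dbo.Price_ProcessAuditQueue
    @BatchSize     INT = 1000,
    @MaxRunSeconds INT = 50
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @StartTime DATETIME = GETDATE();

    CREATE TABLE #Batch
    (
        AuditID   INT NOT NULL PRIMARY KEY,
        PriceID   INT NOT NULL,
        ItemPrice DECIMAL(18, 4) NULL,
        AuditDate DATETIME NOT NULL
    );

    WHILE (DATEDIFF(SECOND, @StartTime, GETDATE()) < @MaxRunSeconds)
    BEGIN
        BEGIN TRY
            BEGIN TRAN;

            -- Grab the oldest batch and remove it from the queue in one
            -- statement, preserving chronological order via AuditID.
            ;WITH oldest AS
            (
                SELECT TOP (@BatchSize) AuditID, PriceID, ItemPrice, AuditDate
                FROM   dbo.PriceAuditQueue
                ORDER BY AuditID ASC
            )
            DELETE FROM oldest
            OUTPUT deleted.AuditID, deleted.PriceID,
                   deleted.ItemPrice, deleted.AuditDate
            INTO   #Batch (AuditID, PriceID, ItemPrice, AuditDate);

            IF (@@ROWCOUNT = 0)
            BEGIN
                COMMIT TRAN;
                BREAK;   -- queue is empty; nothing left to do
            END;

            -- De-duplicate / filter unchanged rows here, then move the
            -- batch into the real audit table.
            INSERT INTO dbo.PriceAudit (PriceID, ItemPrice, AuditDate)
            SELECT PriceID, ItemPrice, AuditDate
            FROM   #Batch;

            COMMIT TRAN;
        END TRY
        BEGIN CATCH
            IF (@@TRANCOUNT > 0) ROLLBACK TRAN;
            THROW;   -- re-raise so the scheduler sees the failure
        END CATCH;

        TRUNCATE TABLE #Batch;   -- reset for the next loop iteration
    END;
END;
```

Because the `DELETE ... OUTPUT INTO` and the `INSERT` into the audit table sit in the same explicit transaction, a failure rolls the rows back into the queue instead of losing them, and a success removes them exactly once.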
Setting up a process in this fashion allows you to have a more consistent, and tunable, process that should steadily process X records every Y minutes. So it doesn't matter if you have 1 million single-row DML operations or a single 1-million-row DML operation; this process will just keep chuggin' along, doin' what it does.