SQL Server – Is Service Broker the best choice for auditing data changes on SQL Server Express

audit | sql-server | sql-server-express | trigger

My project is to audit 5 to 10 existing tables in our system without prolonging transactions. Whatever method is used, it has to work on SQL Server Express 2005 to (eventually) 2016.

I've done research on Change Data Capture (CDC) and Change Tracking. Change Tracking doesn't capture the specific changes, and CDC is only available in Enterprise Edition.

Then I stumbled upon Service Broker. I was intrigued, so I started building a prototype. Service Broker is working fine, but answers I received on two of my other posts lead me to believe that this might not be the right way to go: it seems overly complicated for what it accomplishes. I'm still in the analysis stage and trying different things as part of that analysis.

Right now the Service Broker results are not convincing: a bulk update of 105,000 items in a price table takes 38 seconds, while the processing of the queue (the auditing part) takes 17 seconds. But the 38 seconds includes the double processing of inserting into 2 #temp tables that are then used to insert into TMPins and TMPdel, so I figure I can cut that roughly in half. I'm now questioning the use of Service Broker; logically, a trigger would probably take about the same amount of time just by inserting the info straight into the audit table.

To clarify, when I say bulk insert, I don't mean the BULK INSERT function. I'm talking about a big chunk of data that is inserted or updated at once. When updating 105,000 items in the price table, I want to audit the changes that happened. By that I mean I decided to insert the new values into the audit table (if it's an insert or update) or insert the primary key with all other fields null (for records that have been deleted). So yes, the auditing can happen after the data is loaded, but I don't want to lose any audit records (and I don't want transactions to be processed out of order).

The two other posts will help get the context of what I'm trying to do and what I've tried:

I value every idea.

Best Answer

I am guessing that the following post is the basis of what you are currently using: Centralized Asynchronous Auditing with Service Broker.

While I really like Service Broker, I don't feel that it is the best fit to address this particular situation. The main concerns that I have with Service Broker, at least in this particular scenario, are:

  • It is probably over-complicated if used mainly to separate the DML event from the audit event. If you are not moving the data to another system, it doesn't seem to offer much benefit, since you still need to persist the INSERTED and DELETED tables in the DML trigger.
  • It is event-based, and events vary greatly in size and frequency. You could have a million 1-record operations, or one or two 1-million-record operations. And since each event is an individual DML operation, you don't have any opportunity to remove duplicate and/or negating entries across several DML operations.

My preference is for dumping the changes into a queue table, and then in a separate process that is scheduled to run every X minutes, read Y number of rows and process them.

  1. Create a queue table to hold the audit data for a particular table (not for each individual DML operation). The table should have:

    • the LOCK_ESCALATION option set to DISABLE (via ALTER TABLE {name} SET (LOCK_ESCALATION = DISABLE)) to avoid contention between the trigger logging new data and the audit processing removing data. This option was introduced in SQL Server 2008, so it cannot be used on 2005 instances, but there is no reason not to use it on 2008 and newer instances since it doesn't alter the functionality in either case.
    • an AuditID PK that is one of the following:
      • an INT IDENTITY starting at -2147483648
      • a BIGINT IDENTITY
      • an INT with its value coming from a SEQUENCE that is set to CYCLE
    • only the fields that you are tracking changes for, if not all of them
    • fields for both "old" and "new" if you need to keep track of the before and after values each time you process. Depending on how you are auditing changes, if you have an archive of prior values, then you should only need the "new" values from the INSERTED table since the values in the DELETED table should already be in your prior audit data.
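
    As a rough sketch, a queue table for the price-table example might look something like the following (the table name, column names, and the DECIMAL type are just placeholders for whatever you are actually auditing):

    CREATE TABLE dbo.PriceAuditQueue
    (
        -- ever-increasing PK keeps operations in chronological order
        AuditID   INT IDENTITY(-2147483648, 1) NOT NULL
                      CONSTRAINT PK_PriceAuditQueue PRIMARY KEY CLUSTERED,
        ItemID    INT NOT NULL,         -- PK of the audited table
        PriceOld  DECIMAL(19, 4) NULL,  -- "old" value (from DELETED); NULL for INSERTs
        PriceNew  DECIMAL(19, 4) NULL,  -- "new" value (from INSERTED); NULL for DELETEs
        AuditDate DATETIME NOT NULL
                      CONSTRAINT DF_PriceAuditQueue_AuditDate DEFAULT (GETDATE())
    );

    -- SQL Server 2008 and newer only; skip on 2005 instances
    ALTER TABLE dbo.PriceAuditQueue SET (LOCK_ESCALATION = DISABLE);
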
  2. Create a trigger that simply inserts into the queue table. If you need to pass in both "old" and "new" values, JOIN the INSERTED and DELETED tables here rather than trying to maintain two separate queue tables, one for each of those pseudo tables. JOINing them at this point and inserting both the "old" and "new" values into a single row is a slight performance hit, but will guarantee that each operation stays together and in chronological order (via the incrementing PK).

    If you are not tracking changes on all fields, then use either UPDATE() or COLUMNS_UPDATED() to determine if the columns being audited have indeed been updated (for UPDATE operations; these functions return true for all columns in INSERT operations). Please keep in mind that the UPDATE() and COLUMNS_UPDATED() functions do not determine if a change has been made in the value of the column(s)!! They only report if the columns were present in the SET clause of the UPDATE statement. The only way to determine if the value(s) actually changed is to JOIN the INSERTED and DELETED tables. But if not tracking all columns, those functions are great for exiting the Trigger without doing any work if no tracked columns have changed. Meaning, at the beginning of the Trigger, you would do:

    IF (NOT UPDATE(TrackedColumn1) AND NOT UPDATE(TrackedColumn2))
    BEGIN
      RETURN; -- none of the tracked columns were referenced, so just exit
    END;
    

    If you are not capturing both "old" and "new" values into the queue table, then this is the only opportunity to eliminate "updated" records where no columns actually changed. When inserting into the queue table, you just filter on WHERE IntColOld <> IntColNew OR StringFieldOld <> StringFieldNew COLLATE Latin1_General_BIN2 OR ISNULL(DateFieldOld, '1900-01-01') <> ISNULL(DateFieldNew, '1900-01-01') OR .... It is important to use a _BIN2 collation on string fields to ensure that the two fields are in fact identical. And you need to use ISNULL (with a sentinel value that cannot occur in the real data) on just the nullable fields so that comparisons involving NULL behave correctly; otherwise a change to or from NULL would evaluate to UNKNOWN and be missed.
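
    Putting the pieces of this step together, a trigger that captures both "old" and "new" values into the hypothetical dbo.PriceAuditQueue table sketched above might look something like this (names are placeholders; the FULL JOIN on the primary key covers INSERT, UPDATE, and DELETE in a single statement, assuming the primary key itself is never updated):

    CREATE TRIGGER dbo.trPrice_Audit
    ON dbo.Price
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- For UPDATE operations (rows in both pseudo tables), exit early if no
        -- tracked column was referenced in the SET clause.
        IF (EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
            AND NOT UPDATE(Price))
        BEGIN
            RETURN;
        END;

        INSERT INTO dbo.PriceAuditQueue (ItemID, PriceOld, PriceNew)
        SELECT COALESCE(ins.ItemID, del.ItemID),
               del.Price,   -- NULL for INSERT operations
               ins.Price    -- NULL for DELETE operations
        FROM   inserted ins
        FULL   JOIN deleted del
               ON  del.ItemID = ins.ItemID;
    END;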

  3. Create a Stored Procedure that will run for X seconds, and in that time will process sets of TOP(@BatchSize) rows, using ORDER BY AuditID ASC. You do this with a WHILE loop that checks GETDATE() against @StartTime, which is captured at the beginning of the stored procedure. Depending on what processing you need to do, it is sometimes easier to work from a local temporary table, populated either with INSERT INTO #TempTable SELECT ... of the working set (in which case you will have to DELETE those rows from the queue at the end of each loop) or with DELETE FROM ... OUTPUT INTO #TempTable.

    Here you can remove duplicate modifications. If you are tracking both "old" and "new" values for each row, you should also eliminate rows where none of the columns actually changed. You can do that by testing for WHERE IntColOld = IntColNew AND StringFieldOld = StringFieldNew COLLATE Latin1_General_BIN2 AND .... It is important to use a _BIN2 collation on string fields to ensure that the two fields are in fact identical. Non-binary collations, even case-sensitive and accent-sensitive ones, still allow for linguistic normalizations that are fine for regular comparisons but not when auditing. Don't use _BIN collations, as they have been deprecated since SQL Server 2005, which is when the _BIN2 collations came out.

    Everything done inside of the loop needs to be within an explicit BEGIN TRAN / COMMIT / ROLLBACK. Use a TRY / CATCH block to manage that. This will prevent either losing some records or processing some twice.
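
    A skeleton of such a procedure, moving rows from the hypothetical dbo.PriceAuditQueue above into an equally hypothetical dbo.PriceAudit history table, might look something like this (the batch size, the time budget, and the exact de-duplication logic are all things to tune for your own data):

    CREATE PROCEDURE dbo.ProcessPriceAuditQueue
        @BatchSize INT = 5000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @StartTime DATETIME,
                @ErrorMessage NVARCHAR(4000);
        SET @StartTime = GETDATE();

        CREATE TABLE #Batch
        (
            AuditID  INT NOT NULL PRIMARY KEY,
            ItemID   INT NOT NULL,
            PriceOld DECIMAL(19, 4) NULL,
            PriceNew DECIMAL(19, 4) NULL
        );

        WHILE (DATEDIFF(SECOND, @StartTime, GETDATE()) < 50)  -- run for ~X seconds
        BEGIN
            BEGIN TRY
                BEGIN TRAN;

                -- pull the next batch off the queue, oldest first
                WITH NextBatch AS
                (
                    SELECT TOP (@BatchSize) AuditID, ItemID, PriceOld, PriceNew
                    FROM   dbo.PriceAuditQueue
                    ORDER BY AuditID ASC
                )
                DELETE NextBatch
                OUTPUT deleted.AuditID, deleted.ItemID, deleted.PriceOld, deleted.PriceNew
                INTO   #Batch (AuditID, ItemID, PriceOld, PriceNew);

                IF (@@ROWCOUNT = 0)
                BEGIN
                    COMMIT TRAN;
                    BREAK;  -- queue is empty
                END;

                -- persist only rows where the value actually changed
                -- (further de-duplication across operations could also go here);
                -- the ISNULL sentinel must be a value that cannot occur in the data
                INSERT INTO dbo.PriceAudit (ItemID, PriceOld, PriceNew, AuditDate)
                SELECT ItemID, PriceOld, PriceNew, GETDATE()
                FROM   #Batch
                WHERE  ISNULL(PriceOld, -2147483648) <> ISNULL(PriceNew, -2147483648);

                COMMIT TRAN;
            END TRY
            BEGIN CATCH
                IF (@@TRANCOUNT > 0) ROLLBACK TRAN;
                SET @ErrorMessage = ERROR_MESSAGE();
                RAISERROR(@ErrorMessage, 16, 1);  -- re-raise (THROW requires 2012+)
                RETURN;
            END CATCH;

            TRUNCATE TABLE #Batch;
        END;
    END;

    The DELETE ... OUTPUT INTO form removes rows from the queue and hands them to the working table in a single atomic statement, which is the second of the two options mentioned above; the INSERT INTO #TempTable SELECT ... variant works just as well if you prefer to delete from the queue at the end of each loop.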

  4. Schedule the stored procedure to run every X minutes. Typically this is done via a SQL Server Agent job, but SQL Server Agent is not available for the Express editions. In that case, you can either use the Windows Task Scheduler or get something like Quartz.NET.

Setting up a process in this fashion allows you to have a more consistent, and tuneable, process that should steadily process X records every Y minutes. So it doesn't matter if you have 1 million single-row DML operations or a single 1 million row DML operation; this process will just keep chuggin' along, doin' what it does.