SQL Server – Is Service Broker the best choice for auditing data changes on SQL Server Express

audit | sql-server | sql-server-express | trigger

My project is to audit 5 to 10 existing tables in our system without prolonging transactions. Whatever method is used, it has to work on SQL Server Express 2005 to (eventually) 2016.

I've done research on Change Data Capture (CDC) and Change Tracking. Change Tracking doesn't capture the specific changes, and CDC is only available in Enterprise Edition.

Then I stumbled upon Service Broker. I was intrigued, so I started building a prototype. Service Broker is working fine, but answers I received on two of my other posts lead me to believe that this might not be the right way to go: it seems overly complicated for what it accomplishes. I'm still in the analysis stage and trying different things as part of that analysis.

Right now the Service Broker results are not convincing: a bulk update of 105,000 items in a price table takes 38 seconds, while the processing of the queue (the auditing part) takes 17 seconds. But the 38 seconds includes the double processing of inserting into 2 #temp tables that are then used to insert into TMPins and TMPdel, so I figure I can cut that roughly in half. I'm now questioning the use of Service Broker; logically, a trigger would probably take about the same amount of time just by inserting the info straight into the audit table.

To clarify, when I say bulk insert, I don't mean the BULK INSERT function. I'm talking about a big chunk of data that is inserted or updated at once. When updating 105,000 items in the price table, I want to audit the changes that happened. By that I mean I decided to insert the new values into the audit table (if it's an insert or update) or insert the primary key with all other fields null (for records that have been deleted). So yes, the auditing can happen after the data is loaded, but I don't want to lose any audit records (and I don't want transactions to be processed out of order).

The two other posts will help get the context of what I'm trying to do and what I've tried:

I value every idea.

Best Answer

I am guessing that the following post is the basis of what you are currently using: Centralized Asynchronous Auditing with Service Broker.

While I really like Service Broker, I don't feel that it is the best fit to address this particular situation. The main concerns that I have with Service Broker, at least in this particular scenario, are:

  • It is probably over-complicated if used mainly to separate the DML event from the audit event. If you are not moving the data to another system, it doesn't seem to offer much benefit, since you still need to persist the INSERTED and DELETED tables in the DML trigger.
  • It is event-based, and events vary greatly in size and frequency. You could have a million 1-record operations, or one or two 1-million-record operations. And since each event is an individual DML operation, you don't have any opportunity to remove duplicate and/or negating entries across several DML operations.

My preference is for dumping the changes into a queue table, and then in a separate process that is scheduled to run every X minutes, read Y number of rows and process them.

  1. Create a queue table to hold the audit data for a particular table (not for each individual DML operation). The table should have:

    • the LOCK_ESCALATION option set to DISABLE (via ALTER TABLE {name} SET (LOCK_ESCALATION = DISABLE)) to avoid contention between the trigger logging new data and the audit processing removing data. This option was introduced in SQL Server 2008, so it cannot be used on 2005 instances, but there is no reason not to use it on 2008 and newer instances since it doesn't alter the functionality in either case.
    • an AuditID PK that is one of the following:
      • an INT IDENTITY starting at -2147483648
      • a BIGINT IDENTITY
      • an INT with its value coming from a SEQUENCE that is set to CYCLE
    • only the fields that you are tracking changes for, if not all of them
    • fields for both "old" and "new" if you need to keep track of the before and after values each time you process. Depending on how you are auditing changes, if you have an archive of prior values, then you should only need the "new" values from the INSERTED table since the values in the DELETED table should already be in your prior audit data.
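
    As a rough sketch, a queue table for the price-table example might look something like the following (the table name, column names, and the DECIMAL type are just placeholders for whatever you are actually auditing):

    CREATE TABLE dbo.PriceAuditQueue
    (
        -- ever-increasing PK keeps operations in chronological order
        AuditID   INT IDENTITY(-2147483648, 1) NOT NULL
                      CONSTRAINT PK_PriceAuditQueue PRIMARY KEY CLUSTERED,
        ItemID    INT NOT NULL,         -- PK of the audited table
        PriceOld  DECIMAL(19, 4) NULL,  -- "old" value (from DELETED); NULL for INSERTs
        PriceNew  DECIMAL(19, 4) NULL,  -- "new" value (from INSERTED); NULL for DELETEs
        AuditDate DATETIME NOT NULL
                      CONSTRAINT DF_PriceAuditQueue_AuditDate DEFAULT (GETDATE())
    );

    -- SQL Server 2008 and newer only; skip on 2005 instances
    ALTER TABLE dbo.PriceAuditQueue SET (LOCK_ESCALATION = DISABLE);
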
  2. Create a trigger that simply inserts into the queue table. If you need to pass in both "old" and "new" values, JOIN the INSERTED and DELETED tables here rather than trying to maintain two separate queue tables, one for each of those pseudo tables. JOINing them at this point and inserting both the "old" and "new" values into a single row is a slight performance hit, but will guarantee that each operation stays together and in chronological order (via the incrementing PK).

    If you are not tracking changes on all fields, then use either UPDATE() or COLUMNS_UPDATED() to determine if the columns being audited have indeed been updated (for UPDATE operations; these functions return true for all columns in INSERT operations). Please keep in mind that the UPDATE() and COLUMNS_UPDATED() functions do not determine if a change has been made in the value of the column(s)!! They only report if the columns were present in the SET clause of the UPDATE statement. The only way to determine if the value(s) actually changed is to JOIN the INSERTED and DELETED tables. But if not tracking all columns, those functions are great for exiting the Trigger without doing any work if no tracked columns have changed. Meaning, at the beginning of the Trigger, you would do:

    IF (NOT UPDATE(TrackedColumn1) AND NOT UPDATE(TrackedColumn2))
    BEGIN
      RETURN; -- none of the tracked columns were referenced, so just exit
    END;
    

    If you are not capturing both "old" and "new" values into the queue table, then this is the only opportunity to eliminate "updated" records where no columns actually changed. When inserting into the queue table, you just filter on WHERE IntColOld <> IntColNew OR StringFieldOld <> StringFieldNew COLLATE Latin1_General_BIN2 OR ISNULL(DateFieldOld, '1900-01-01') <> ISNULL(DateFieldNew, '1900-01-01') OR .... It is important to use a _BIN2 collation on string fields to ensure that the two fields are in fact identical. And you need to use ISNULL (with a sentinel value that cannot occur in the real data) on just the nullable fields so that comparisons involving NULL behave correctly; otherwise a change to or from NULL would evaluate to UNKNOWN and be missed.
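
    Putting the pieces of this step together, a trigger that captures both "old" and "new" values into the hypothetical dbo.PriceAuditQueue table sketched above might look something like this (names are placeholders; the FULL JOIN on the primary key covers INSERT, UPDATE, and DELETE in a single statement, assuming the primary key itself is never updated):

    CREATE TRIGGER dbo.trPrice_Audit
    ON dbo.Price
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- For UPDATE operations (rows in both pseudo tables), exit early if no
        -- tracked column was referenced in the SET clause.
        IF (EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
            AND NOT UPDATE(Price))
        BEGIN
            RETURN;
        END;

        INSERT INTO dbo.PriceAuditQueue (ItemID, PriceOld, PriceNew)
        SELECT COALESCE(ins.ItemID, del.ItemID),
               del.Price,   -- NULL for INSERT operations
               ins.Price    -- NULL for DELETE operations
        FROM   inserted ins
        FULL   JOIN deleted del
               ON  del.ItemID = ins.ItemID;
    END;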

  3. Create a Stored Procedure that will run for X seconds, and in that time will process sets of TOP(@BatchSize) rows, using ORDER BY AuditID ASC. You do this with a WHILE loop that checks GETDATE() against @StartTime, which is captured at the beginning of the stored procedure. Depending on what processing you need to do, it is sometimes easier to work from a local temporary table, populated either with INSERT INTO #TempTable SELECT ... of the working set (in which case you will have to DELETE those rows from the queue at the end of each loop) or with DELETE FROM ... OUTPUT INTO #TempTable.

    Here you can remove duplicate modifications. If you are tracking both "old" and "new" values for each row, you should also eliminate rows where none of the columns actually changed. You can do that by testing for WHERE IntColOld = IntColNew AND StringFieldOld = StringFieldNew COLLATE Latin1_General_BIN2 AND .... It is important to use a _BIN2 collation on string fields to ensure that the two fields are in fact identical. Non-binary collations, even case-sensitive and accent-sensitive ones, still allow for linguistic normalizations that are fine for regular comparisons but not when auditing. Don't use _BIN collations, as they have been deprecated since SQL Server 2005, which is when the _BIN2 collations came out.

    Everything done inside of the loop needs to be within an explicit BEGIN TRAN / COMMIT / ROLLBACK. Use a TRY / CATCH block to manage that. This will prevent either losing some records or processing some twice.
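
    A skeleton of such a procedure, moving rows from the hypothetical dbo.PriceAuditQueue above into an equally hypothetical dbo.PriceAudit history table, might look something like this (the batch size, the time budget, and the exact de-duplication logic are all things to tune for your own data):

    CREATE PROCEDURE dbo.ProcessPriceAuditQueue
        @BatchSize INT = 5000
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @StartTime DATETIME,
                @ErrorMessage NVARCHAR(4000);
        SET @StartTime = GETDATE();

        CREATE TABLE #Batch
        (
            AuditID  INT NOT NULL PRIMARY KEY,
            ItemID   INT NOT NULL,
            PriceOld DECIMAL(19, 4) NULL,
            PriceNew DECIMAL(19, 4) NULL
        );

        WHILE (DATEDIFF(SECOND, @StartTime, GETDATE()) < 50)  -- run for ~X seconds
        BEGIN
            BEGIN TRY
                BEGIN TRAN;

                -- pull the next batch off the queue, oldest first
                WITH NextBatch AS
                (
                    SELECT TOP (@BatchSize) AuditID, ItemID, PriceOld, PriceNew
                    FROM   dbo.PriceAuditQueue
                    ORDER BY AuditID ASC
                )
                DELETE NextBatch
                OUTPUT deleted.AuditID, deleted.ItemID, deleted.PriceOld, deleted.PriceNew
                INTO   #Batch (AuditID, ItemID, PriceOld, PriceNew);

                IF (@@ROWCOUNT = 0)
                BEGIN
                    COMMIT TRAN;
                    BREAK;  -- queue is empty
                END;

                -- persist only rows where the value actually changed
                -- (further de-duplication across operations could also go here);
                -- the ISNULL sentinel must be a value that cannot occur in the data
                INSERT INTO dbo.PriceAudit (ItemID, PriceOld, PriceNew, AuditDate)
                SELECT ItemID, PriceOld, PriceNew, GETDATE()
                FROM   #Batch
                WHERE  ISNULL(PriceOld, -2147483648) <> ISNULL(PriceNew, -2147483648);

                COMMIT TRAN;
            END TRY
            BEGIN CATCH
                IF (@@TRANCOUNT > 0) ROLLBACK TRAN;
                SET @ErrorMessage = ERROR_MESSAGE();
                RAISERROR(@ErrorMessage, 16, 1);  -- re-raise (THROW requires 2012+)
                RETURN;
            END CATCH;

            TRUNCATE TABLE #Batch;
        END;
    END;

    The DELETE ... OUTPUT INTO form removes rows from the queue and hands them to the working table in a single atomic statement, which is the second of the two options mentioned above; the INSERT INTO #TempTable SELECT ... variant works just as well if you prefer to delete from the queue at the end of each loop.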

  4. Schedule the stored procedure to run every X minutes. Typically this is done via a SQL Server Agent job, but SQL Server Agent is not available for the Express editions. In that case, you can either use the Windows Task Scheduler or get something like Quartz.NET.

Setting up a process in this fashion allows you to have a more consistent, and tuneable, process that should steadily process X records every Y minutes. So it doesn't matter if you have 1 million single-row DML operations or a single 1 million row DML operation; this process will just keep chuggin' along, doin' what it does.