Polling for new records in a table

change-trackingidentitysequence

We have a table that is append+read only. We'd like to have multiple applications consume all new rows that show up in that database, in a cheap and easy way (without change tracking).

The simple suggestion of having each app record the last ID doesn't seem safe. Suppose 3 connections (2 writers and 1 readers) do this:

W1: BEGIN TRAN, INSERT 1000 Records (IDs 1-1000)
W2: BEGIN TRAN, INSERT 1000 Records (IDs 1001-2000)
W2: COMMIT
R1: SELECT * WHERE Id > 0 (last sync value)
W1: COMMIT (or possibly ROLLBACK)

In this case, the reader would get records 1001-2000 and save 2000 as the highest ID. Obviously that's not correct. Assuming the various writers are running at varying speeds (so we can never assume they'll commit in order, due to latency or perf), what options are there?

Must we resort to something like change tracking, or require every application to save the last (batch*writers*x) IDs and re-read them to make sure they didn't miss any?

Rollback and other gaps in the sequence (MSSQL can jump the IDs by 1000; pretty sure other DBs do that under certain situations) make it complicated to try to guess if rows were skipped. Plus the batch inserts might not be a proper batch; they could consist of multiple INSERT statements, meaning each tx might not even have a contiguous block of IDs assigned.

We're currently using MSSQL 2012 if there's a hack specific to that system, but I'm more interested in general.

Best Answer

Some of the implementation specifics depend on exactly what type of behaviour you need from this table (e.g. FIFO), but a typical design for a queue table uses DELETE with an OUTPUT clause:

DELETE TOP (n)
FROM QueueTable WITH (ROWLOCK, READPAST)
OUTPUT deleted.RecordID;

Or if ordered removal is required:

DELETE FROM
(
    SELECT TOP (n) RecordID
    FROM QueueTable WITH (ROWLOCK, READPAST)
    ORDER BY RecordID
) AS DF
OUTPUT deleted.RecordID;

If the records should remain in the table after processing, the table would have an extra columns to record the status (e.g. New, Processing, Processed) and probably the process id owning the row (if Processing or Processed). The de-queuing code is then modified to choose only records with the desired status. An alternative design is to DELETE rows and move them to a separate audit/archive table at the same time. As I mentioned, the specific solution depends on the precise requirements.