Polling for new records in a table

change-trackingidentitysequence

We have a table that is append+read only. We'd like to have multiple applications consume all new rows that show up in that database, in a cheap and easy way (without change tracking).

The simple suggestion of having each app record the last ID doesn't seem safe. Suppose 3 connections (2 writers and 1 readers) do this:

W1: BEGIN TRAN, INSERT 1000 Records (IDs 1-1000)
W2: BEGIN TRAN, INSERT 1000 Records (IDs 1001-2000)
W2: COMMIT
R1: SELECT * WHERE Id > 0 (last sync value)
W1: COMMIT (or possibly ROLLBACK)

In this case, the reader would get records 1001-2000 and save 2000 as the highest ID. Obviously that's not correct. Assuming the various writers are running at varying speeds (so we can never assume they'll commit in order, due to latency or perf), what options are there?

Must we resort to something like change tracking, or require every application to save the last (batch*writers*x) IDs and re-read them to make sure they didn't miss any?

Rollback and other gaps in the sequence (MSSQL can jump the IDs by 1000; pretty sure other DBs do that under certain situations) make it complicated to try to guess if rows were skipped. Plus the batch inserts might not be a proper batch; they could consist of multiple INSERT statements, meaning each tx might not even have a contiguous block of IDs assigned.

We're currently using MSSQL 2012 if there's a hack specific to that system, but I'm more interested in general.

Best Answer

Some of the implementation specifics depend on exactly what type of behaviour you need from this table (e.g. FIFO), but a typical design for a queue table uses DELETE with an OUTPUT clause:

DELETE TOP (n)
FROM QueueTable WITH (ROWLOCK, READPAST)
OUTPUT deleted.RecordID;

Or if ordered removal is required:

DELETE FROM
(
    SELECT TOP (n) RecordID
    FROM QueueTable WITH (ROWLOCK, READPAST)
    ORDER BY RecordID
) AS DF
OUTPUT deleted.RecordID;

If the records should remain in the table after processing, the table would have an extra columns to record the status (e.g. New, Processing, Processed) and probably the process id owning the row (if Processing or Processed). The de-queuing code is then modified to choose only records with the desired status. An alternative design is to DELETE rows and move them to a separate audit/archive table at the same time. As I mentioned, the specific solution depends on the precise requirements.

Related Solutions

SQL Server SSMS – Why New Rows Insert at the Top of a Table

In the SQL world, order is not an inherent property of a set of data. Thus, you get no guarantees from your RDBMS that your data will come back in a certain order -- or even in a consistent order -- unless you query your data with an ORDER BY clause.

From Craig Freedman:

Combining TOP with ORDER BY adds determinism to the set of rows returned. Without the ORDER BY, the set of rows returned depends on the query plan and may even vary from execution to execution.

Always use ORDER BY if you expect a well-defined and consistent order in your result set. Never rely on how your database may store the rows on disk (e.g. via a clustered index) to guarantee a certain ordering of data in your queries.

Sql-server – the rationale for requiring a column list when inserting into a table with identity column

If you don't specify the column list then this implicitly assumes a column list including all user insert-able columns (non identity, rowversion, computed) as below.

CREATE TABLE #T
(
C INT IDENTITY,
D INT,
E INT
)

INSERT INTO #T
VALUES (2,3)

This is far more generally useful as the most common reason for having an IDENTITY column is to allow SQL Server to manage the values.

I suppose it could imply that if there was an extra column in the insert source on an identity table that it should generate an execution plan for the explicit identity_insert case but not much benefit to this IMO.

Best Answer

Related Solutions

SQL Server SSMS – Why New Rows Insert at the Top of a Table

Sql-server – the rationale for requiring a column list when inserting into a table with identity column

Related Question