Sql-server – Increase a counter for each changed row

sql-server, sql-server-2008

I'm using SQL Server 2008 Standard, which doesn't have a SEQUENCE feature.

An external system reads data from several dedicated tables of the main database.
The external system keeps a copy of the data, periodically checks for changes, and refreshes its copy.

To make the sync efficient I want to transfer only rows that were updated or inserted since the previous sync. (Rows are never deleted.)
To know which rows were updated or inserted since the last sync there is a bigint column RowUpdateCounter in each table.

The idea is that whenever a row is inserted or updated, the number in its RowUpdateCounter column would change.
The values that go into the RowUpdateCounter column should be taken from an ever increasing sequence of numbers.
Values in the RowUpdateCounter column should be unique and each new value stored in a table should be greater than any previous value.

Please see the scripts that show the desired behaviour.

Schema

CREATE TABLE [dbo].[Test](
    [ID] [int] NOT NULL,
    [Value] [varchar](50) NOT NULL,
    [RowUpdateCounter] [bigint] NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
    [ID] ASC
))
GO

CREATE UNIQUE NONCLUSTERED INDEX [IX_RowUpdateCounter] ON [dbo].[Test]
(
    [RowUpdateCounter] ASC
)
GO

INSERT some rows

INSERT INTO [dbo].[Test]
    ([ID]
    ,[Value]
    ,[RowUpdateCounter])
VALUES
(1, 'A', ???),
(2, 'B', ???),
(3, 'C', ???),
(4, 'D', ???);

Expected result

+----+-------+------------------+
| ID | Value | RowUpdateCounter |
+----+-------+------------------+
|  1 | A     |                1 |
|  2 | B     |                2 |
|  3 | C     |                3 |
|  4 | D     |                4 |
+----+-------+------------------+

The generated values in RowUpdateCounter can be different, say, 5, 3, 7, 9. They should be unique, and they should be greater than 0, since we started from an empty table.

INSERT and UPDATE some rows

DECLARE @NewValues TABLE (ID int NOT NULL, Value varchar(50));
INSERT INTO @NewValues (ID, Value) VALUES
(3, 'E'),
(4, 'F'),
(5, 'G'),
(6, 'H');

MERGE INTO dbo.Test WITH (HOLDLOCK) AS Dst
USING
(
    SELECT ID, Value
    FROM @NewValues
)
AS Src ON Dst.ID = Src.ID
WHEN MATCHED THEN
UPDATE SET
     Dst.Value            = Src.Value
    ,Dst.RowUpdateCounter = ???
WHEN NOT MATCHED BY TARGET THEN
INSERT
    (ID
    ,Value
    ,RowUpdateCounter)
VALUES
    (Src.ID
    ,Src.Value
    ,???)
;

Expected result

+----+-------+------------------+
| ID | Value | RowUpdateCounter |
+----+-------+------------------+
|  1 | A     |                1 |
|  2 | B     |                2 |
|  3 | E     |                5 |
|  4 | F     |                6 |
|  5 | G     |                7 |
|  6 | H     |                8 |
+----+-------+------------------+

RowUpdateCounter for rows with ID 1,2 should remain as is, because these rows were not changed.
RowUpdateCounter for rows with ID 3,4 should change, because they were updated.
RowUpdateCounter for rows with ID 5,6 should change, because they were inserted.
RowUpdateCounter for all changed rows should be greater than 4 (the last RowUpdateCounter from the sequence).

The order in which new values (5,6,7,8) are assigned to changed rows doesn't really matter.
The new values can have gaps, e.g. 15,26,47,58, but they should never decrease.

There are several tables with such counters in the database.
It doesn't matter if all of them use the single global sequence for their numbers, or each table has its own individual sequence.

I don't want to use a column with a datetime stamp instead of an integer counter, because:

The clock on the server can jump both forward and backward, especially when the server is a virtual machine.
The values returned by system functions like SYSDATETIME are the same for all affected rows.
The sync process should be able to read changes in batches.
For example, if batch size is 3 rows, then after the MERGE step above the sync process would read only rows E,F,G.
When the sync process is run next time it would continue from row H.

The way I'm doing it now is rather ugly.

Since there is no SEQUENCE in SQL Server 2008, I emulate one with a dedicated table that has an IDENTITY column, as shown in this answer. This is ugly in itself, and it is made worse by the fact that I need to generate not a single number but a batch of numbers at once.
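One common way to emulate a SEQUENCE that hands out a whole batch of numbers in one statement is to insert N dummy rows into an IDENTITY table and capture the generated values with OUTPUT. This is only a sketch, not necessarily the method from the linked answer; the table name dbo.RowUpdateCounterSequence, its columns, and @BatchSize are hypothetical.

```sql
-- Hypothetical table that emulates a SEQUENCE on SQL Server 2008.
CREATE TABLE dbo.RowUpdateCounterSequence
(
    Number bigint IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Dummy  bit NULL  -- a column is needed so that rows can be inserted
);
GO

-- Allocate a batch of @BatchSize numbers in a single statement.
DECLARE @BatchSize int = 4;
DECLARE @Allocated TABLE (Number bigint NOT NULL);

INSERT INTO dbo.RowUpdateCounterSequence (Dummy)
OUTPUT inserted.Number INTO @Allocated (Number)
SELECT TOP (@BatchSize) NULL
FROM sys.all_columns;

SELECT Number FROM @Allocated ORDER BY Number;

-- The rows themselves are not needed; deleting them does not reset IDENTITY.
DELETE FROM dbo.RowUpdateCounterSequence;
```

The allocated numbers are unique and increasing, but they are not guaranteed to be contiguous when other sessions allocate at the same time.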

Then I have an INSTEAD OF INSERT, UPDATE trigger on each table that has a RowUpdateCounter column, and I generate the required sets of numbers there.

In the INSERT, UPDATE and MERGE queries I set RowUpdateCounter to 0, and the trigger replaces it with the correct values; that is, each ??? in the queries above is 0.
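A trigger of the kind described here might look roughly like the following sketch for dbo.Test. It is an illustration under assumptions, not the asker's actual code: the sequence table dbo.RowUpdateCounterSequence and the trigger name are hypothetical, and it assumes an UPDATE never changes ID. MERGE fires the INSTEAD OF trigger separately for its insert and update actions, and `deleted` is populated only for the update action, which is how the two cases are told apart.

```sql
-- Hypothetical sequence-emulation table (created only if missing).
IF OBJECT_ID('dbo.RowUpdateCounterSequence') IS NULL
    CREATE TABLE dbo.RowUpdateCounterSequence
    (Number bigint IDENTITY(1,1) NOT NULL PRIMARY KEY, Dummy bit NULL);
GO

CREATE TRIGGER dbo.Test_AssignRowUpdateCounter
ON dbo.Test
INSTEAD OF INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @RowCount int = (SELECT COUNT(*) FROM inserted);
    IF @RowCount = 0 RETURN;

    -- Draw one new number per affected row from the emulated sequence.
    DECLARE @Allocated TABLE (Number bigint NOT NULL);
    INSERT INTO dbo.RowUpdateCounterSequence (Dummy)
    OUTPUT inserted.Number INTO @Allocated (Number)
    SELECT TOP (@RowCount) NULL
    FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

    -- Pair incoming rows with the allocated numbers; which row gets which number is arbitrary.
    DECLARE @Paired TABLE (ID int NOT NULL PRIMARY KEY, Value varchar(50) NOT NULL, Number bigint NOT NULL);
    INSERT INTO @Paired (ID, Value, Number)
    SELECT i.ID, i.Value, n.Number
    FROM (SELECT ID, Value, rn = ROW_NUMBER() OVER (ORDER BY ID) FROM inserted) AS i
    JOIN (SELECT Number, rn = ROW_NUMBER() OVER (ORDER BY Number) FROM @Allocated) AS n
      ON n.rn = i.rn;

    IF EXISTS (SELECT * FROM deleted)
    BEGIN
        -- UPDATE (or the UPDATE branch of a MERGE); assumes ID is unchanged.
        UPDATE t
        SET t.Value = p.Value,
            t.RowUpdateCounter = p.Number
        FROM dbo.Test AS t
        JOIN @Paired AS p ON p.ID = t.ID;
    END
    ELSE
    BEGIN
        -- INSERT (or the INSERT branch of a MERGE).
        INSERT INTO dbo.Test (ID, Value, RowUpdateCounter)
        SELECT ID, Value, Number FROM @Paired;
    END
END
GO
```

Statements inside an INSTEAD OF trigger that touch the same table do not fire that trigger again, so the inner UPDATE and INSERT write the rows directly.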

It works, but is there an easier solution?

Best Answer

You can use a ROWVERSION column for this.

The documentation states that

Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a rowversion column within the database.
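The worked example below reads its sync boundaries from MIN_ACTIVE_ROWVERSION() rather than @@DBTS. The difference matters for sync: MIN_ACTIVE_ROWVERSION() returns the lowest rowversion still in use by an uncommitted transaction, so rows that are not yet committed are not skipped. When no such transaction exists it equals @@DBTS + 1, which this snippet lets you observe.

```sql
-- @@DBTS is the last rowversion value used in the current database.
-- MIN_ACTIVE_ROWVERSION() is the lowest active value, or @@DBTS + 1 if none is active.
SELECT @@DBTS                  AS LastUsedRowversion,
       MIN_ACTIVE_ROWVERSION() AS MinActiveRowversion;
```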

The values are BINARY(8), and you should treat them as BINARY rather than BIGINT: after 0x7FFFFFFFFFFFFFFF the counter goes on to 0x80..., which, read as a signed bigint, starts working up from -9223372036854775808.
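Casting the boundary values shows the wrap-around: as BIGINT the next value is the most negative number, while a BINARY comparison keeps the expected order.

```sql
SELECT CAST(0x7FFFFFFFFFFFFFFF AS bigint) AS LastPositiveBigint,  -- 9223372036854775807
       CAST(0x8000000000000000 AS bigint) AS NextValueAsBigint,   -- -9223372036854775808
       CASE WHEN 0x8000000000000000 > 0x7FFFFFFFFFFFFFFF
            THEN 'binary order preserved'
            ELSE 'binary order broken'
       END AS BinaryComparison;                                  -- binary order preserved
```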

A full worked example is below. Maintaining the index on the ROWVERSION column will be expensive if you have lots of updates so you might want to test your workload both with and without to see if it is worth the cost.

CREATE TABLE [dbo].[Test]
  (
     [ID]               [INT] NOT NULL CONSTRAINT [PK_Test] PRIMARY KEY,
     [Value]            [VARCHAR](50) NOT NULL,
     [RowUpdateCounter] [ROWVERSION] NOT NULL UNIQUE NONCLUSTERED
  )

INSERT INTO [dbo].[Test]
            ([ID],
             [Value])
VALUES     (1,'Foo'),
            (2,'Bar'),
            (3,'Baz');

DECLARE @RowVersion_LastSynch ROWVERSION = MIN_ACTIVE_ROWVERSION();

UPDATE [dbo].[Test]
SET    [Value] = 'X'
WHERE  [ID] = 2;

DECLARE @RowVersion_ThisSynch ROWVERSION = MIN_ACTIVE_ROWVERSION();

SELECT *
FROM   [dbo].[Test]
WHERE  [RowUpdateCounter] >= @RowVersion_LastSynch
       AND RowUpdateCounter < @RowVersion_ThisSynch;

/*TODO: Store @RowVersion_ThisSynch somewhere*/

DROP TABLE [dbo].[Test]
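The question also asks to read changes in batches (e.g. three rows at a time). A sketch of that, run against the table from the example before it is dropped, orders by the ROWVERSION column and resumes after the last value already synced. @BatchSize and @LastSeen are hypothetical names; @LastSeen would be persisted by the sync process between runs.

```sql
DECLARE @BatchSize int = 3;
-- RowUpdateCounter of the last row already synced (0x00... on the first run).
DECLARE @LastSeen binary(8) = 0x0000000000000000;
-- Do not read past values that uncommitted transactions may still be using.
DECLARE @UpperBound binary(8) = MIN_ACTIVE_ROWVERSION();

SELECT TOP (@BatchSize) [ID], [Value], [RowUpdateCounter]
FROM   [dbo].[Test]
WHERE  [RowUpdateCounter] > @LastSeen
       AND [RowUpdateCounter] < @UpperBound
ORDER  BY [RowUpdateCounter];
-- The largest RowUpdateCounter returned becomes @LastSeen for the next batch.
```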

Related Solutions

Sql-server – Update Table A row if Table B row is changed

I don't think you need to use a trigger for this at all. With an index on B(UserID) INCLUDE(Sales) (or if there is already a clustered index leading with UserID), this query will get what you need, pretty efficiently, without having unnecessary maintenance happening all the time (even when no queries are running to calculate sums):

SELECT a.UserID, a.Name, SalesSum = SUM(b.Sales)
  FROM dbo.[Table A] AS a
  INNER JOIN dbo.[Table B] AS b
  ON a.UserID = b.UserID
  GROUP BY a.UserID, a.Name;

Look ma, no subquery!
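The supporting index mentioned above, on B(UserID) INCLUDE(Sales), could be created like this. The index name is hypothetical, and it is only needed if no clustered index already leads with UserID.

```sql
-- Covers the join on UserID and the SUM(Sales) without touching the base rows.
CREATE NONCLUSTERED INDEX IX_TableB_UserID
ON dbo.[Table B] (UserID)
INCLUDE (Sales);
```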

This will likely generate a different plan, but certainly not a more expensive one. I don't think you're going to get as much benefit from maintaining this sum with triggers as you think you will. If you do implement triggers, make sure you test your entire workload, not just the query that gets the sum.

That said, here is what a trigger would look like.

CREATE TRIGGER dbo.MaintainRedundantSums
ON dbo.[Table B]
FOR INSERT, UPDATE, DELETE
AS
BEGIN
  SET NOCOUNT ON;

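  -- An aggregate can't appear directly in an UPDATE's SET list,
  -- so each affected user's total is computed in a correlated subquery.
  UPDATE a 
    SET a.SalesSum = COALESCE(
      (SELECT SUM(b.Sales) FROM dbo.[Table B] AS b WHERE b.UserID = a.UserID), 0)
    FROM dbo.[Table A] AS a 
    WHERE EXISTS (SELECT 1 FROM inserted WHERE UserID = a.UserID)
       OR EXISTS (SELECT 1 FROM deleted  WHERE UserID = a.UserID);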
END
GO

Please test this trigger in a test environment; this is fairly off the cuff, partly because it's dinner time and partly because I genuinely don't want you to use a trigger for this. If you experience an actual performance problem when calculating the sums at runtime, let's talk about that, instead of premature optimization.

EDIT

Regarding your "slow" join, how does this perform in comparison?

;WITH d AS 
(
  SELECT OrderID, SalesAmount = SUM(Sales), SalesPriceSum = SUM(Sales*Price)
    FROM dbo.OrderDetails
    GROUP BY OrderID
)
SELECT
  u.UserID, 
  u.Name,  
  o.OrderID, 
  o.Comment,
  d.SalesAmount, 
  d.SalesPriceSum
FROM
  dbo.Users AS u
INNER JOIN 
  dbo.Orders AS o -- meaningful aliases please, not A, B, C etc.
  ON u.UserID = o.UserID
INNER JOIN
  d ON d.OrderID = o.OrderID
-- WHERE ...

-- no GROUP BY needed here
ORDER BY
  u.Name,
  o.Period;

You should look at the query plan and see if the cost is mostly in the sorting. I expect that you don't have an index to support ordering by u.Name first, for example. Also I would verify that you have indexes to support your WHERE clauses and JOINs, and I certainly hope that OrderDetails.OrderID is indexed appropriately. If you want help improving the performance of a query (or set of queries), post the queries and their actual (not estimated) execution plans. Jumping to the conclusion that a trigger must be the way to fix it is kind of like buying a new car when you have a flat tire. Fix the problem, don't try to outfit a solution for the whole system.
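For the two checks suggested here, a sketch might be an index that supports the join on OrderDetails.OrderID and covers both sums, plus capturing the actual execution plan as XML. The index name is hypothetical, and whether it helps depends on your existing indexes.

```sql
-- Hypothetical index: seek on OrderID, with Sales and Price covered for the SUMs.
CREATE NONCLUSTERED INDEX IX_OrderDetails_OrderID
ON dbo.OrderDetails (OrderID)
INCLUDE (Sales, Price);
GO

-- Return the actual (not estimated) plan as XML alongside the results.
SET STATISTICS XML ON;
-- ... run the query being tuned here ...
SET STATISTICS XML OFF;
```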

Sql-server – Merge Replication identity field issues

You could turn off automatic identity management and allocate your ranges manually. Another option could be forcing a range re-allocation by issuing sp_msrefresh_publisher_idrange and specifying the range boundaries.
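With manual identity management, allocating a range typically comes down to reseeding the identity column on each node so the ranges do not overlap. A minimal sketch, where dbo.YourTable and the starting value 1000000 are hypothetical placeholders; keeping the ranges apart over time remains your responsibility.

```sql
-- Report the current identity value without changing it.
DBCC CHECKIDENT ('dbo.YourTable', NORESEED);

-- Start this node's range; the next inserted row gets the next value after the new seed.
DBCC CHECKIDENT ('dbo.YourTable', RESEED, 1000000);
```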

However, neither option guarantees that the lTID column is populated with ever-increasing values: there will always be at least one range for the publisher and one range for the subscriber, and if users are inserting at both sites you can't have ever-increasing IDs.

This is by design and there's no way around it.

If your goal is keeping the identity values generated at the subscribers ever-increasing, that's a whole different story and it can be achieved with one of the above options.
