Sql-server – SQL Server Delta Records Pull & Push

availability-groupsoptimizationreplicationsql-server-2008-r2

In our SQL Server DB, we have about some 800+ tables and there are 40 – 50 tables are business critical tables. MIS team needs to generate reports based on those 50 business tables.

Day by day, records are getting huge in those business tables. Daily MIS team is pulling those millions of records directly from Production like select * from Table1 and pushing those records into their environment. Table1 may have 30 million records.

Those 50 tables gets updated frequently. MIS team requires those delta records (update/inserted/delete)

Our Senior DBA advice us to use the below 4th approach. He says replication is a failure model and it will leads I/O problem.

What would be the best solution?

We have few approaches here

  1. Always On
  2. Replication
  3. Mirroring
  4. Introducing new column (LastModifiedDate & creating index) in those 50 tables and pulling those records periodically and populating it to MIS environment.

There will be huge code change for the new column LastModifiedDate approach.

Based on those 50 tables, we have huge number of stored procedures which it has Insert/Update statements. In those stored procedures, we need to do code change for LastModifiedDate.

Since we have SQL Server 2008 R2 only, we cant got for Always On approach. If it is the only approach, then we can approach management to upgrade the 2008 to 2014/2016.

What would be the best solution from the above approaches?

Please let me know if any other approach to do.

Best Answer

"Best" is fairly subjective since it depends on your requirements.

Your listed approaches are:

  1. Always On - If you could implement a readable secondary, this might provide a good way for your MIS team to be able to see up-to-the minute data. Downside to this - you need to upgrade your SQL Server, which might be costly, and it will take a fair amount of technical knowledge to get right.

  2. Replication could be setup to replicate the desired tables to a second SQL Server instance that could then be used by the MIS team in a read-only fashion. Replication can be tricky to setup correctly, and can be difficult to maintain.

  3. Mirroring won't allow your MIS team to read data from the mirror database. The mirror database is kept in a perpetual state of "recovery" until (if) the primary becomes unavailable.

  4. Adding a column to the database. If you have code written using SELECT *, adding a column can be problematic since you'll need to revisit all that code so it can handle the new column - the takeaway from that exercise would be "never use SELECT *". Use a rowversion datatype for the new column since it will automatically increment every time any insert/update/delete occurs in the entire database. At the end of each "import" operation, the MIS team would record the current rowversion value, to limit the rows being imported during the next run to only those rows that have been modified.

Another possible solution would be to use triggers to insert modified rows into holding tables that become the source of the MIS import job.

A simple example of how that might work. We'll create this test in tempdb:

USE tempdb;
IF OBJECT_ID(N'dbo.MainTable_Holding', N'U') IS NOT NULL
DROP TABLE dbo.MainTable_Holding;
IF OBJECT_ID(N'dbo.MainTable', N'U') IS NOT NULL
DROP TABLE dbo.MainTable;

The main table:

CREATE TABLE dbo.MainTable
(
    MainTableID int NOT NULL
        CONSTRAINT PK_MainTable
        PRIMARY KEY
        CLUSTERED
        IDENTITY(1,1)
    , someval varchar(100) NOT NULL
);

The "holding" table, where rows will exist until they are processed by the MIS job:

CREATE TABLE dbo.MainTable_Holding
(
    op_id int NOT NULL IDENTITY(1,1)
    , MainTableID int NOT NULL
    , someval varchar(100) NOT NULL
    , op tinyint NOT NULL
        CONSTRAINT CK_MainTable_Holding_op 
        CHECK (op IN (1, 2, 3, 4))
);

CREATE CLUSTERED INDEX CX_MainTable_Holding
ON dbo.MainTable_Holding (op_id);
GO

The trigger that will capture modifications from the main table into the holding table:

CREATE TRIGGER MainTableHoldingTrigger
ON dbo.MainTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    DECLARE @op_ins tinyint;
    DECLARE @op_del tinyint;
    SET @op_ins = 1;
    SET @op_del = 2;
    IF EXISTS (SELECT 1 FROM inserted) AND EXISTS (SELECT 1 FROM deleted)
    BEGIN
        SET @op_ins = 3; --updated to
        SET @op_del = 4; --updated from
    END

    INSERT INTO dbo.MainTable_Holding (MainTableID, someval, op)
    SELECT d.MainTableID
        , d.someval
        , @op_del --delete
    FROM deleted d;
    INSERT INTO dbo.MainTable_Holding (MainTableID, someval, op)
    SELECT i.MainTableID
        , i.someval
        , @op_ins --insert
    FROM inserted i;
END
GO

A couple of test statements. First we insert 10 rows:

INSERT INTO dbo.MainTable (someval)
VALUES (REPLICATE(CHAR((CRYPT_GEN_RANDOM(1) % 26) + 65), 10));
GO 10 --insert 10 rows

Here, we delete the first 5 rows:

DELETE 
FROM dbo.MainTable
WHERE dbo.MainTable.MainTableID <= 5;

This updates the 6th row:

UPDATE dbo.MainTable
SET dbo.MainTable.someval = 'XXXXX'
WHERE dbo.MainTable.MainTableID = 6;

Here we can see the contents of the holding table:

SELECT mth.MainTableID
    , mth.someval
    , Operation = CASE mth.op 
        WHEN 1 THEN 'INSERT'
        WHEN 2 THEN 'DELETE'
        WHEN 3 THEN 'UPDATE TO'
        WHEN 4 THEN 'UPDATE FROM'
      END
FROM dbo.MainTable_Holding mth;

The results:

╔═════════════╦════════════╦═════════════╗
║ MainTableID ║  someval   ║  Operation  ║
╠═════════════╬════════════╬═════════════╣
║           1 ║ XXXXXXXXXX ║ INSERT      ║
║           2 ║ VVVVVVVVVV ║ INSERT      ║
║           3 ║ NNNNNNNNNN ║ INSERT      ║
║           4 ║ CCCCCCCCCC ║ INSERT      ║
║           5 ║ RRRRRRRRRR ║ INSERT      ║
║           6 ║ GGGGGGGGGG ║ INSERT      ║
║           7 ║ CCCCCCCCCC ║ INSERT      ║
║           8 ║ HHHHHHHHHH ║ INSERT      ║
║           9 ║ VVVVVVVVVV ║ INSERT      ║
║          10 ║ KKKKKKKKKK ║ INSERT      ║
║           5 ║ RRRRRRRRRR ║ DELETE      ║
║           4 ║ CCCCCCCCCC ║ DELETE      ║
║           3 ║ NNNNNNNNNN ║ DELETE      ║
║           2 ║ VVVVVVVVVV ║ DELETE      ║
║           1 ║ XXXXXXXXXX ║ DELETE      ║
║           6 ║ GGGGGGGGGG ║ UPDATE FROM ║
║           6 ║ XXXXX      ║ UPDATE TO   ║
╚═════════════╩════════════╩═════════════╝

The MIS job would simply read all rows from dbo.MainTable_Holding, performing the appropriate insert, update, or delete. Rows that have been inserted into the MIS database could be removed from the holding table, limiting the amount of space required by it.

Another possible solution would be to use SQL Server's queue mechanism, Service Broker. Service Broker could be setup to send messages into the MIS Server; each message would be an insert/update/delete operation from one of the affected source tables. Service Broker on the MIS Server would then pickup those messages, processing them asynchronously.