SQL Server – Copy millions of records from one table to another with the same schema, in the same database

bulk, sql-server, sql-server-2008-r2

I am trying to find the best way to copy a large number of rows (~70 million) from one table to another with the same schema, inside the same database.

Our system currently runs with a set of A/B tables that are referenced elsewhere via a View. Our clients have the option to send us data either as a Full file or an Incremental file. When full files are ingested, the "inactive" table is truncated and loaded straight with the new data. For incremental files, we first copy all the data from "active" into "inactive" and then perform the incremental update. In both cases, when data loading is done we switch the view to the appropriate table.
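For illustration, the switch at the end of a load looks roughly like the following (sketch only; the view name is a placeholder for our real view, and the actual switch mechanism may differ slightly):

ALTER VIEW [dbo].[vwItemAmounts]
AS
    -- Repoint the view at the freshly loaded table; readers always query the view.
    SELECT [ClientId], [ItemId], [Amount], [Loaded]
    FROM [dbo].[TableB];   -- was TableA before this load
GO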

This has been working fine for many years; typically clients send us data files at night, and their data is processed when nobody is using the system. However, one of our larger clients now wants to send us incremental files throughout the day. For a smaller client, the A->B copy operation is fairly quick (~30 seconds for 600k rows), but this large client's copy takes 30-60+ minutes depending on load.

The schema for both tables is as follows (TableB is identical, with every occurrence of TableA replaced by TableB):

CREATE TABLE [dbo].[TableA](
    [ClientId] NCHAR(10) NOT NULL,
    [ItemId] INT NOT NULL,
    [Amount] DECIMAL(19,6) NOT NULL,
    [Loaded] DATETIME NOT NULL,
 CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED ( [ItemId] ASC, [ClientId] ASC) ON [PRIMARY])

ALTER TABLE [dbo].[TableA]  WITH NOCHECK ADD  CONSTRAINT [FK_TableA_Items] FOREIGN KEY([ItemId])
REFERENCES [dbo].[Items] ([ItemId])

ALTER TABLE [dbo].[TableA] CHECK CONSTRAINT [FK_TableA_Items]

ALTER TABLE [dbo].[TableA]  WITH NOCHECK ADD  CONSTRAINT [FK_TableA_Clients] FOREIGN KEY([ClientId])
REFERENCES [dbo].[Clients] ([ClientId])

ALTER TABLE [dbo].[TableA] CHECK CONSTRAINT [FK_TableA_Clients]

CREATE NONCLUSTERED INDEX [IX_TableA_ClientId_ItemID_INC_Amount] ON [dbo].[TableA] ([ClientId] ASC, [ItemId] ASC )  INCLUDE ([Amount]) 

The insert/select statement we are currently using is simply:

INSERT INTO [dbo].[TableB] ([ClientId], [ItemId], [Amount], [Loaded])
SELECT [ClientId], [ItemId], [Amount], [Loaded] FROM [dbo].[TableA]

I've tried various methods to optimize this copy: I've added TABLOCK on the target table, tried enabling trace flag 610, and tried many different batch sizes, but nothing seems to help. One variation of the queries I've tried is:

DECLARE @BatchSize BIGINT = 50000;

WHILE 1 = 1
BEGIN
    -- Copy the next batch of rows that are not yet present in TableB.
    INSERT INTO TableB WITH (TABLOCK) ([ClientId], [ItemId], [Amount], [Loaded])
    SELECT TOP (@BatchSize) ClientId, ItemId, Amount, Loaded
    FROM   TableA t1
    WHERE  NOT EXISTS (SELECT 1 FROM TableB WHERE ClientId = t1.ClientId AND ItemId = t1.ItemId);

    -- A short batch means everything from TableA has been copied.
    IF @@ROWCOUNT < @BatchSize
        BREAK;
END;

I've tried dropping/recreating indexes, but this too proved costly. The best time I've achieved is about 5 minutes, but that involves dropping the target table, performing a SELECT INTO, and then re-adding the indexes and check constraints. This was purely "educational": I can't go this route due to schema-bound views and some other items preventing this approach (in addition to our normal view, we have a "B" view that points to the other table for use during this process only).
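For reference, the "educational" variant was roughly the following (sketch only; constraint and index names are abbreviated from the schema above, and in reality the schema-bound views make the DROP impossible):

-- Not usable in production here, shown only for comparison of timings.
DROP TABLE [dbo].[TableB];

SELECT ClientId, ItemId, Amount, Loaded
INTO   [dbo].[TableB]
FROM   [dbo].[TableA];   -- SELECT INTO can be minimally logged

ALTER TABLE [dbo].[TableB]
    ADD CONSTRAINT [PK_TableB] PRIMARY KEY CLUSTERED ([ItemId] ASC, [ClientId] ASC);

CREATE NONCLUSTERED INDEX [IX_TableB_ClientId_ItemID_INC_Amount]
    ON [dbo].[TableB] ([ClientId] ASC, [ItemId] ASC) INCLUDE ([Amount]);

ALTER TABLE [dbo].[TableB] WITH NOCHECK
    ADD CONSTRAINT [FK_TableB_Items] FOREIGN KEY ([ItemId]) REFERENCES [dbo].[Items] ([ItemId]);
ALTER TABLE [dbo].[TableB] WITH NOCHECK
    ADD CONSTRAINT [FK_TableB_Clients] FOREIGN KEY ([ClientId]) REFERENCES [dbo].[Clients] ([ClientId]);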

While I know that SSIS is probably the best approach here, it is out of the question for now; I'm tasked with speeding up the existing process, if possible. I do have control over which procedures are called, etc., from C#.

With that said, do I have any other options? I can do pretty much whatever I want to the inactive table, as nothing uses it except these procedures. One thing I was considering is to somehow export the data with BCP from C# and then re-ingest it. Aside from not being sure whether it's possible, I've no idea if it would be any better (or worse, for that matter).
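If it helps to clarify what I mean, the rough idea is something like the following (sketch only; the file path, server name, and options are placeholders, and I haven't verified this is any faster):

-- Export from the command line (or Process.Start from C#), e.g.:
--   bcp "SELECT ClientId, ItemId, Amount, Loaded FROM dbo.TableA" queryout D:\temp\TableA.dat -n -T -S MyServer -d MyDb
-- Then re-ingest into the (empty) inactive table:
BULK INSERT [dbo].[TableB]
FROM 'D:\temp\TableA.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);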

Best Answer

How about keeping all the incremental data in the inactive table until the end of the day, then loading it into active and truncating inactive? Your view could reference both tables using a UNION ALL, so you aren't switching between the two tables at all.
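A minimal sketch of such a view (the view name is a placeholder for whatever your real view is called):

-- Readers see the baseline plus today's increments without any swap.
CREATE VIEW [dbo].[vwItemAmounts]
AS
    SELECT [ClientId], [ItemId], [Amount], [Loaded] FROM [dbo].[TableA]   -- active / baseline
    UNION ALL
    SELECT [ClientId], [ItemId], [Amount], [Loaded] FROM [dbo].[TableB];  -- today's incremental loads
GO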

You could use partition swapping with the same method. You'd partition both tables, load into inactive until 11 PM, then switch inactive's loaded partition into active. Your view would still need to reference both tables. This way you have no downtime for loading beyond the time it takes to import the file itself, and incremental files load into an empty table every day.
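A rough sketch of the nightly switch, assuming both tables are partitioned identically (the partition function name below is hypothetical, and unique indexes would need to include the partitioning column):

-- Sketch only: both tables must share the same partition scheme and filegroup,
-- indexes must align, and the receiving partition in the active table must be
-- empty, so the switch is a metadata-only operation.
DECLARE @p INT = $PARTITION.pfLoadDate(CONVERT(DATE, GETDATE()));

ALTER TABLE [dbo].[TableB]             -- inactive: holds today's incremental rows
    SWITCH PARTITION @p
    TO [dbo].[TableA] PARTITION @p;    -- active: receives them instantly at 11 PM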