You will want to load your data into a new table in small batches, then drop the existing table. I put together a quick example using the Sales.Currency table in AdventureWorks; something similar should work for you as well.
First, create your new table, complete with the new datatype you want to use:
CREATE TABLE [Sales].[Currency_New](
    [CurrencyCode] [nchar](4) NOT NULL, -- the new, wider datatype (the original column is nchar(3))
    [Name] [varchar](128) NOT NULL,     -- also changed from the original datatype
    [ModifiedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_Currency_New_CurrencyCode] PRIMARY KEY CLUSTERED
    (
        [CurrencyCode] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
Then insert your records in batches. I am using a batch size of 10 here, but you will likely want something larger, say 10,000 rows at a time. For 30 million rows I'd even suggest going to a 100,000-row batch size; that's the limit I typically used with larger tables:
DECLARE @RowsInserted INT, @InsertVolume INT

SET @RowsInserted = 1
SET @InsertVolume = 10 -- batch size: # of rows copied per iteration

WHILE @RowsInserted > 0
BEGIN
    INSERT INTO [Sales].[Currency_New] ([CurrencyCode]
        ,[Name]
        ,[ModifiedDate])
    SELECT TOP (@InsertVolume)
        SC.[CurrencyCode]
        ,SC.[Name]
        ,SC.[ModifiedDate]
    FROM [Sales].[Currency] AS SC
    LEFT JOIN [Sales].[Currency_New] AS SCN
        ON SC.[CurrencyCode] = SCN.[CurrencyCode]
    WHERE SCN.[CurrencyCode] IS NULL -- only copy rows not already in the new table

    SET @RowsInserted = @@ROWCOUNT -- loop exits when nothing is left to copy
END
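One caveat: the anti-join above has to probe Currency_New on every pass, so batches get gradually slower as the new table fills up. Since the table has a single-column clustered primary key, a keyset-driven variant like the sketch below (the @LastKey variable and the batch size are just for illustration) avoids re-checking rows that were already copied:

DECLARE @LastKey NCHAR(4), @RowsInserted INT
SET @LastKey = '' -- sorts lower than any real CurrencyCode
SET @RowsInserted = 1

WHILE @RowsInserted > 0
BEGIN
    INSERT INTO [Sales].[Currency_New] ([CurrencyCode], [Name], [ModifiedDate])
    SELECT TOP (10000)
        SC.[CurrencyCode], SC.[Name], SC.[ModifiedDate]
    FROM [Sales].[Currency] AS SC
    WHERE SC.[CurrencyCode] > @LastKey -- resume just after the last key copied
    ORDER BY SC.[CurrencyCode]

    SET @RowsInserted = @@ROWCOUNT

    -- remember the highest key copied so far
    SELECT @LastKey = MAX([CurrencyCode]) FROM [Sales].[Currency_New]
END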
I usually do a sanity check and verify the rowcounts are the same before cleaning up:
SELECT COUNT(*) FROM [Sales].[Currency]
SELECT COUNT(*) FROM [Sales].[Currency_New]
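If you want a stronger check than plain row counts, an EXCEPT comparison (a quick sketch; run it in both directions) should return zero rows when the contents match:

SELECT [CurrencyCode], [Name], [ModifiedDate] FROM [Sales].[Currency]
EXCEPT
SELECT [CurrencyCode], [Name], [ModifiedDate] FROM [Sales].[Currency_New]

SELECT [CurrencyCode], [Name], [ModifiedDate] FROM [Sales].[Currency_New]
EXCEPT
SELECT [CurrencyCode], [Name], [ModifiedDate] FROM [Sales].[Currency]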
Once you are confident you have migrated your data, you can drop the original table:
DROP TABLE [Sales].[Currency]
Last step, rename the new table so that users don't have to change any code. Note that the second argument to sp_rename must be the bare new name - if you include the schema or brackets there, they become part of the table name:
EXEC sp_rename 'Sales.Currency_New', 'Currency';
GO
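If you care about consistent constraint names, you can rename the primary key as well; sp_rename handles constraints with the 'OBJECT' type (the new name below is just a suggestion):

EXEC sp_rename 'Sales.PK_Currency_New_CurrencyCode', 'PK_Currency_CurrencyCode', 'OBJECT';
GO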
I don't know how long this will take. I'd suggest you try doing this when you have a clear maintenance window and users aren't connected.
HTH
As things stand, the query:
DELETE TOP (4000)
FROM [OLD_TABLE] WITH (TABLOCKX)
WHERE [Date] < '2014-01-01 00:00:00'
...has to scan the whole heap table, testing each row it finds, until it eventually finds 4000 to delete. On the next iteration, the whole business starts again from square one. Assuming the scanning process is performed in the same (allocation unit) order each time, these scans will take longer and longer to find 4000 rows as time goes by.
Creating an index on the [Date] column will allow SQL Server to find the 4000 rows much more efficiently. It does add a small overhead to each delete (as the new nonclustered index needs to be maintained as well), but this is nothing compared to the effort saved by not performing a full scan each time. The index will also require a certain amount of space, but it should not be too large. You should create the following index before resuming your data removal process:
CREATE NONCLUSTERED INDEX [IX_OLD_TABLE_Date] -- any name you like
ON dbo.OLD_TABLE ([Date]);
By the way, if the database has snapshot isolation or read committed snapshot isolation enabled (even if not actively used!), the TABLOCKX hint will not be enough to ensure that empty heap pages are deallocated. Your heap table may therefore contain many empty pages - a concern since you are so low on space.
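To see how much of the heap is actually empty, something along these lines against sys.dm_db_partition_stats (a sketch; substitute your table name) compares the pages reserved with the pages actually in use:

SELECT
    ps.row_count,
    ps.reserved_page_count, -- pages allocated to the heap
    ps.used_page_count      -- pages actually holding data
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.OLD_TABLE')
AND ps.index_id = 0; -- index_id 0 = the heap itself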
The standard way to address this space management issue is to create a clustered index (or issue an ALTER TABLE ... REBUILD statement, but that requires SQL Server 2008). It seems that creating a clustered index might also not be an option for you due to the same space constraints, so there is no obvious way to resolve this right now.
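For reference, the rebuild form is simply the statement below (SQL Server 2008 or later). It reclaims the empty pages, but be aware it needs working space of roughly the size of the table while it runs:

ALTER TABLE dbo.OLD_TABLE REBUILD;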
One thing to keep an eye on while the delete is progressing is the space used by the transaction log. If the database is in FULL or BULK_LOGGED recovery, you will need to keep on top of backing up the log. If the database is using SIMPLE recovery, you may need to issue a manual CHECKPOINT from time to time to release transaction log space for reuse (otherwise the physical file might grow).
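Run between batches, something like this keeps the log under control (a sketch; the database name and backup path are placeholders):

-- FULL or BULK_LOGGED recovery: back up the log regularly
BACKUP LOG MyDatabase TO DISK = N'X:\Backups\MyDatabase_log.trn';

-- SIMPLE recovery: force a log truncation point
CHECKPOINT;

-- either way, this shows current log usage per database
DBCC SQLPERF (LOGSPACE);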
If you can get hold of some additional temporary storage, a better way might be to bulk export the data you want to keep to a safe location, drop the table, recreate it (preferably with a clustered index!) and reload the saved data. This is generally faster than the incremental-delete process, but it depends on your objectives and priorities.
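A rough outline of that approach (all names and paths below are placeholders; the bcp export runs from a command prompt, not from a query window):

-- 1. Export the rows you want to keep, in native format:
--      bcp "SELECT * FROM MyDB.dbo.OLD_TABLE WHERE [Date] >= '2014-01-01'" queryout X:\keep.dat -n -S MyServer -T

-- 2. Drop the table, recreate it with the same columns (definition omitted),
--    and add a clustered index this time:
DROP TABLE dbo.OLD_TABLE;
-- CREATE TABLE dbo.OLD_TABLE (...); -- original column definitions
CREATE CLUSTERED INDEX CX_OLD_TABLE_Date ON dbo.OLD_TABLE ([Date]);

-- 3. Reload the saved data (minimally logged with TABLOCK under SIMPLE or BULK_LOGGED recovery):
BULK INSERT dbo.OLD_TABLE
FROM 'X:\keep.dat'
WITH (DATAFILETYPE = 'native', TABLOCK);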
You might like to look at partition switching - the staging table could be switched into the production table, and the production table switched out to a history table. https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-transact-sql?view=sql-server-ver15#c-switching-partitions-between-tables gives an example of switching out. There are quite a few blogs about the technique; look up a few and see if they match what you are trying to do.
A system I used to work on had a monthly data load. The data was staged to a table, which was then switched into the live table. The live table had a partition per month. It was very quick, as I believe it's a metadata-only operation.
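The core of the technique looks something like this (a sketch with made-up table names and partition numbers; source and target must have identical structure, and the incoming data must satisfy the target partition's range):

-- switch the oldest month out of the live table into an (empty) history table
ALTER TABLE dbo.LiveSales SWITCH PARTITION 1 TO dbo.SalesHistory;

-- switch freshly loaded staging data into the empty partition for the new month
ALTER TABLE dbo.SalesStaging SWITCH TO dbo.LiveSales PARTITION 13;

HTH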