Sql-server – Minimal logging for bulk inserts

sql serversql-server-2008

I have many scripts running that truncate entire tables and rewrite data daily, however this fills my transaction logs very quickly. All these scripts use INSERT INTO .. SELECT. So recently I've changed my database to the Bulk-Logged recovery model, and I added the with (TABLOCK) to all my insert statements. This doesn't seem to be doing anything, because according to https://www.mssqltips.com/sqlservertip/1185/minimally-logging-bulk-load-inserts-into-sql-server/ if the table has a clustered or non-clustered index, it still gets fully logged. Should I then drop all the indexes before inserting, and recreate them after the table is populated? Or is there a better method to do this?

Best Answer

Paul White has a really detailed post about a way to get minimal logging for INSERT...SELECT statements that target clustered tables with secondary indexes. Check out the full article here:

Minimal Logging with INSERT…SELECT and Fast Load Context

The main thing you're probably missing is that you need to enable trace flag 610.

Read the whole thing to see make sure you get past all the conditions and edge cases - especially around estimated number of rows to be inserted, existing number of pages in indexes, and the estimated data size of rows being inserted.

Related Solutions

Sql-server – Why “Select * into targettable from sourcetable “ is faster than “insert into targettable select * from sourcetable

A couple of ideas/theories:

SELECT INTO... lets the RDBMS determine sort order based on order of your original table. If you insert into an existing table, there may be a sort needed to match a clustered or nonclustered index(es).

No Indexes - when you SELECT INTO... the RDBMS knows for certain there are no pre-existing indexes to update.

No Contention - since the table you are inserting into does not exist, SQL Server doesn't need to worry about row-level locking or contention handling. Nothing else can reference the table you create since it doesn't exist.

All that being said, there are other ways to insert into a table very quickly.

Make sure your clustered index keys match when possible. This means there is no on-the-fly sorting
Disable all non-clustered indexes. Self-explanatory.
Set recovery mode to simple and trace flag 610 to ON. Use the TABLOCK hint on your target table and NOLOCK hint on your source table.

For example, assume tablea and tableb have the same clustered index:

INSERT INTO TableB WITH (TABLOCK)
SELECT <Columns>
FROM TableA WITH (NOLOCK)

In my experience this is faster than using SELECT INTO... and then creating the clustered index afterwards. Please note this can also work on a table that already has data in it which is a much more useful scenario.

EDIT:

Here's a fantastically detailed whitepaper from MS for data load performance in Sql Server 2008.

Sql-server – Minimal Logging Conditions in SQL

A few of these numbers do not seem to match the table on the technet page.

There are small differences in the sizes of the log records generated in your tests, but these are due to other internal logging behaviours, not whether minimal logging is occurring or not.

A good definition of minimal logging is provided by Sunil Agarwal of the Storage Engine team:

Individual rows are not logged and only the changes to page allocation structures are logged

Any test where you see individual row changes logged (e.g. LOP_INSERT_ROWS) is not using minimal logging for the associated allocation unit. Some operations can be minimally logged with respect to one allocation unit (e.g. an index) and not minimally logged against another. Also, in some circumstances, inserts to existing pages may not be minimally-logged but changes to newly allocated page may be.

Most of the details can be found in a series of Storage Engine team blog posts:

One detail not explored there is that to be minimally-logged (in SQL Server 2008 or later) INSERT...SELECT changes to b-tree structures must have the DMLRequestSort query plan operator property set to true. This applies to the circumstances where the Data Loading Performance Guide shows 'Depends': the query plan must use wide (per-index) maintenance with DMLRequestSort=true.

I wrote more about this in Minimal Logging with INSERT…SELECT and Fast Load Context.

Best Answer

Related Solutions

Sql-server – Why “Select * into targettable from sourcetable “ is faster than “insert into targettable select * from sourcetable

Sql-server – Minimal Logging Conditions in SQL

Related Question