A couple of ideas/theories:
SELECT INTO... lets the RDBMS determine sort order based on order of your original table. If you insert into an existing table, there may be a sort needed to match a clustered or nonclustered index(es).
No Indexes - when you SELECT INTO...
the RDBMS knows for certain there are no pre-existing indexes to update.
No Contention - since the table you are inserting into does not exist, SQL Server doesn't need to worry about row-level locking or contention handling. Nothing else can reference the table you create since it doesn't exist.
All that being said, there are other ways to insert into a table very quickly.
Make sure your clustered index keys match when possible. This means there is no on-the-fly sorting
Disable all non-clustered indexes. Self-explanatory.
Set recovery mode to simple and trace flag 610 to ON
. Use the TABLOCK
hint on your target table and NOLOCK
hint on your source table.
For example, assume tablea and tableb have the same clustered index:
INSERT INTO TableB WITH (TABLOCK)
SELECT <Columns>
FROM TableA WITH (NOLOCK)
In my experience this is faster than using SELECT INTO...
and then creating the clustered index afterwards. Please note this can also work on a table that already has data in it which is a much more useful scenario.
EDIT:
Here's a fantastically detailed whitepaper from MS for data load performance in Sql Server 2008.
A few of these numbers do not seem to match the table on the technet page.
There are small differences in the sizes of the log records generated in your tests, but these are due to other internal logging behaviours, not whether minimal logging is occurring or not.
A good definition of minimal logging is provided by Sunil Agarwal of the Storage Engine team:
Individual rows are not logged and only the changes to page allocation structures are logged
Any test where you see individual row changes logged (e.g. LOP_INSERT_ROWS
) is not using minimal logging for the associated allocation unit. Some operations can be minimally logged with respect to one allocation unit (e.g. an index) and not minimally logged against another. Also, in some circumstances, inserts to existing pages may not be minimally-logged but changes to newly allocated page may be.
Most of the details can be found in a series of Storage Engine team blog posts:
One detail not explored there is that to be minimally-logged (in SQL Server 2008 or later) INSERT...SELECT
changes to b-tree structures must have the DMLRequestSort
query plan operator property set to true. This applies to the circumstances where the Data Loading Performance Guide shows 'Depends': the query plan must use wide (per-index) maintenance with DMLRequestSort=true
.
I wrote more about this in Minimal Logging with INSERT…SELECT and Fast Load Context.
Best Answer
Paul White has a really detailed post about a way to get minimal logging for
INSERT...SELECT
statements that target clustered tables with secondary indexes. Check out the full article here:Minimal Logging with INSERT…SELECT and Fast Load Context
The main thing you're probably missing is that you need to enable trace flag 610.
Read the whole thing to see make sure you get past all the conditions and edge cases - especially around estimated number of rows to be inserted, existing number of pages in indexes, and the estimated data size of rows being inserted.