Sql-server – SQL Server: Performance Insert Into vs Select Into

performanceperformance-tuningsql serversql-server-2016

I am hearing different things from colleagues/research. What are good guidelines in performance for, Select Into vs Insert into when creating a temp table? I know difference is minimal for small tables.

Eg: Table has 20 columns, 50 million rows.

I've had DBAs state, Insert into is faster, since compiler/parser, does not need to find Column data types on the fly. Others stating Select into is faster.
We conducted performance testing, and seems select into is slightly faster.

What are good principles in figuring which is faster and why? I would think Microsoft would optimize to make insert into , just as fast, for careful programming.

Article states following.

SQL Server Performance of SELECT INTO vs INSERT INTO for temporary tables

The INSERT…INTO command will reuse data pages which are created in
cache for insert/update/delete operations. It will also truncate the
table when it is dropped. The SELECT…INTO command will create new
pages for table creation similar to regular tables and will physically
remove them when the temporary table is dropped.

Question is, why wouldn't Microsoft optimize to make insert into as fast as select into?

We have over 500 stored procedures to write for data warehouse, and require good guidelines for temp usage.

This article does not really focus on performance and reasons:

INSERT INTO vs SELECT INTO

Person in article mentioned good point:

that's mostly because SQL Server knows that there is no contention for
the destination table. The performance for insert into #temp
with(tablock) select * from .. is roughly the same as the performance
for select * into #temp from

Best Answer

You cited two different articles that discuss two different things.

The first article compares insert..select with select into for temporary tables, and the second compares these two in general.

In general insert..select is slower because it's a fully logged operation. select into is minimally logged in the simpleand bulk logged recovery models.

The last comment you cited is about insert into with(tablock), this with(tablock) can make insert into minimally logged under some additional conditions: it should be a heap and have no indexes.

You can find the complete guide here: The Data Loading Performance Guide

It can be summarized in this table:

enter image description here

Note the updates for SQL Server 2016 and later in SQL Server 2016, Minimal logging and Impact of the Batchsize in bulk load operations by Parik Savjani (a Senior Program Manager with the Microsoft SQL Server Tiger team). The updated table is:

updated chart for SQL Server 2016 onward

Regarding the first article. It discusses the particular case for temporary tables.

The tempdb database is special because it's always in the simple recovery model, and because logging in tempdb is different. On every server restart tempdb is recreated, this means no crash recovery is made for tempdb, and this means that logging in tempdb does not need any "after" image of modified data, only the "before" image to be able to do a rollback if there is a need. This leads to the fact that insert into..select is also minimally logged in tempdb even without tablock hint (in case of a heap that was discussed in the first article).

Conclusion "select into vs insert..select under logging aspect":

In case of a heap, insert into..select performs similarly to select into in case of temporary tables, and in general is slower when tablock hint is not used.

The second aspect is the possibility of parallel execution.

Select into can be executed in parallel starting with SQL Server 2014, and parallel insert...select was first implemented in SQL Server 2016.

I did not reproduce any performance difference between select into and insert into ..select for temporary tables on SQL Server 2012, all executed in serial.