I am hearing different things from colleagues/research. What are good guidelines in performance for, Select Into vs Insert into when creating a temp table? I know difference is minimal for small tables.
Eg: Table has 20 columns, 50 million rows.
I've had DBAs state, Insert into is faster, since compiler/parser, does not need to find Column data types on the fly. Others stating Select into is faster.
We conducted performance testing, and seems select into is slightly faster.
What are good principles in figuring which is faster and why? I would think Microsoft would optimize to make insert into , just as fast, for careful programming.
Article states following.
SQL Server Performance of SELECT INTO vs INSERT INTO for temporary tables
The INSERT…INTO command will reuse data pages which are created in
cache for insert/update/delete operations. It will also truncate the
table when it is dropped. The SELECT…INTO command will create new
pages for table creation similar to regular tables and will physically
remove them when the temporary table is dropped.
Question is, why wouldn't Microsoft optimize to make insert into as fast as select into?
We have over 500 stored procedures to write for data warehouse, and require good guidelines for temp usage.
This article does not really focus on performance and reasons:
Person in article mentioned good point:
that's mostly because SQL Server knows that there is no contention for
the destination table. The performance for insert into #temp
with(tablock) select * from .. is roughly the same as the performance
for select * into #temp from
Best Answer
You cited two different articles that discuss two different things.
The first article compares
insert..select
withselect into
for temporary tables, and the second compares these two in general.In general
insert..select
is slower because it's a fully logged operation.select into
is minimally logged in thesimple
andbulk logged
recovery models.The last comment you cited is about
insert into with(tablock)
, thiswith(tablock)
can makeinsert into
minimally logged under some additional conditions: it should be a heap and have no indexes.You can find the complete guide here: The Data Loading Performance Guide
It can be summarized in this table:
Note the updates for SQL Server 2016 and later in SQL Server 2016, Minimal logging and Impact of the Batchsize in bulk load operations by Parik Savjani (a Senior Program Manager with the Microsoft SQL Server Tiger team). The updated table is:
Regarding the first article. It discusses the particular case for temporary tables.
The
tempdb
database is special because it's always in the simple recovery model, and because logging intempdb
is different. On every server restarttempdb
is recreated, this means no crash recovery is made fortempdb
, and this means that logging intempdb
does not need any "after" image of modified data, only the "before" image to be able to do a rollback if there is a need. This leads to the fact thatinsert into..select
is also minimally logged intempdb
even withouttablock
hint (in case of aheap
that was discussed in the first article).Conclusion "
select into
vsinsert..select
under logging aspect":In case of a
heap
,insert into..select
performs similarly toselect into
in case of temporary tables, and in general is slower whentablock
hint is not used.The second aspect is the possibility of parallel execution.
Select into
can be executed in parallel starting with SQL Server 2014, and parallelinsert...select
was first implemented in SQL Server 2016.I did not reproduce any performance difference between
select into
andinsert into ..select
for temporary tables on SQL Server 2012, all executed in serial.