SQL Server – Massive Joins vs Updating Temp Table

join;sql servertemporary-tables

I'm working on a query that essentially aggregates a slew of tables together.

The first method I tried was to inner join all of the tables into one nice big query that pretty much looked like this:

select
  a.col1
, a.col2
, a.col3
...
, b.col1
...
...
, l.col1
, l.col2
from myprimarytable a
join secondtable b on a.key = b.key
join third table c on a.key = c.key
...
join thelasttable l on a.key = b.key

This took forever to run. However, a second solution that looked like this:

create table #output (
  primkey char(40)
, mycol1 char(1) null
, mycol2 decimal(20,1) null
...
, mylastcol varchar(max) null
)

insert into #output
select
  a.col1
, a.col2
...
from myprimarytable a

update #output set
 mycol3 = b.col1
, mycol4 = b.col2
...
from #output a
join secondtable b on a.primkey = b.key
...

My question:

Why does the temp table with updates take so much less time ( about 30x faster in my particular case), than one with all of the joins in one go?

Best Answer

SQL Server has an upper bound on creating efficient query plans given a moderately complex query involving a lot of joins - there isn't a single upper bound or magic formula to determine when a query is complex enough to cause a problem; it is very case-by-case and involves baselining from some known expectation (people sometimes think a certain query shouldn't take 30 seconds, but sometimes that's really the best and most optimal outcome for that context). In other cases, SQL Server is simply overwhelmed and can't (at least when not given infinite time) quickly come up with an optimal plan. So it will spend some time coming up with a "best plan I can find in this time" and then it will show "Reason for early termination: timeout" and you get what you get. You can force SQL Server to spend more time coming up with a plan using trace flag 2301 (which Paul White mentions and links to a more thorough piece in this answer), but more time may not prove to help that much (and more compile time may not be a good thing for end user performance on the whole). Another alternative is to break up the logic so SQL Server can focus on optimizing smaller, less complex queries.

Think of it like eating a pie - you can probably eat a pretty good-sized piece in one sitting, but if your goal is to eat the whole pie, you might want to spread it out across multiple settings.

For SQL Server, in some cases it can do better with chunks of that logic separated. This doesn't mean you should simplify all joins into #temp table waterfalls, but in some extreme cases it is a valid workaround.