The first thing that I noticed is that the query plan compilation time was over 3 seconds for each query. Wow, this is a really complex query!
Because the solution space of potential execution plans is so large (it grows exponentially with the number of the number of objects involved in the query), SQL Server is only going to be able to explore a tiny fraction of the potential query plans when coming up with a plan for these queries. Remember that SQL Server's job isn't to create the best query plan possible, but instead to create a query plan that is good enough and to do so as quickly as possible.
I have often found that small changes in the way a query is formulated, even if they don't impact the logic of the query, can have a significant impact on the query plan. Anecdotally, this grows more and more likely as the query grows more and more complex. One possible reason that this could happen is that a tweak to the query might cause SQL Server to begin cost-based optimization with a different initial plan. As cost-based optimization proceeds, this different starting point could yield a different exploration of the space of potential query plans--kind of like a different random seed impacts random number generation. Note that the query plans you provided are significantly different (compare images of the plan shape below!) and SQL Server actually does estimate that the @table variable plan is slightly cheaper.
In terms of why the table variable vs. temp table would have such an impact on cost-based optimization, I'll hazard an only-partially-educated guess: inserting into a table variable forces a serial plan (see the NonParallelPlanReason
of CouldNotGenerateValidParallelPlan
that appears in the table variable plan, but not the temp table plan), and this may impact the code path that the query optimizer takes either generating an initial plan or in some phase of plan optimization.
If possible, the next step that I would try is to simplify the query so that fewer tables are used and/or the query is split into multiple queries (with intermediate #temp tables) so that each query is simpler and has better statistics available. If that's not possible, you could also try more hacky options such as using query hints (e.g., force MAXDOP 1 on the temp table query, and see if the plan comes out more like the table variable query).
Query plan with #temp table:
Query plan with @table variable:
And finally, if you are interested in going a little bit deeper into how the query optimizer works, I have found Paul White's blog to be a great resource!
If you want to avoid a union, you can loop using a cursor.
So, first you define your cursor that will list all the tables you want to query (let's assume you have all your data tables listed somewhere (system tables or config)).
And then loop doing :
- create a dynamic SQL statement with the name of the table you retrieved
- execute this statement
So something like:
declare @sql varchar(max)
declare @data_table varchar(50)
declare data cursor for
select data_table -- name of the table you have to query
from somewhere -- where these table names are stored
open data
fetch next from data into @data_table
while (@@fetch_status = 0)
begin
-- result_table = name of the table you want the results to be stored
-- control_table = name of the control table you mentioned
set @sql = 'insert into result_table
select *
from '+@data_table+'
where AccountId in (select AccountId
from control_table
where Appear_In_View = 1)
and Most_Recent_Data = 1'
exec(@sql)
fetch next from data into @data_table
end
close data
deallocate data
EDIT : with new inputs from Ariel Hayoun
Try this kind of where
clause if (and only if) there is one line te retrieve:
set @sql = 'insert into result_table
select *
from '+@data_table+'
where AccountId in (select AccountId
from control_table
where Appear_In_View = 1)
and Retrieved_At = (select max(Retrieved At) from '+@data_table +')'
If you have multiple lines per tables (one line per account id for instance), you may be able to achieve what you want using row_number() function
Best Answer
This is not a complete answer (I'm sorry), but maybe a starting point for further thinking.
For a low number of tables UNION is the best approach IMHO. In special if the list of tables are not changing.
For a bigger number of tables you might can think of using some T-SQL and iterate through a list of tables and building up your query with dynamic SQL -- with this you can select your line(s) into a temporary table maybe.
As I'm not sure where you will get your list of tables from, you could get it from some cursor out of system tables or from some hard coded list maybe.