Your problem begins and ends with statistics and estimates. I have reproduced your situation on my servers and found some interesting hints, and a workaround solution.
First things first, let's take a look at your execution plan:
When the view is used, we can see that a filter is applied after the Remote Query is executed, while without the view no filter operator appears at all. The truth is that, without the view, the filter was applied inside the Remote Query, at the remote server, before the data was retrieved over the network.
Obviously, applying the filter at the remote server and thus retrieving less data over the network is the better option, and obviously that only happens when the view is not used.
So... what is so interesting?
Surprisingly, when I changed the filter from cognome = 'test' to cognome = N'test' (the Unicode representation of the string), the view used the same execution plan as the first query did.
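For clarity, the two variants I compared look like this (vw_anagrafiche is a placeholder name for your local view; only the string literal differs):

```sql
-- VARCHAR literal: in my tests, the view's plan applied this filter
-- locally, after all remote rows had been fetched.
SELECT *
FROM vw_anagrafiche
WHERE cognome = 'test';

-- NVARCHAR literal: in my tests, this version produced the same plan
-- as the query without the view, with the filter done remotely.
SELECT *
FROM vw_anagrafiche
WHERE cognome = N'test';
```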
I guess the reason is that, when using the view, SQL Server somehow estimated that a small number of rows would be returned from the remote query and that filtering locally would be cheaper; but when SQL Server had to implicitly convert NVARCHAR to VARCHAR, the statistics could no longer be used and the decision to filter locally was not taken.
I have looked for the statistics locally, but the view had no statistics, so my guess is that the view uses the remote statistics in a way that an ad-hoc query does not, and then makes the wrong decision.
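If you want to check this on your side, you can look for statistics objects on the local view like this (dbo.vw_anagrafiche is a placeholder for your view's name; sys.stats and DBCC SHOW_STATISTICS are standard):

```sql
-- List any statistics objects that exist on the local view.
SELECT s.name, s.auto_created, s.user_created
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.vw_anagrafiche');

-- If a statistics object is found, its histogram can be inspected with:
-- DBCC SHOW_STATISTICS ('dbo.vw_anagrafiche', '<statistics name>');
```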
OK, so what solves the problem?
I stated earlier that there is a workaround (at least until someone comes up with a better solution), and no, I don't mean using Unicode for your strings.
I wanted to give an answer first, and I still have to find out why, but when using an inline table-valued function, SQL Server behaves exactly the same as with the plain query (without the view). So replacing the view with the function gives the same result in a simple query, with good performance (at least in my environment).
My code suggestion for you is:
CREATE FUNCTION fn_anagrafiche2()
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM dolph2.agendasdn.dbo.vistaanagraficagrp
    UNION
    SELECT * FROM dolph2.acampanet.dbo.vistaanagraficagrp
    UNION
    SELECT * FROM municipio2.dbnet.dbo.vistaanagraficagrp
)
GO
The query will then be:
SELECT *
FROM fn_anagrafiche2()
WHERE cognome = 'prova'
This works on my servers, but of course test it first.
Note: I do not recommend using SELECT * at all, as it is prone to future errors; I only used it because it was in your question and there was no need for me to change it when I can add this remark instead :)
The first thing that I noticed is that the query plan compilation time was over 3 seconds for each query. Wow, this is a really complex query!
Because the solution space of potential execution plans is so large (it grows exponentially with the number of objects involved in the query), SQL Server is only going to be able to explore a tiny fraction of the potential query plans when coming up with a plan for these queries. Remember that SQL Server's job isn't to create the best query plan possible, but instead to create a query plan that is good enough and to do so as quickly as possible.
I have often found that small changes in the way a query is formulated, even if they don't impact the logic of the query, can have a significant impact on the query plan. Anecdotally, this grows more and more likely as the query grows more and more complex. One possible reason that this could happen is that a tweak to the query might cause SQL Server to begin cost-based optimization with a different initial plan. As cost-based optimization proceeds, this different starting point could yield a different exploration of the space of potential query plans--kind of like a different random seed impacts random number generation. Note that the query plans you provided are significantly different (compare images of the plan shape below!) and SQL Server actually does estimate that the @table variable plan is slightly cheaper.
In terms of why the table variable vs. temp table would have such an impact on cost-based optimization, I'll hazard an only-partially-educated guess: inserting into a table variable forces a serial plan (see the NonParallelPlanReason of CouldNotGenerateValidParallelPlan that appears in the table variable plan, but not the temp table plan), and this may impact the code path that the query optimizer takes either when generating an initial plan or in some phase of plan optimization.
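If you want to spot this attribute yourself, one standard approach is to search the plan cache for it (a common DMV pattern; scanning the cache is not free, so run it on a test system):

```sql
-- Find cached plans whose XML contains a NonParallelPlanReason attribute.
SELECT TOP (20)
    cp.usecounts,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE N'%NonParallelPlanReason%';
```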
If possible, the next step that I would try is to simplify the query so that fewer tables are used and/or the query is split into multiple queries (with intermediate #temp tables) so that each query is simpler and has better statistics available. If that's not possible, you could also try more hacky options such as using query hints (e.g., force MAXDOP 1 on the temp table query, and see if the plan comes out more like the table variable query).
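As a rough sketch of both suggestions (the table and column names here are hypothetical stand-ins for your actual query):

```sql
-- Step 1: materialize an intermediate result into a #temp table.
-- Unlike a @table variable, a #temp table gets real statistics,
-- so the follow-up query can be costed more accurately.
SELECT o.order_id, o.customer_id, o.order_date
INTO #recent_orders
FROM dbo.orders AS o
WHERE o.order_date >= DATEADD(DAY, -30, GETDATE());

-- Step 2: the follow-up query is simpler, and a query hint can force
-- a serial plan, mimicking what the @table variable version gets.
SELECT c.customer_name, COUNT(*) AS order_count
FROM #recent_orders AS r
JOIN dbo.customers AS c
    ON c.customer_id = r.customer_id
GROUP BY c.customer_name
OPTION (MAXDOP 1);
```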
Query plan with #temp table:
Query plan with @table variable:
And finally, if you are interested in going a little bit deeper into how the query optimizer works, I have found Paul White's blog to be a great resource!
Best Answer
Based on the plan you posted: no. Grabbing the data from the tables is 98% of your query time, and actually concatenating the results together is less than 1%, so the UNION ALL is not actually taking up much of the query time.