Linked servers can work very well as long as you have thought out the implications:
Security: a key consideration is that if one linked server is compromised, they are all at significant risk. Even if you use different credentials on each server (which would stop an attacker reaching other resources if the only attack vector were leaked/discovered/guessed credentials), the link can effectively bypass all of that. The link will also bypass protections that hide the other databases from the public network, for instance where one or more of the servers do not supply data to a public interface and so would not normally be visible through your firewalls at all. You might think "well, isn't that same risk an issue with replication?", to which the answer is yes, but replication is between individual application databases, whereas the linked-server route could possibly compromise other databases on the same server(s), as the link is at server level, not database level (you may be able to mitigate this risk by careful control of user access rights, but you at least need to be aware of it in your planning). As a side note on security: if the servers are not on the same site, make sure that you link them over some form of VPN rather than exposing SQL Server on a public interface.
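To see what the credential-bypass risk looks like in practice, you can audit how local logins are mapped across each link; this sketch uses the standard `sys.servers` and `sys.linked_logins` catalog views:

```sql
-- List each linked server and how local logins map to remote logins.
-- uses_self_credential = 1 means "pass through the caller's own credentials";
-- a single fixed remote_name shared by every local login is the riskier
-- pattern described above, since one compromised login reaches everything.
SELECT  s.name                                   AS linked_server,
        s.data_source,
        COALESCE(p.name, N'<all other logins>')  AS local_login,
        l.uses_self_credential,
        l.remote_name                            AS remote_login
FROM    sys.servers s
JOIN    sys.linked_logins l
        ON l.server_id = s.server_id
LEFT JOIN sys.server_principals p
        ON p.principal_id = l.local_principal_id
WHERE   s.is_linked = 1;
```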
Bandwidth: If all the servers are in the same DC with nice, fast, unmetered connectivity between them then you might not need to worry about this one, but be more careful with more distant connections, especially if your users will be able to run ad-hoc queries of some variety. Compression at the VPN link level will help greatly here for most data sets, but be aware that it comes at the expense of greater latency, which could exacerbate the efficiency issue (see below).
Efficiency: If you are simply pulling chunks of data down the line then this is not a massive issue (but consider locking: see my next point), but as soon as you do anything by way of joins and so forth there are limits to what the query planner can do to optimise your requests. If it needs to make many remote index seeks, network latency will make the query very slow when the servers are not local to each other (the same problem is present for local servers too, but to a lesser extent). The planner may instead use an index scan (trading bandwidth for fewer latency round-trips), eating bandwidth, and if it is holding locks while it does so (to avoid dirty reads and so forth) this will affect other parts of the application too.
Locking/Concurrency: Going off-server will increase the run-time of queries, which will exacerbate locking issues you may not yet know you have, severely reducing your application's concurrency and scalability. If you run regular and/or long-running cross-server queries, be very careful to keep an eye on locking and give the planner hints as appropriate.
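One common way to limit how long remote locks are held is to pull the remote rows once into a local temp table and run the heavy work locally. A minimal sketch, with hypothetical server/table names (`RemoteSrv.SalesDb.dbo.Orders`, local `dbo.Customers`):

```sql
-- Copy the needed remote rows in one pass; remote locks are held only for
-- the duration of this initial pull, not for the whole report.
SELECT o.OrderID, o.CustomerID, o.Total
INTO   #orders
FROM   RemoteSrv.SalesDb.dbo.Orders AS o
WHERE  o.OrderDate >= '20240101';

-- The join now runs entirely locally, with no cross-server round-trips.
SELECT c.Name, SUM(t.Total) AS Total
FROM   #orders AS t
JOIN   dbo.Customers AS c ON c.CustomerID = t.CustomerID
GROUP BY c.Name;
```

This trades some bandwidth and tempdb usage for shorter lock durations on the remote side, so it suits reporting-style queries rather than row-by-row lookups.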
As long as you have sufficient provisions in place to manage the security and performance issues, I would not see a problem with using linked servers, though there may be better/safer/more-reliable/easier-to-secure ways to achieve the same result.
Your problem begins and ends with statistics and estimates. I have reproduced your situation on my servers and found some interesting hints, and a workaround solution.
First things first, let's take a look at your execution plan:
When a view is used we can see that a filter is applied after the Remote Query is executed, while without the view there was no filter applied at all. The truth is that the filter was applied inside the Remote Query, at the remote server, before retrieving the data over the network.
Well, obviously applying the filter at the remote server and thus retrieving less data is a better option, and obviously that only happens when not using a view.
So... what is so interesting...?
Surprisingly, when I changed the filter from cognome = 'test' to cognome = N'test' (the Unicode representation of the string), the view used the same execution plan as the first query did.
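For reference, the two variants differ only in the literal's type; the local view name here (`dbo.vAnagrafiche`) is hypothetical, since only the remote view names appear in the question:

```sql
-- VARCHAR literal: in my tests the plan filtered locally,
-- after pulling all rows over the network (slow).
SELECT * FROM dbo.vAnagrafiche WHERE cognome = 'test';

-- NVARCHAR literal: the filter was pushed into the Remote Query,
-- so far fewer rows crossed the network (fast).
SELECT * FROM dbo.vAnagrafiche WHERE cognome = N'test';
```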
I guess the reason is that, when using the view, SQL Server somehow estimated that a small number of rows would come back from the (remote) query and that local filtering would be cheaper; but when SQL Server had to implicitly convert NVARCHAR to VARCHAR, statistics could no longer be used and the decision to filter locally was not taken.
I have looked for the statistics locally, but the view had no statistics, so my guess is that the view uses the remote statistics in a way that the ad-hoc query does not, and then makes the wrong decision.
OK, so what solves the problem?
I stated earlier that there is a workaround (at least until someone comes up with a better solution), and no, I don't mean using Unicode for your strings.
I wanted to give an answer first; I still have to find out why, but when using an inline table-valued function SQL Server behaves exactly the same as with the query (without the view). So replacing the view with the function will give the same result in a simple query, and with good performance (at least in my environment).
My code suggestion for you is:
CREATE FUNCTION fn_anagrafiche2()
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM dolph2.agendasdn.dbo.vistaanagraficagrp
    UNION
    SELECT * FROM dolph2.acampanet.dbo.vistaanagraficagrp
    UNION
    SELECT * FROM municipio2.dbnet.dbo.vistaanagraficagrp
)
GO
The query will then be:
SELECT *
FROM fn_anagrafiche2()
WHERE cognome = 'prova'
This works on my servers, but of course test it first.
Note: I do not recommend using SELECT * at all, as it is prone to future errors; I simply used it because it was in your question and there was no need for me to change it when I can add this remark instead :)
Best Answer
@cicik, if you are looping through the databases at your local server like you mentioned, and assuming you already have the list of remote databases stored somewhere in a local table, you could loop through the databases (e.g. with a cursor) and execute:
SELECT * FROM [YourLinkedServer].[YourRemoteDatabase].[dbo].[view];
This should put the remote database context at the database you specified, and hence it works for the FILEPROPERTY function and the sys.database_files catalog view, which return values only for the current database.
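A sketch of that loop follows. The local table `dbo.RemoteDatabases` and the linked server name `[YourLinkedServer]` are assumptions; substitute your own:

```sql
-- Iterate a locally-stored list of remote database names and query
-- sys.database_files in each database's context over the linked server.
DECLARE @db  sysname,
        @sql nvarchar(max);

DECLARE db_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM dbo.RemoteDatabases;

OPEN db_cur;
FETCH NEXT FROM db_cur INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- The four-part name sets the remote database context for this query.
    SET @sql = N'SELECT N' + QUOTENAME(@db, '''') + N' AS db, name, size
                 FROM [YourLinkedServer].' + QUOTENAME(@db) + N'.sys.database_files;';
    EXEC (@sql);
    FETCH NEXT FROM db_cur INTO @db;
END
CLOSE db_cur;
DEALLOCATE db_cur;
```

Note the QUOTENAME calls: they keep the dynamic SQL safe even if a database name contains unusual characters.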
Also, you might want to consider extracting the list of databases from the remote server and running the script against each of them on every execution, so you won't have to create the view in each database and won't miss any new databases created without your knowledge.
UPDATE: If you want to use a script to dynamically extract the data from the linked server without adding the view to every database:
Make sure your login has access to all the databases on the remote server.