SQL Server – How to Replicate Data Between Hosted and Local Databases

linked-server, sql-server, sql-server-2012

I have a Cloud hosted Database, maintained by a different company. Here is my scenario:

  • Cloud-hosted database on SQL Server 2012 – we need to connect through a VPN.
  • We set up a local server with dual NICs, which gives us both the VPN connection and the local network connection.
  • I would like to have a local copy of the cloud hosted database on our server.
  • I cannot set up replication inside SQL Server – I do not have permissions on the cloud server.
  • I cannot do a backup and restore of the database – I do not have permissions on the cloud server.
  • I cannot use triggers.
  • I cannot use logs.
  • I cannot use backups.

How can I bring down the entire database, schema and data, and keep my local database updated in real time?

Edit: I was able to get a script from the hosting company, so I could create the database along with the schema and initial data.

The most important question is: how can I keep the data synchronized in real time? Right now, I'm thinking of doing a series of jobs in SQL Server to query the linked server database with something like this:

INSERT INTO local_table_1
SELECT * FROM linkedserver_table1
EXCEPT
SELECT * FROM local_table_1;

The problem is that it's not efficient, and it will not pull any of the updated records. We do have an "Updated_date" field. How can I select records from the linked server and use them to update the local database?

Best Answer

There's no replication or out-of-the-box solution to your problem, particularly since you don't have that type of access to the remote machine. As other commenters here have noted, you're probably going to have to build your own ETL logic, perhaps in one of the following ways:

  • A linked server and stored procedures using OPENQUERY() or OPENROWSET() (a quick connectivity check is sketched right after this list),
  • An SSIS package, or
  • A custom-built local application.
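
For the first option, a quick sanity check along these lines can confirm that the linked server and the VPN route actually work before any ETL is built on top of them. This is only a sketch: linkedServerName, databaseName, the column names and the OPENROWSET connection string are placeholders, not values from the original setup.

--- Pass-through query against an existing linked server:
SELECT TOP (10) *
FROM OPENQUERY(linkedServerName, '
    SELECT x, y, z, lastUpdated
    FROM databaseName.dbo.tableName');

--- Ad-hoc alternative without a linked server (requires the "Ad Hoc
--- Distributed Queries" option to be enabled on the local instance):
SELECT TOP (10) *
FROM OPENROWSET('SQLNCLI', 'Server=remoteHostName;UID=remoteUser;PWD=remotePassword;',
    'SELECT x, y, z, lastUpdated FROM databaseName.dbo.tableName');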

The "last updated" columns in the remote tables will help you considerably, in that you won't have to download the entire database's contents every time you run the ETL process. Without being able to install or modify anything on the remote server, there's no practical way to synchronize in "real time". The frequency and speed that you can synchronize the databases will probably come down to:

  • Indexing on the remote database (how efficiently you can retrieve new records)
  • Workload on the remote server (locks, I/O speed, available CPU)
  • Link speed (how fast your VPN is)

I would set a reasonable expectation of synchronizing the data a few times per day, for example early morning, lunch, late afternoon and/or evening. Make sure to plan those executions so they don't collide unnecessarily with running backup jobs (locally and, if you can, remotely). Involve the remote party so you don't overload their servers.
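
If you go the stored procedure route, the scheduling itself can live in SQL Server Agent. The sketch below assumes the per-table logic has been wrapped in a hypothetical local procedure dbo.usp_SyncFromCloud in a hypothetical local database LocalCopyDB; the job, step and schedule names are made up, and you would add one schedule per daily run.

USE msdb;

EXEC dbo.sp_add_job
     @job_name=N'Sync cloud database';

EXEC dbo.sp_add_jobstep
     @job_name=N'Sync cloud database',
     @step_name=N'Run sync procedure',
     @subsystem=N'TSQL',
     @database_name=N'LocalCopyDB',
     @command=N'EXECUTE dbo.usp_SyncFromCloud;';

--- One schedule per run; this one fires daily at 06:00:
EXEC dbo.sp_add_schedule
     @schedule_name=N'Morning sync',
     @freq_type=4,               --- daily
     @freq_interval=1,
     @active_start_time=060000;  --- hhmmss

EXEC dbo.sp_attach_schedule
     @job_name=N'Sync cloud database',
     @schedule_name=N'Morning sync';

EXEC dbo.sp_add_jobserver
     @job_name=N'Sync cloud database';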

T-SQL example

Supposing that you use stored procedures and linked servers, here's an idea on how to go about it. First off, if existing data never changes, i.e. rows are only added (say, like accounting transactions or something):

DECLARE @lastUpdated   datetime2(3),
        @sql           nvarchar(max);   --- sp_executesql expects nvarchar

--- What's the most recent row we've downloaded?
SELECT @lastUpdated=MAX(lastUpdated)
FROM dbo.tableName;

--- Build dynamic SQL
SET @sql=N'
    INSERT INTO dbo.tableName (x, y, z, lastUpdated)
    SELECT x, y, z, lastUpdated
    FROM OPENQUERY(linkedServerName, ''
        SELECT x, y, z, lastUpdated
        FROM databaseName.dbo.tableName
        WHERE lastUpdated>={ts '''''+CONVERT(varchar(23), @lastUpdated, 121)+'''''}
        '');
    ';

--- Execute dynamic SQL
EXECUTE sys.sp_executesql @sql;

... and repeat this process for each table. You may run into problems with foreign key constraints, so you may have to arrange the order of the tables carefully, but in the end you may be better off not having any foreign key constraints at all in the local database.
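
If you do keep foreign keys locally, one possible workaround (a sketch only, relying on the undocumented but commonly used sp_MSforeachtable procedure) is to disable the constraints for the duration of the load and re-validate them afterwards:

--- Disable all foreign key and check constraints before the load:
EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';

--- ... run the per-table load logic here ...

--- Re-enable and re-validate the constraints afterwards:
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';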

If your existing rows can change, the SQL script looks similar, except you download the data to a temp table first, assign a unique clustered primary key to the temp table, then MERGE ("upsert") the contents of the temp table into your local table. If you don't like or trust MERGE, you can obviously use INSERT and UPDATE the old-fashioned way. Note that you won't catch rows that have been deleted from the remote table this way.
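
A minimal sketch of that variant might look like the following. It assumes the remote table has a single-column key id (made up for the example) along with the same x, y, z and lastUpdated columns and the same linkedServerName/databaseName placeholders used above, and that the local table already holds the initial data load; adjust the column list and data types to the real table.

DECLARE @lastUpdated   datetime2(3),
        @sql           nvarchar(max);

--- What's the most recent row we've downloaded?
SELECT @lastUpdated=MAX(lastUpdated)
FROM dbo.tableName;

--- Stage the changed rows locally first:
CREATE TABLE #stage (
    id            int          NOT NULL,
    x             int          NULL,
    y             int          NULL,
    z             int          NULL,
    lastUpdated   datetime2(3) NOT NULL,
    PRIMARY KEY CLUSTERED (id)
);

--- Build and run the dynamic SQL, just like in the insert-only example:
SET @sql=N'
    INSERT INTO #stage (id, x, y, z, lastUpdated)
    SELECT id, x, y, z, lastUpdated
    FROM OPENQUERY(linkedServerName, ''
        SELECT id, x, y, z, lastUpdated
        FROM databaseName.dbo.tableName
        WHERE lastUpdated>={ts '''''+CONVERT(varchar(23), @lastUpdated, 121)+'''''}
        '');
    ';

EXECUTE sys.sp_executesql @sql;

--- "Upsert" the staged rows into the local copy:
MERGE INTO dbo.tableName AS tgt
USING #stage AS src ON tgt.id=src.id
WHEN MATCHED THEN
    UPDATE SET tgt.x=src.x, tgt.y=src.y, tgt.z=src.z, tgt.lastUpdated=src.lastUpdated
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, x, y, z, lastUpdated)
    VALUES (src.id, src.x, src.y, src.z, src.lastUpdated);

DROP TABLE #stage;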

Using SSIS

With SSIS (or if you decide to build your own local application), the work is the same, but you won't need linked servers, and instead of using sp_executesql, you'll dynamically build SQL statements and run them through an ADODB or similar database connection.

I've included the "build your own application" option for completeness; I wouldn't go down that road.

Indexing considerations

For the remote query to be as efficient as possible, you may want to investigate if you can get the remote DBA to set up an index on the "last updated" column of each table.
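
In practice that request might amount to something like this (a sketch using the placeholder names from the examples above; the included columns are optional and depend on what the ETL query actually selects):

CREATE NONCLUSTERED INDEX IX_tableName_lastUpdated
    ON dbo.tableName (lastUpdated)
    INCLUDE (x, y, z);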

Backup option

In the end, I would probably consider Aaron Bertrand's suggestion and try to set up a solution where you download a backup of the remote database (perhaps a full backup, once per night), restore that backup locally and perform the sync operation locally on your server.
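
If that route is open to you, the local side of it is a plain restore. This is only a sketch; the backup file name, paths and logical file names are placeholders, and RESTORE FILELISTONLY tells you the real logical names to use in the MOVE clauses.

--- Inspect the logical file names inside the downloaded backup:
RESTORE FILELISTONLY FROM DISK=N'D:\Downloads\RemoteDB.bak';

--- Restore it as a local staging database:
RESTORE DATABASE RemoteDB_Staging
FROM DISK=N'D:\Downloads\RemoteDB.bak'
WITH MOVE N'RemoteDB'     TO N'D:\Data\RemoteDB_Staging.mdf',
     MOVE N'RemoteDB_log' TO N'D:\Logs\RemoteDB_Staging.ldf',
     REPLACE;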

Real-time

Returning to the real-time issue: the driving parameter here has to be the business. If management needs up-to-the-second accuracy in your database, somebody is going to have to accept the considerable increase in work and costs, not to mention maintenance. In the end, you may find that a high-speed leased line is a lot cheaper, and that you can archive the data with a giant download once a month or so instead.