SQL Server – RSync Database File Deltas in SQL Server 2008

Tags: backup, recovery, sql-server, sql-server-2008

Alright, I feel like this is a terrible idea, but I need some help understanding why this is bad:

I'm still working on implementing a disaster recovery/business continuity solution for our datacenter. We're running MSSQL 2008 Enterprise and we plan on running a passive instance in the cloud. Currently I'm suggesting Log Shipping, but I've been asked to explore using RSync (or something similar) to push MDF/LDF file deltas to the cloud instead of using the internal tools.

The goal would be to reduce our footprint and not run the cloud SQL instance most of the time due to licensing issues (our Passive/DR license is already in use in our current datacenter, but if that center goes down, the license becomes available).

I've found a solution that uses VSS to create and push deltas even if the files are locked, but I'm wondering what sorts of issues could show up. Can I get some insight? We would be pushing deltas every 15 minutes; the database in question is roughly 2 GB, with maybe 10 MB of T-logs generated in a 15-minute window.

Best Answer

I would not trust rsync to copy the ldf and mdf files for user or system databases, nor would I trust anything I hacked together (VSS or otherwise) in a production environment. SQL Server is very fussy about when (and if) things get written to the ldf and mdf files. Software (rsync) that isn't designed with that in mind might not get it right, especially if it doesn't understand that the ldf and mdf files need to be treated as an interrelated system of files. If that software doesn't get things right, nothing might be noticed until you fail over, try to go live, and have your databases flagged as suspect due to what SQL Server sees as data corruption. Even worse, "getting it right" might be dependent on how much load is on the system, and you might not find the problem in a lightly-loaded test environment.

I have seen enough examples of people who thought that they were replicating their files but were not. They were left with corrupt files on the recovery site and with inaccessible backup files at the primary site. So, that meant no database for them.

In short, you are asking for trouble.

If you had some sort of block-level replication technology like EMC's SRDF, or were looking at shared-nothing clusters, that might be different. Those technologies interface with SQL Server and the clustering services that Windows provides in a way that keeps your writes safe and your files consistent.

If my only disaster recovery option was a remote site that was normally down, I'd use log shipping and make sure that I had all of the pieces to restore the database(s) on the remote site. If you can't make the built-in log shipping do that, writing your own isn't that hard. I've probably written three or four (simplistic) log shipping systems from scratch in the last 14 years.

At a minimum, the key things you need are:

- A full backup taken and copied over to the remote site.
- Your tlog backups taken and copied over to the remote site.
- An automated method to restore that full backup and any relevant tlog backups. Ideally, this should be simple enough for someone else to do, and/or simple enough for you to figure out at 3 AM when your primary server fails and you are half asleep.
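A minimal sketch of those pieces in T-SQL, assuming a hypothetical database named SalesDB and placeholder backup paths (moving the files between sites would be handled by a separate scheduled copy job):

```sql
-- On the primary: one full backup, then log backups on a schedule
-- (every 15 minutes, per the question). Paths and the database
-- name (SalesDB) are placeholders for illustration.
BACKUP DATABASE SalesDB
    TO DISK = N'D:\Backups\SalesDB_full.bak'
    WITH INIT, CHECKSUM;

BACKUP LOG SalesDB
    TO DISK = N'D:\Backups\SalesDB_20240101_1015.trn'
    WITH INIT, CHECKSUM;

-- On the remote site: restore the full backup, then each log backup
-- in sequence, all WITH NORECOVERY so later log backups can still
-- be applied.
RESTORE DATABASE SalesDB
    FROM DISK = N'D:\Backups\SalesDB_full.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG SalesDB
    FROM DISK = N'D:\Backups\SalesDB_20240101_1015.trn'
    WITH NORECOVERY;

-- Only at failover time: bring the database online. No further
-- log backups can be applied after this.
RESTORE DATABASE SalesDB WITH RECOVERY;
```

The automation around this (tracking which log backups have been applied, copying files, alerting on gaps) is the part you would script yourself if you don't use the built-in log shipping.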

When you have an event, bringing up the other server will take longer because of all the manual work involved. That makes this approach not as good as simply implementing regular log shipping. You will need to test it periodically, as well.

(Of course, you need to worry about other things too, like jobs, packages, login and user synchronization, changing DSNs on the web servers during a failure, etc. If you have a large environment and a serious disaster, like the loss of your primary data center, you will be trying to do this when IIS guys, file server guys, network guys and whatever else guys are trying to bring their stuff up too.)

If it were me, I would be agitating for a warm standby server at the remote site. That would make (standard, out-of-the-box) log shipping easier and database mirroring possible. It sounds like licensing constraints rule that out for you.