SQL Server database synchronization

Tags: replication, sql-server, sql-server-2008-r2

Problem definition

Our users need the ability to query a database that is mostly up to date. The data can be stale up to 24 hours and that is acceptable. What would be the lowest cost approach to getting and keeping a second database up to date with a production copy? Is there an approach I'm not thinking of?

Workload

We have a third-party application that we use to monitor stock trading activity. During the day, lots of little changes occur as part of various workflows (yes, this trade was valid; no, this one is suspicious; etc.). At night, we perform large set-based operations (loading the previous day's trades).

The current solution and problem

We make use of database snapshots. At 10 p.m. we drop and recreate the snapshot, and the ETL processing then begins. This is obviously taxing on our disks, but it lets our users query the database without locking it (they use an Access front end). They use it late into the night and early in the morning, so they will notice downtime.
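For reference, the nightly drop-and-recreate is just a pair of statements along these lines (the database, snapshot, and file names here are made up, not our actual objects):

```sql
-- Drop last night's snapshot and create a fresh one (hypothetical names and paths)
DROP DATABASE TradeMonitor_Snap;

CREATE DATABASE TradeMonitor_Snap
ON ( NAME = TradeMonitor_Data,                      -- logical name of the source data file
     FILENAME = 'E:\Snapshots\TradeMonitor_Snap.ss' )
AS SNAPSHOT OF TradeMonitor;
```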

The problem with this approach is two-fold. The first is that when the nightly processing fails, and that's not terribly uncommon, we get to restore the database, which results in the snapshot being dropped. The other problem is that our processing times are slipping past our SLA. We are attempting to address this by working with the vendor after having identified poorly written queries and a lack of indexing. The database snapshot is also a culprit in this slowdown, as evidenced by the speed difference when it is present versus absent. Shocking, I know.

Approaches considered

Clustering

We had database clustering turned on, but it didn't address the need to keep the data available and just generally complicated the admins' lives. It has since been turned off.

SQL Server Replication

We started looking at replication last week. Our theory is that we can get a second catalog stood up and synchronized with the production database. Prior to ETL beginning, we'll sever the connection and only re-enable it once the ETL process has completed.

The admin started with Snapshot Replication, but he's concerned that it's taking multiple days of high CPU usage to generate the snapshot, as well as about the disk consumption required. He indicates that it appears to write all the data out to physical files prior to ever shipping to the subscriber, so our 0.6 TB database will cost 1.8 TB in storage. Also, if it takes multiple days to generate a snapshot, it won't fit in the desired SLA.

After reading the fine article, it seems like Snapshot might be the way to initialize the subscribers, but then we'd want to switch to Transactional Replication to keep them in sync after that. I assume turning transactional replication on/off won't force a full reinitialization? Otherwise, we'll blow our time window.
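From what I've read, there's also an option to skip the generated snapshot entirely and initialize a transactional subscriber from a regular full backup. A rough sketch, with all names invented (and the publication would also have to be created with @allow_initialize_from_backup = 'true'):

```sql
-- On the publisher: add a subscription initialized from an existing full backup,
-- avoiding the multi-day snapshot generation (hypothetical names and path)
EXEC sp_addsubscription
    @publication      = N'TradeMonitor_Pub',
    @subscriber       = N'SECONDARY01',
    @destination_db   = N'TradeMonitor_Copy',
    @sync_type        = N'initialize with backup',
    @backupdevicetype = N'disk',
    @backupdevicename = N'E:\Backups\TradeMonitor_Full.bak';
```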

Database Mirroring

Our database is in FULL recovery mode, so database mirroring is an option, but I know even less about it than about replication. I did find an SO answer indicating that "Database mirroring prevents data to be accessed directly, mirrored data are only accessible through a database snapshot."
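From what little I've read so far, the setup would look something like the following, heavily simplified (the mirror first needs a copy restored WITH NORECOVERY plus mirroring endpoints on both instances, and the names and endpoint address here are illustrative), with readers pointed at a snapshot of the mirror:

```sql
-- On the mirror instance: join the mirroring session (illustrative endpoint address)
ALTER DATABASE TradeMonitor
    SET PARTNER = 'TCP://prodserver.ourdomain.local:5022';

-- The mirror itself can't be queried; expose it to readers via a database snapshot
CREATE DATABASE TradeMonitor_ReadOnly
ON ( NAME = TradeMonitor_Data,
     FILENAME = 'E:\Snapshots\TradeMonitor_ReadOnly.ss' )
AS SNAPSHOT OF TradeMonitor;
```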

Log Shipping

It sounds like log shipping might also be an option, but this is another of those things I know nothing about. Would it be a lower-cost solution (implementation and maintenance) than anything else? Based on Remus's comment, "Log shipping allows read-only access to the replica copy, but will disconnect all users when applying the next backup log received (eg. every 15-30 minutes)." I'm not sure what that downtime would translate into, so it might cause the users some angst.
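As I understand it, the read-only access comes from applying each shipped log WITH STANDBY, which is exactly where the user disconnects happen. One apply cycle on the secondary would be roughly this (paths invented):

```sql
-- On the secondary: apply the next shipped log backup (invented paths)
-- STANDBY keeps the database readable between restores; the log shipping
-- restore job can be configured to disconnect users before each apply
RESTORE LOG TradeMonitor
    FROM DISK = 'E:\LogShip\TradeMonitor_2215.trn'
    WITH STANDBY = 'E:\LogShip\TradeMonitor_undo.tuf';
```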

MS Sync

I only heard about using Sync this past weekend and have not yet investigated it. I'd hate to introduce a new technology for something with high visibility like this problem has but if it's the best approach, so be it.

SSIS

We do plenty of SSIS here so generating a few hundred SSIS packages to keep the secondary synchronized is an option for us, albeit an ugly one. I am not a fan of doing this as that's a lot of maintenance overhead I'd rather my team not take on.

SAN "magic" snapshot

In the past, I've heard of our admins using some SAN technology to make instant backups of entire disks. Perhaps there's some EMC magic that could be used to make uberquick copies of the mdf/ldf files, and we could then detach/attach the target database.
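If the SAN copy works out, the SQL Server side of it would just be a detach/attach of the copied files, something like this (database name and file paths are hypothetical):

```sql
-- On the secondary: drop the old copy and attach the SAN-cloned files (hypothetical paths)
EXEC sp_detach_db @dbname = N'TradeMonitor_Copy';

CREATE DATABASE TradeMonitor_Copy
ON ( FILENAME = 'E:\SANCopy\TradeMonitor.mdf' ),
   ( FILENAME = 'F:\SANCopy\TradeMonitor_log.ldf' )
FOR ATTACH;
```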

Backup and restore

I think we take full backups once a week, differentials nightly, and tlogs every 15 minutes. If the users could live with the 3-4 hour outage for the full restore, I suppose this might be an approach.
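The restore sequence itself would be the standard chain, brought online at the end (file names invented; restoring alongside another copy of the database would also need WITH MOVE for the data and log files):

```sql
-- Standard restore chain on the secondary (invented file names)
RESTORE DATABASE TradeMonitor_Copy
    FROM DISK = 'E:\Backups\TradeMonitor_Full.bak' WITH NORECOVERY, REPLACE;
RESTORE DATABASE TradeMonitor_Copy
    FROM DISK = 'E:\Backups\TradeMonitor_Diff.bak' WITH NORECOVERY;
RESTORE LOG TradeMonitor_Copy
    FROM DISK = 'E:\Backups\TradeMonitor_2145.trn' WITH RECOVERY;  -- final log brings it online
```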

Constraints

Windows 2008 R2, SQL Server 2008 R2 (Enterprise Edition), VMware v5 Enterprise Edition, EMC SAN storage with drives mapped to vmdk files, Commvault handling backups, and 0.6 TB of data in the source catalog. This is a third-party application we host in-house. Modifying their structure is generally frowned upon. The users cannot go without querying the database and refuse to be constrained by proactively identifying the tables they monitor to do their work.

Our DBAs are purely contractors at the moment. The full-timers have set sail and we have not replaced them yet. The application admins are not well versed on SQL Server matters and we have a team of Storage/VM admins that could help/hinder this effort. Development teams are not currently involved but can be enlisted based on the approach. So a simpler to implement and maintain solution would be preferable.

Me, I'm on the development side of the house, so I can only propose approaches and have not had to deal with the administration side of things. With no time in the admin saddle, I'm hesitant to say one approach would be superior to another; they all look great on paper. I'm fully willing to run in any direction y'all suggest because, as I see it, it's only going to make me more valuable as a DB professional. I have a wheelbarrow but no holocaust cloak available.

Edits

To address @onpnt's questions

Data latency acceptance

The users currently view data that is up to 24 hours behind; the data is only current as of 2200.

Amount of data change in a given minute, hour and day

Not sure how to quantify that. Business hours, maybe hundreds of changes per hour. Nightly processing, millions of rows per business day.

Connectivity to the secondary

Internal network, separate virtual host and dedicated storage

Read requirements on the secondary instance

Windows group will have read access to the secondary, all tables

Up-time of the secondary instance

There is no strong definition of an up-time requirement. Users want it always available, but are they willing to pay for that? Probably not. Realistically, I'd say 23 hours out of the day would suffice.

Alterations to existing schema and all objects

Infrequent modifications, maybe once per quarter for table objects. Maybe once per month for code objects.

Security

No special security needs. The production permissions would match the copy's permissions. Although as I think about it, we could revoke the users read access to prod and only allow them to read the copy… Not a requirement though.

@darin strait

Reverting to the snapshot could be an option but I think there was some reason they didn't pursue it. I'll check with the admin

@cfradenburg

My assumption was that we'd only use one of these approaches, but that is a good point that restores would break the "other" sync technologies. They are investigating the EMC snapshot magic. As the admin described it, they would take a snapshot at 1900 and migrate the image over to the secondary's zone. That should complete by 2200, and then they'd perform a detach and reattach of the secondary database.

Wrap up

2012-10-29 We evaluated the EMC snapshot magic and some other replication options but the DBAs decided they could best figure out Mirroring. Upvoted the answers because they all helped out and gave me plenty of options as well as "homework" to investigate.

Best Answer

Modifying their structure is generally frowned upon

Replication is more than likely out, and I'd throw Sync out before that (based on real-life, high-transaction tests of the Sync Framework).

If 3-4 hours is your acceptable data latency, log shipping to a read-only copy will probably be your best bet here. But how much change is happening in the log? Find that out by monitoring it, so you can see how quickly and how much you need to push across.
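One quick way to gauge that without extra tooling is to sum the log backup sizes out of msdb's backup history (the database name here is a placeholder):

```sql
-- Daily log churn from backup history (placeholder database name)
SELECT CONVERT(date, bs.backup_start_date) AS backup_day,
       SUM(bs.backup_size) / 1048576.0     AS log_backup_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'TradeMonitor'
  AND bs.type = 'L'                         -- log backups only
GROUP BY CONVERT(date, bs.backup_start_date)
ORDER BY backup_day;
```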

If you can't use Mirroring or upgrade to 2012 Enterprise, that option is out, although it would be a sound strategy if you can go Enterprise (or are already on it).

SSIS isn't meant to just dump data over, but it can do it. You're looking at far too much in terms of lookup transformations, though, and the task would be expensive in time and resources. Although, like I said, it can do it.

Really, there will be a distinct narrowing of choices based on answering a few questions:

  • Data latency acceptance
  • Amount of data change in a given minute, hour and day
  • Connectivity to the secondary
  • Read requirements on the secondary instance
  • Up-time of the secondary instance
  • Alterations to existing schema and all objects
  • Security