Sql-server – What options exist for copying data between servers on a regular basis

bulkcopyetlreplicationsql-server-2008

My apologies if this has already been answered. I've searched SO and of course here on DBA, and am surprised to find no close matches. Specifically, I'm looking for a solution which survives schema differences.


I finally got a development server with which to shadow a production data warehouse server. I'll be testing schema and sproc changes here, eventually publishing such changes to the prod server. To do so, I'll need reasonably current data, but it need not be updated in real time. My plan is to copy end-state data from the main server to the dev server as the final step in overnight ETL. What's the best way to achieve this?

Parameters

All automatic copying will be from prod to dev. Any copying from dev to prod will be by hand, and generally will just be DDL.

Because the dev server's schema is expected to change independently of the production server, I don't want the copy process to fail when the target schema differs (it's OK if the specific tables in flux can't be kept in synch, of course). Otherwise I would just DROP DATABASE and restore last night's backup.

Keeping the schema, SPs, views, and UDFs in synch is not necessary, or even desirable. They should only change when I specifically change them.

Records can be changed retroactively in most tables, so incremental updates will probably not be practical.

The volume of data is 14 GB for the most critical tables, plus about 200 GB of less important data which can be updated weekly.

My goal is to finish the process within two hours. The servers are co-located, and should have high throughput. Copying (INSERT INTO..SELECT * FROM ProdServer..) a single table with 600 MB of data and 300 of indices took 7.5 minutes; not great. Of some concern, another table with 11 GB of data and 8 GB of indices did not finish in 130 minutes, when I cancelled it. I'll test this again without indices.

Taking the prod database offline is acceptable if it can be kept brief – no more than 30 minutes. The dev database can be offline for hours if necessary.

If I can temporarily exclude a specific table, keeping it static for a few days while testing, that would be valuable.

Options

1: Replication

I could create a daily snapshot after the end of ETL, and publish it to the dev server. I've not used replication before, but this seems to be the kind of scenario for which it is meant. Time for me to learn a new facet of the technology?

2: Bulk Copy and BULK INSERT

I could write a script which iterates through every table in the prod database and spits out the content to a uniquely-named file. On the dev side, I'd loop through these files and TRUNCATE/BULK INSERT into the target table, with a TRY..CATCH block in case the schema has changed. I'm not sure if this would perform acceptably, but it would be fairly simple to implement.

3: INSERT from Linked Server

I can run a TRUNCATE/INSERT..SELECT for each table, probably pulling from the dev side. This would be simple, and should be fast, especially if one drops and re-created the indices. To cope with schema changes, one could identify the intersection of the field lists for each pair of tables and only attempt to copy those fields; this would help if many fields were NULLable.


Are there other options? Is there an easier way that I'm overlooking? Are there any gotchas that you've hit when working on similar projects?

This question describes the export side, but the desired output was CSV, so the options are limited (and BCP works great, as answered).

This question talks about quickly copying a whole database, including schema. It specifically excludes replication.

This question discusses bi-directional synchronization, for which replication is suggested.

Best Answer

I'd consider backup/restore of a mix of full and differential backups from prod to dev. Then synch between this restored copy to your actual dev databases

SQL Server 2008 R2+ supports backup compression in Standard Edition (In SQL Server 2008 it was Enterprise Edition only) which makes this easier.

Reasons:

  • You test your backup integrity and restore capability
  • You can reset your dev database easily if some code monkey bollixes it
  • You can compare "before" and "after" DDL, performance etc
  • You can snapshot the restored database as needed to roll this back if you make changes

I've used this before and will use it again for these reasons

I suggested this in my answer to The smallest backup possible ... with SQL Server