SQL Server – Scheduled copy of large table from SQL Server 2000 to 2008

sql-server, sql-server-2000, sql-server-2008

I have a table on SQL Server 2000 with 10 million records and 20 columns. The table is rebuilt once a day and is essentially a reporting table, not a transactional table. Of the 10 million records, some, but not all, will change on a daily basis.

I have the option to use SQL Server 2008, but only for this table for the time being. The reason for bringing this table to the SQL Server 2008 instance is to leverage the Full-Text Search capabilities that 2008 offers.

Given this situation, what is the best way to go about dumping this table across servers on a routine, daily basis?

(I suppose I run the risk of this question being closed as a duplicate of How to move data from big table,
but the scenario I am describing seems sufficiently distinct from that question.)

Best Answer

I would likely put a TRIGGER on the 2000 table to insert the captured changes into an audit table, then either immediately sync the 2008 database with just those changes through SSIS (or even a linked server, if that suits you) or do an overnight sync, depending on which would have the least impact on business-hours performance and the network as a whole, truncating the audit table on successful transfer either way. This keeps the changes easy to acquire and small enough to pull across the network without generating calls from your infrastructure engineers. The audit table itself could even live in a different filegroup to reduce disk contention if necessary.
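A minimal sketch of that trigger, assuming a source table dbo.Reporting keyed by an integer column ReportID (both names are placeholders for your actual schema):

    -- Hypothetical audit table; specify a different filegroup with ON [...]
    -- if disk contention becomes a concern.
    CREATE TABLE dbo.Reporting_Audit
    (
        ReportID   int      NOT NULL,
        ChangeType char(1)  NOT NULL,               -- 'I', 'U', or 'D'
        ChangedAt  datetime NOT NULL DEFAULT GETDATE()
    );
    GO

    CREATE TRIGGER trg_Reporting_Audit
    ON dbo.Reporting
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Rows only in inserted are new; rows in both pseudo-tables are updates.
        INSERT INTO dbo.Reporting_Audit (ReportID, ChangeType)
        SELECT i.ReportID,
               CASE WHEN d.ReportID IS NULL THEN 'I' ELSE 'U' END
        FROM inserted i
        LEFT JOIN deleted d ON d.ReportID = i.ReportID;

        -- Rows only in deleted were removed.
        INSERT INTO dbo.Reporting_Audit (ReportID, ChangeType)
        SELECT d.ReportID, 'D'
        FROM deleted d
        LEFT JOIN inserted i ON i.ReportID = d.ReportID
        WHERE i.ReportID IS NULL;
    END;
    GO

The sync job then only has to pull the rows whose keys appear in dbo.Reporting_Audit, truncating the audit table once the transfer commits.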

That said, there are probably half a million ways to go about doing this, and your specific, optimal solution is going to depend largely on the details of how your environment operates today, and even on the specific structure of the table in question. For instance, if the "rebuild" you've mentioned is a severe one, involving either TRUNCATE or a complete DROP and CREATE, the simple TRIGGER suggestion I've made will not work for you without significant extra effort, a problem compounded if your source table uses an IDENTITY column (a rebuild can reassign the same identity values to entirely different rows).
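To be concrete about why the trigger falls over in that scenario: TRUNCATE TABLE does not fire DELETE triggers, and DROP TABLE removes the trigger along with the table, so a rebuild done either way leaves the audit table silently incomplete (table names hypothetical, as above):

    TRUNCATE TABLE dbo.Reporting;  -- deallocates pages; the DELETE trigger never fires
    -- or
    DROP TABLE dbo.Reporting;      -- drops the table and every trigger defined on it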

In that case, a Lookup task in a Data Flow may seem like a reasonable route at a glance, but it effectively results in the entire table being pulled across the network anyway for the comparison. Comparison logic at that point is just unnecessary overhead; you may as well put the receiving catalog in SIMPLE recovery mode and stage all of the data you've received on the other side of the pipe. Doing an ALTER TABLE ... SWITCH after the transfer would then at least limit the lock on your reporting table to a single metadata operation, keeping it otherwise completely available to the reports being generated from it.
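As a rough sketch of that last step, again with hypothetical names (dbo.Reporting plus a structurally identical staging table dbo.Reporting_Stage on the same filegroup, since SWITCH requires matching structure, matching indexes, and an empty target):

    -- One-time: SIMPLE recovery keeps log growth down during the nightly bulk load.
    ALTER DATABASE ReportingDB SET RECOVERY SIMPLE;

    -- Nightly: bulk load dbo.Reporting_Stage (SSIS data flow, BULK INSERT, etc.),
    -- then swap it in as a metadata-only operation.
    BEGIN TRANSACTION;
        TRUNCATE TABLE dbo.Reporting;  -- SWITCH demands an empty target
        ALTER TABLE dbo.Reporting_Stage SWITCH TO dbo.Reporting;
    COMMIT TRANSACTION;

The reporting table is then unavailable only for the instant the schema locks are held, rather than for the duration of a 10-million-row load.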