SQL Server – Best way to copy data to another location without blocking the source


I have a production database with some data I need to copy to another location, either another database on the same server or another server.

The DB is huge and I don't need all the data in it; for instance, one of the tables has ~300 million records and I need only ~60 million of them.

The issue is that it might not be possible to restore the entire DB on the same/another server because of space restrictions, so we might be forced to copy only what is needed.

The other restriction is that these tables are used by production machines, so they need to be available 100% of the time; I can't block any read or write access to them.

So, what would be the best way to copy partial data from one DB to another?

  • Run a query that inserts N records at a time until it is done? Something like INSERT INTO … SELECT TOP N … FROM … WITH (NOLOCK) WHERE …
  • Use the Import and Export Data wizard? Will this tool block the source tables while running?

Best Answer

This depends a lot on your goals and requirements for the data in the destination. Specifically, you need to weigh the trade-off between blocking on the source and correctness of the copied data.

Avoid Blocking - READ COMMITTED with SELECT...INTO

For instance, the most naive solution under the default READ COMMITTED isolation level would solve your problem with very little blocking on the source server:

  • [Optional, if the database is on another server] Create a linked server on the destination (SQL Server 2008) instance that connects to the source (SQL Server 2016) instance
  • Run a SELECT...INTO statement to copy the desired ~60 million rows:
SELECT /*column list*/ 
INTO [dbo].[BigTable]
FROM [SourceDatabase].[dbo].[BigTable]
WHERE /*predicates to get the 60 million rows you need*/

Note: if the linked server approach is used, you'll need to include the linked server name in the FROM clause, as in the sketch below.
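For instance, assuming a linked server named [SourceServer] (a hypothetical name for this sketch), the four-part-name version of the query would be:

SELECT /*column list*/
INTO [dbo].[BigTable]
FROM [SourceServer].[SourceDatabase].[dbo].[BigTable]
WHERE /*predicates to get the 60 million rows you need*/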

Shared locks will be taken and released as pages are scanned in the clustered index (or heap, as the case may be), so there will be brief moments of blocking on specific pages, but nothing that should be especially noticeable for your production system. Of course, there might be other points of resource contention (storage throughput, for instance).

The main problem with this approach is that it makes basically no guarantees about the correctness of the data that arrives at the destination. There's a Great Post™ by Paul White about The Read Committed Isolation Level that you can check out for the details, but the main issue is that it doesn't guarantee a point-in-time view of the data.

In other words, DML statements (INSERTs, UPDATEs, and DELETEs) can successfully execute during the scan, which could result in rows being read more than once, rows being missed entirely, etc.

A potential problem for the destination database is log file growth. If the size (not just the number of rows) of the data being moved is very large, you may need to batch the inserts rather than doing them all in one go, as sketched below.
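A minimal sketch of that batching, assuming (hypothetically) that the table has an ascending integer key [Id] to drive the loop:

DECLARE @BatchSize bigint = 50000;
DECLARE @LastId    bigint = 0;
DECLARE @MaxId     bigint;

SELECT @MaxId = MAX([Id]) FROM [SourceDatabase].[dbo].[BigTable];

WHILE @LastId < @MaxId
BEGIN
    -- Each iteration commits on its own, so the destination log can be
    -- truncated between batches (under the SIMPLE recovery model).
    -- Assumes [dbo].[BigTable] was already created at the destination.
    INSERT INTO [dbo].[BigTable] (/*column list*/)
    SELECT /*column list*/
    FROM [SourceDatabase].[dbo].[BigTable]
    WHERE [Id] > @LastId
      AND [Id] <= @LastId + @BatchSize
      /* AND the predicates to get the 60 million rows you need */;

    SET @LastId = @LastId + @BatchSize;
END;

Note that this has the same READ COMMITTED correctness caveats as the single-statement version; batching only bounds the log growth.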

Favor Correctness - More Restrictive Isolation Levels

If you want the data that arrives at the destination to be consistent with a specific point in time, you can use the SNAPSHOT or SERIALIZABLE isolation level for the duration of the insert (which itself could be the same as, or similar to, the one described above).

The SNAPSHOT isolation level is a viable option. It will increase tempdb usage, since row versions are maintained there to provide a consistent view of the database, so you need to make sure your system can handle that workload. It will also cause some blocking (more than READ COMMITTED, but less than the SERIALIZABLE approach). How much this affects your production workload is something you'll have to test.
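A sketch of what that could look like, assuming both databases are on the same instance and the destination table already exists ([DestinationDatabase] is a hypothetical name here):

-- One-time setup: snapshot isolation must be allowed on each
-- database the transaction touches
ALTER DATABASE [SourceDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE [DestinationDatabase] SET ALLOW_SNAPSHOT_ISOLATION ON;

USE [DestinationDatabase];
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

-- Every read in this transaction sees the data as of the moment
-- the transaction first accessed it
INSERT INTO [dbo].[BigTable] (/*column list*/)
SELECT /*column list*/
FROM [SourceDatabase].[dbo].[BigTable]
WHERE /*predicates to get the 60 million rows you need*/;

COMMIT TRANSACTION;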

The SERIALIZABLE isolation level will be the most disruptive to your production workload, as it will block all writes to the table until the transaction is complete. That doesn't meet your requirements, so it's not really a viable option.

Correctness AND No Blocking - Replication / Backups

If you're able to restore a full database backup somewhere, that's the ideal solution here. You have a point-in-time view of the database, without contention, from which you could then copy the needed rows to the destination. Since you mentioned database size / space is an issue, I imagine this isn't a viable option.
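If space can be found somewhere, the restore-then-copy approach could look like this sketch (the backup path, copy name, and logical file names are all hypothetical):

RESTORE DATABASE [SourceDatabase_Copy]
FROM DISK = N'X:\Backups\SourceDatabase.bak'
WITH MOVE N'SourceDatabase'     TO N'X:\Data\SourceDatabase_Copy.mdf',
     MOVE N'SourceDatabase_log' TO N'X:\Data\SourceDatabase_Copy_log.ldf',
     RECOVERY;

-- Nothing else touches the restored copy, so this scan can take
-- as long as it needs without blocking production
SELECT /*column list*/
INTO [dbo].[BigTable]
FROM [SourceDatabase_Copy].[dbo].[BigTable]
WHERE /*predicates to get the 60 million rows you need*/;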

Due to the special requirements of your situation, you might want to consider buying a third-party tool that supports "object level restores." This would allow you to restore just the needed table to the destination directly from a backup. I think Quest LiteSpeed supports this, although it's hard to tell behind the paywalls on their site.

Transactional Replication might also be a viable option if you need the data to be continuously up to date, although the version restrictions might prevent you from leveraging it, and the setup and maintenance overhead might be impractical. I haven't used transactional replication in practice, so I'm just mentioning it as a possibility.

About NOLOCK

Don't use that. It reads uncommitted data, so on top of the missed and double-read row problems described above, you can also read rows that are later rolled back.