Sql-server – Optimizing syncing of large dataset between 2 servers with SSIS

sql serverssdtssis

I have two DBs on different servers, Server A and Server B. Both are SQL Servers. I need to keep second DB in sync with the first one. DB schema and tables are identical, but Server A periodically creates new records that are missing on Server B. I decided to use SQL Server Integration Services, which is included in SSDT tools.

The simplest way to sync Server B with Server A is by creating SSIS project in Visual Studio that will generate Package.dtsx.

Here is what the package does currently

  • create OLE DB source referencing Server A
  • create OLE DB destination referencing Server B
  • pick specific table that needs to be synced
  • create column mapping for this table between Server A and Server B
  • in SSMS, add SQL Server Job that will run Package.dtsx every 15 minutes

This package seems to download all the records from Server A and tries to insert them on Server B.

The issues

  • Some records may already exist on Server B. Is there a way to insert only records that are missing on Server B, to prevent a bunch of errors in the log about duplicated primary keys?
  • Server A may have a lot of records. Is there a way to get the difference between servers BEFORE synchronization starts, maybe there is some action that can compare records by IDs and download only the difference between servers?

Best Answer

You want to add a "lookup" operator in your SSIS pacakage. this package will "check" for some columns (your PK for example) and you will then be able to take action on only the rows that are missing.