Sql-server – How does the Distributor know you have seeded/initialized your Subscriber using a backup

sql-server-2008-r2sql-server-2012transactional-replication

Im building my first transactional replication database for our Sharepoint team. The Publisher database is fairly large 330GB+ Because the Publisher database is in a different data center than the Distributor and Subscriber database I will be perfoming the initial seed using a database backup instead of using a snapshot.

I understand that the Publisher will know the last log sequence number (LSN) from when the backup was taken. The Subscriber will be populated from the back up and have it's own last LSN from the restore.

Im not sure how the process will work once I turn the synchronization on after everything has been initialized/restored.

  • Does the Publisher talk to the Subscriber through the Distributor to know how far out of sync they are? I thought the Publisher and Subscriber did not have any real direct communication?
  • How will the proper records be sent to the Distributor from the Publisher when the Distributor is unaware of the initialization from backup.

Extra Details:
Publisher: SQL Server 2008 R2
Distributor: SQL Server 2012
Subscriber: Sql Server 2012

Best Answer

The link posted as a comment to your question explains the technical steps but may not have made clear how the setup really works.

When you create the subscription you inform the publisher that the subscriber has been restored from a backup (specifying the backup file used). The backup you use has to be taken after the publication was created. When you setup replication you tell the distributor how long to keep transactions around (Transaction Retention Period). The distributor then keeps these commands until they are delivered to all subscribers, or until the retention period has been reached. So when the subscription is created, the LSN of the backup is checked to see if it was taken from within the retention period. If so, then the backup and therefore the subscriber is valid. Only the undistributed transactions created after the backup will need to be sent to the subscriber.

In your case you would:

  • setup the publication
  • take a backup
  • copy that to the subscriber before the retention period expires
  • restore the backup to the subscriber
  • create the subscription (using the initialize from backup option)
  • watch as the undistributed commands then get sent to the subscriber
  • finally see that the entire thing works!

Initializing from backup makes the entire thing much easier than snapshots, because you generally have backups running anyhow and, more often than not, can use compression. This does mean that the entire DB is on the subscriber though, which may be desired or may not.