SQL Server – Log Shipping – Monitor Server Down – Copying happening but not restoring on secondary

log-shipping monitoring sql-server-2008-r2

I have set up a few log shipping instances, but I am not a DBA.

This weekend something happened for the first time – the log shipping Monitor Server went down temporarily due to electrical issues outside of my control.

When I looked at the primary and secondary machines during this time, both were up and running, and the .trn files were still being copied from the primary to the secondary, as expected.
Yet the secondary instance did not restore the .trn files at all. I received a notification:

The log shipping primary database _________ has backup threshold of 60
minutes and has not performed a backup log operation for 82 minutes.
Check agent log and logshipping monitor information.

Is this expected? The primary and secondary were both running. Shouldn't the monitor be an "optional" SQL instance that has no effect on log shipping? It was preventing the secondary instance from applying the backups even though I later explicitly ran the restore job.
When the monitor server came back on – everything sorted itself out.

I know I have a lot to learn. Any advice on whether I can check some settings, or whether this is expected, would be appreciated.

Best Answer

By itself, this message does not necessarily indicate a problem with log shipping. It means that the difference between the time of the last backup recorded on the monitor server and the current time on the monitor server is greater than the Backup Alert threshold. In other words, as far as the monitor can tell, log shipping is out of synchronization beyond the backup threshold.

Instead, the message might indicate the following problem:

The backup job is not running. Possible causes for this include the following: the SQL Server Agent service on the primary server instance is not running, the job is disabled, or the job's schedule has been changed.

Other possible reasons for the alert include the following:

  1. The date or time (or both) on the monitor server is different from the date or time on the primary server. It is also possible that the system date or time was modified on the monitor or the primary server. Either situation can generate alert messages.
  2. When the monitor server goes offline and then comes back online, the fields in the log_shipping_monitor_primary table (log_shipping_primaries in SQL Server 2000) are not updated with the current values before the alert message job runs. This seems to be the cause of log shipping being reported as out of sync in your case.
  3. The log shipping backup job that runs on the primary server might not be able to connect to the monitor server's msdb database to update the fields in the log_shipping_monitor_primary table. This may be the result of an authentication problem between the monitor server and the primary server.
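
To make the threshold check and the stale-monitor case in item 2 concrete, here is a minimal Python sketch. It is only a conceptual model, not SQL Server's actual alert job: the function name `backup_alert_due` and the timestamps are hypothetical, and the only figures taken from the question are the 60-minute threshold and the 82 minutes reported in the alert. The assumption is that the alert compares the last backup time stored on the monitor with the monitor's own clock.

```python
from datetime import datetime, timedelta


def backup_alert_due(last_backup_recorded, monitor_now, threshold_minutes):
    """Return (alert, elapsed_minutes) as seen from the monitor's point of view.

    last_backup_recorded: last backup time stored on the monitor (may be stale).
    monitor_now: current time according to the monitor server's clock.
    """
    elapsed = (monitor_now - last_backup_recorded).total_seconds() / 60
    return elapsed > threshold_minutes, elapsed


# Hypothetical timeline: the monitor has just come back online.
monitor_now = datetime(2024, 1, 6, 12, 0)

# The primary really took a log backup 7 minutes ago (assumed value) ...
actual_last_backup = monitor_now - timedelta(minutes=7)
# ... but the monitor still holds the value written before it went offline.
stale_recorded_backup = monitor_now - timedelta(minutes=82)

stale_alert, stale_elapsed = backup_alert_due(stale_recorded_backup, monitor_now, 60)
fresh_alert, fresh_elapsed = backup_alert_due(actual_last_backup, monitor_now, 60)

print("stale monitor row:", stale_alert, stale_elapsed)
print("refreshed monitor row:", fresh_alert, fresh_elapsed)
```

In this model the alert is raised from the stale row even though backups are current, and it clears once the monitor row is refreshed. A clock difference between the monitor and the primary (item 1) would distort `monitor_now` in the same comparison.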

"It was preventing the secondary instance from applying the backups even though I later explicitly ran the restore job. When the monitor server came back on - everything sorted itself out."

When the monitor server instance goes offline and then comes back online, the log_shipping_monitor_primary table is not updated with the current values before the alert message job runs. To update the monitor tables with the latest data for the primary database, run sp_refresh_log_shipping_monitor on the primary server instance. In your case the monitor data was most likely refreshed after the monitor came back online, which brought the reported log shipping status back in sync.