SQL Server – Log files in an Always On disaster recovery situation

availability-groups, disaster-recovery, sql-server, transaction-log

In my Always On Availability Group setup there are four nodes in the cluster across two datacenters. Each availability group has four replicas, one on each node.
Two replicas in the same datacenter use synchronous replication for HA, and the other two replicas in the other datacenter are asynchronous for DR purposes.

My company wants to perform a disaster recovery test by simulating one datacenter going down from Friday night to Sunday afternoon.

When the primary datacenter has gone down I can manually fail the AGs over from the primary to the secondaries. We have good latency, so any data loss will be negligible.

However, the one thing that concerns me is how to handle the logs on the replicas that are up and running in the DR site. They will grow and grow, because the secondary replicas will be down and the log files cannot be backed up.

I had planned to just let the logs grow for that period (Friday night to Sunday afternoon) and, when the secondary replicas are brought back up, let the databases sync up again and fail back. There is a decent amount of space on the disk drive for the log files.

I am not sure if I can let this situation go on for that long. The only other option I can see is removing the secondaries that are offline and then recreating and reseeding the databases when the primary site is back. But I really don't want to have to do this, as there is a substantial amount of data and it will take quite a bit of time.

Is there any way I can handle it better than just letting the log files grow?

Best Answer

Is there any way I can handle it better than just letting the log files grow?

I believe you have the ideal options covered already. You either

  1. Let the log files grow for the duration of the outage, or
  2. Remove the replicas that are offline from the AG (and re-seed them when the primary datacenter is available)

I'd lean towards #1 if you have the disk space, and #2 if you become aware that there will be an extended outage (or disk space is low).
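
If you go with #1, it helps to keep an eye on the logs during the test so you know when to fall back to #2. The following is only a rough sketch, to be run on whichever replica is primary at the time; `[MyAg]` and `N'PRIMARY-DC-NODE1'` are placeholder names I made up, not anything from your environment. Log backups can keep running on the new primary as usual. They just won't free log space that the offline secondaries still need.

```sql
-- Run on the current primary (the DR-site replica after failover).

-- 1. Why is the log not being reused? With a secondary offline, the AG
--    databases would typically show log_reuse_wait_desc = 'AVAILABILITY_REPLICA'.
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE replica_id IS NOT NULL;  -- only databases that belong to an AG

-- 2. How full is each log? Reports log size (MB) and percent used per database.
DBCC SQLPERF(LOGSPACE);

-- 3. Fallback (option 2): remove an offline replica from the AG so the log can
--    be truncated by the next log backup. Left commented out on purpose;
--    the AG and server names below are hypothetical placeholders.
-- ALTER AVAILABILITY GROUP [MyAg] REMOVE REPLICA ON N'PRIMARY-DC-NODE1';
```

Checking the output of step 2 a few times over the weekend gives you a growth rate to compare against the free space on the log drive. Step 3 is the point of no return for that replica, since it has to be re-added and re-seeded afterwards.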