SQL Server – Factors Impacting Recovery Time

amazon-rdsrestoresql server

In a recent planning AWS RDS failover we experienced long recovery times of around 20 minutes. AWS provides some hints around why.

large transactions or a lengthy recovery process can increase failover
time.

The database itself doesn't make much use of large transactions, so I don't think this is the culprit. Therefore I'm interested to understand what factors might result in a lengthy recovery process?

  • Does the SQL Server version have an impact (We are on 2016)
  • Does the size of the database have an impact?

The recovery process took around 20 minutes and we'd like to take steps to reduce this to around 10 minutes.

Update:

One thing I didn't mention is, due to years of bad design about 50% of the database size is due to storing blobs of files, such as html or XML. Could this have any impact on recovery time?

Best Answer

Since you are not using SQL server 2019, ADR won't be applicable. There are few general guidelines, which should be considered in almost all the cases and they are as below:

  1. Keeping Transactions as tiny as possible.
  2. Control long running transactions liaising with application team or vendor as appropriate.
  3. Take transaction log backup as frequently as possible to reduce Analysis phase.

There is an option of changing target recovery time however that has some down-side also. You should consider them before enabling them.

From MS-site, if you avoid below scenarios, you don't need ADR:

Who should consider accelerated database recovery The following types of customers should consider enabling ADR:

  • Customers that have workloads with long running transactions.
  • Customers that have seen cases where active transactions are causing the transaction log to grow significantly.
  • Customers that have experienced long periods of database unavailability due to SQL Server long running recovery (such as unexpected SQL Server restart or manual transaction rollback).

From the comment and new addition to your question about using blob type in your database, yes having these data types can significantly delay any checkpoint by database. In stead, you should use CAS storage type as explained well by Mr. Brent Ozar:

https://www.brentozar.com/archive/2015/03/no-more-blobs/

Hope this helps you in taking right decision.