How to ensure backups are successfully persisted to S3 when using the Ola Hallengren solution on AWS EC2 instances

amazon-ec2, aws, backup, disaster-recovery, ola-hallengren

Improving an Existing Backup Approach by Using the Ola Hallengren Backup Solution

I'm deploying a new backup approach to eliminate my dependency on a vendor tool (Cloudberry) that I've leveraged before. In this microblog post, I describe the process I'm taking, which includes:

  • Creating and deploying the entire maintenance solution via dbatools (a deployment sketch follows this list).
  • Leveraging s5cmd as the sync tool to ensure backups get copied from the EBS volume to S3 (a performant alternative to installing the AWS CLI).
  • Using Red Gate SQL Monitor/PagerDuty integrations for monitoring and alerting on issues.
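
For reference, here's a minimal sketch of that first deployment step using dbatools' Install-DbaMaintenanceSolution command; the instance name, backup location, and cleanup window are placeholder assumptions, not my production values.

    # Sketch: deploy the Ola Hallengren maintenance solution via dbatools.
    # Instance name, backup location, and cleanup time are placeholders.
    Install-Module dbatools -Scope CurrentUser    # skip if already installed

    Install-DbaMaintenanceSolution -SqlInstance 'MySqlServer' `
        -Database master `
        -BackupLocation 'B:\Backups' `
        -CleanupTime 72 `
        -InstallJobs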

Pain Point – Replacing the Existing Tool

This question is scoped to AWS EC2 specifically, as much of this is already solved in Azure by SQL Server's native support for backing up to Azure Blob Storage.

Cloudberry is a solid budget-oriented tool that lets you schedule backups through a GUI and automatically push them to S3. However, I've seen serious scaling issues once you're dealing with thousands of databases: at that point, the local SQLite database and S3 metadata queries appear to cause slow-downs and impact the server's RPO.

I've evaluated other tooling, but very few options offer flexible S3 uploads as a native feature.

An Eye Towards Reliability

As I work through the final steps, I'm looking for points of failure and reliability concerns.

One area that is harder to validate is the successful sync of all contents to S3.
While the sync agent should fail with a non-zero exit code, confirming that every file actually made it to S3 is important (a minimal exit-code check is sketched below).
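
As a starting point, this is a minimal PowerShell sketch of surfacing that exit code when invoking s5cmd; the local path, bucket, and prefix are placeholder assumptions, and it presumes s5cmd is on the PATH.

    # Sketch: run the s5cmd sync and fail loudly on a non-zero exit code.
    # Local path and bucket/prefix are placeholders.
    & s5cmd sync 'B:/Database/' 's3://MyBucketName/ProductionDatabaseBackups/'
    if ($LASTEXITCODE -ne 0) {
        throw "s5cmd sync exited with code $LASTEXITCODE"
    }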

How to Gain Assurance That All Files Persisted to S3

Here are a few specific questions I'm trying to work through and would appreciate any insight on.

  • Can EBS snapshots replace the need to sync to S3, with a goal of a 15-minute RPO on a dedicated backup volume? I'm not including any data files/log files in this, so it would be purely a backup drive.
    • Based on some prior tests, I believe a snapshot interval that short is difficult to achieve on EBS.
  • s5cmd performs very well, but it isn't as controllable as a dedicated PowerShell script. For instance, simply iterating through the S3 files to generate a diff report at the end takes 8 seconds with s5cmd versus 43 seconds via the AWS PowerShell tools. With this running every 15 minutes, I want the performance impact on the server to be as minimal as possible, rather than running a lot of custom scripts beyond this.
  • Is there an approach you'd take to audit the local backup directory against S3 and validate that nothing local is missing, or is this a case where you just have to rely on the sync tool? (A sketch of the kind of comparison I have in mind follows this list.)
  • Could AWS Backup, DataSync, or other tooling natively integrated into AWS solve these issues? FSx, DataSync, and the rest seem to add more risk and complexity to manage.
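
To illustrate the audit in that third bullet, here's a rough PowerShell sketch that compares local backup file names against the bucket listing; the bucket, prefix, and local path are placeholder assumptions, and it presumes the AWS.Tools.S3 module is installed and credentials are configured.

    # Sketch: compare local backup files against what's actually in S3.
    # Bucket name, key prefix, and local path are placeholders.
    Import-Module AWS.Tools.S3

    $localPath = 'B:\Database'
    $bucket    = 'MyBucketName'
    $prefix    = 'ProductionDatabaseBackups/'

    # File names on the local backup drive
    $localFiles = Get-ChildItem -Path $localPath -Filter *.bak -Recurse |
        Select-Object -ExpandProperty Name

    # Leaf names of the object keys in the bucket
    $s3Files = Get-S3Object -BucketName $bucket -KeyPrefix $prefix |
        ForEach-Object { Split-Path $_.Key -Leaf }

    # Anything on disk that never made it to S3
    $missing = $localFiles | Where-Object { $_ -notin $s3Files }
    if ($missing) {
        Write-Warning "Files missing from S3: $($missing -join ', ')"
    }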

Other Solutions

  • I've considered dbatools-based backups in the past, as I could gain full control of running the backup, pushing to S3, and logging all of it. However, after deliberation and community discussion, I decided against this: dbatools backups felt better suited to ad hoc use, while the Ola Hallengren backup tooling is the more robust solution for production. The downside is that my own PowerShell error handling and logging won't be part of the backup run itself. (A sketch of what the dbatools route could have looked like follows below.)
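
For context, this is roughly what that rejected dbatools-driven approach might have looked like; the instance, database, paths, bucket, and the use of s5cmd for the upload are all placeholder assumptions for illustration.

    # Sketch: a dbatools-driven backup with custom error handling and upload.
    # Instance, database, paths, and bucket are placeholders; assumes the
    # returned backup object exposes the path of the file it wrote.
    try {
        $backup = Backup-DbaDatabase -SqlInstance 'MySqlServer' -Database 'MyDatabase' `
            -Path 'B:\Backups' -Type Full -EnableException

        # Push the new backup file to S3 and check the exit code
        & s5cmd cp $backup.BackupPath 's3://MyBucketName/ProductionDatabaseBackups/'
        if ($LASTEXITCODE -ne 0) { throw "s5cmd upload exited with code $LASTEXITCODE" }
    }
    catch {
        # Custom logging/alerting (e.g., PagerDuty) would hook in here
        Write-Error "Backup or upload failed: $_"
    }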

I look forward to any insight and appreciate the help!

Best Answer

Have you tried using the AWS CLI? I have a SQL Server Agent job that runs a .bat file after my backups are complete. The bat runs the sync command. I don't delete anything from S3; I simply let the CLI sync new or changed files up to my bucket, then use S3 lifecycle rules to manage the lifetime of the backups that are up there.

My bat script is below; you will need to provide your own paths. Don't forget to run aws configure to set up the CLI, and I suggest creating a dedicated profile with least-privilege access to upload to S3. This is for Windows.

setlocal

rem Build a timestamp for the log file name (note: %date%/%time% parsing is locale-dependent)
set tm=%time: =0%
set tm=%tm:~0,2%_%tm:~3,2%_%tm:~6,2%
set dt=%date:~10,4%_%date:~4,2%_%date:~7,2%

set log=L:\Logs\SyncBackups\s3SyncBak_%dt%_%tm%.txt

echo "Starting..." >> "%log%"

rem Sync only .bak files up to the bucket; nothing is deleted from S3
aws s3 sync "B:\Database" "s3://MyBucketName/ProductionDatabaseBackups" --exclude "*" --include "*.bak" --profile MyProfileName >> "%log%"

rem Only report success if the sync actually succeeded
if errorlevel 1 (
    echo "Sync failed with exit code %errorlevel%" >> "%log%"
    exit /b %errorlevel%
)

echo "Synced with no problem ..." >> "%log%"