Sql-server – Transaction log filling the drive if mirroring fails

mirroringsql serversql-server-2016transaction-log

I am having two SQL Server 2016 machines with mirroring set up. This is supposed to be hot-standby scheme.

HDD size is 1TB, and database size is around 600GB (just one DB). This is 90 days worth of archived data, everything older than 90 days is getting deleted every night (automatically through an external application which is using/filling the database in the first place). So 600GB is the peak DB size, it will not go beyond as it is being cleaned up regularly.

The problem is with the transaction log if one server fails, or if mirroring gets suspended for any other reason. If I understood the principle correctly, healthy server will retain the transaction logs as long as it doesn't get information from partner that everything is OK. So if mirroring fails, the HDD will get filled within several hours.

Is there any suitable technique to prevent this? I have backups of logs every 15 minutes and everything works fine, but if mirroring gets suspended, backups are not worth much, as the log will keep growing in spite of the backups. And the situation on site is a bit specific, there are no engineers, only operators who are accessing this data once or twice per day, so it's impossible to react straight away. It can take more than 24 hours to have someone attend the problem.

Only thing I could think of is some sort of trigger that would remove the mirroring completely once it was suspended for some time (or maybe if it is suspended and HDD space is too low). This will prevent the healthy server from crashing completely, but someone will again have to come to site and set the mirroring up from scratch. And due to bad design from the start, DB size is bigger than half of the HDD size, so I can't even create local backup/restore, I would have to do everything via the 100Mbps NAS that belongs to the client. And this would take more time than it would take the transaction log to fill the drive again.

Now I am thinking to set up a script that would execute every half an hour for example, and remove mirroring if log size is higher than let's say 200GB. Any prettier way to do this? Also, what will happen with mirroring if log hits the maximum size? All transactions are small and can be rolled back quickly, so I am also thinking to set the maximum log size a bit higher than previous threshold (250GB maybe) as an additional precaution. Any downside to this?

Edit 1: I want to put this in initial question as well, maybe my question was not clear enough, but I do not have issues with mirroring getting suspended, it is working perfectly (for now), I am just preparing for worst case. Main goal is full redundancy, and without any precautions, application will become unusable because of full hard drive if just one computers fails, it will happen within hours.

Best Answer

What I see from your question is that you are not addressing the root cause, instead you are building solutions around the core problem which will resurface until you address the root cause - why does mirroring get suspended ? Did you see any errors in error logs (windows or sql server) ? Does it happen on a frequent basis, what are the patterns that contribute to this ?

Address the above ones first. You wont have to re-setup mirroring for 600GB DB once you address the root cause.

This is what I have done in my current environment -

Have a process to check for mirroring health like using powershell or sql agent job :

select db_name(database_id) as MirrorSuspendedDBs 
        from sys.database_mirroring 
    where mirroring_state_desc = 'SUSPENDED' 
    and mirroring_role = 1

If it finds any db/s that are suspended, try to resume mirroring using

ALTER DATABASE dbname SET PARTNER RESUME;

if above fails (use error codes to catch errors, may be retry 2-3 times by sleeping in between a retry), generate a ticket or email for the DBA team.

This way, you know that something is wrong and address that for a permanent fix.

see this : Why Does the Transaction Log Keep Growing or Run Out of Space?

Edit:

The problem is with the transaction log if one server fails, or if mirroring gets suspended for any other reason. If I understood the principle correctly, healthy server will retain the transaction logs as long as it doesn't get information from partner that everything is OK. So if mirroring fails, the HDD will get filled within several hours.

For mirroring or AlwaysON, the secondary has to be up for the tlog to get truncated and reused by sql server.

If your concern is that secondary goes down and your drive fills up due to Tlog not getting truncated, then look into switching to logshipping provided it does not break your RTO (how fast you can bring your system up) and RPO (how much data loss is acceptable to your business).

Failover for mirroring is almost instantaneous vs logshipping (if you dont have a witness configured) is manual.

You can even go with clustering - DR-1 (2 nodes in each side clustered) and mirroring between the 2 sites.

Again - preparing for failure is a good thing, but do test out all your scenarios when choosing between HA/DR technologies.