Sql-server – SQL Server – Log file not shrinking in AlwaysOn group

availability-groupsreplicationshrinksql servertransaction-log

We have three-node AO setup in SQL Server. Primary node, one secondary node in Sync mode and another secondary node in Async mode.
Main Database on Primary node, log file-size is 434GB, Used 99.91%. Log backups is happening every hour only on Primary node.

Ran following query on Primary and Secondary servers.

SELECT log_reuse_wait_desc
FROM sys.databases
WHERE name = 'Main DB Name'

Result:

node        log_reuse_wait_desc
------------------------------------------------------------
primary     AVAILABILITY_REPLICA
Secondary-1 LOG_BACKUP
Secondary-2 AVAILABILITY_REPLICA

While other DBs (much smaller than main DB) in AO groups log usage is very minimum (1 to 10%) and can be shrunk, main DB log usage continuously showing more than 99% and not shrinking.
I am guessing since log_reuse_wait_desc is showing "AVAILABILITY_REPLICA" on Primary Node, means not all the log is transferred to Secondary replica, and so its still full.

I checked Redo Queue Size in Secondary-1 Node showing 27GB, Secondary-2 is showing nothing. So most likely Secondary 1needs to get all logs from Primary. The problem is Redo Queue Size in Secondary-1 Node is increasing and not decreasing as what I through it should. Also log-file on Primary node is slowly increasing which is going to fill-up whole disk-space.

Now Secondary-1 Node showing LOG-BACKUP. How do I solve this issue? Should I take Log backup on Secondary-1 Node and check?

Best Answer

The issue is resolved. Apparently one of programs connecting to same large DB had been running for more than 24-hours, seemed it was in "hung" state, had several open connections to secondary replica-DB caused blocking. I guess that was causing not completing log replication from primary to secondary, because when I ended those programs, immediately blocking were gone and Redo Queue Size started decreasing instead of previously increasing. After about an hour or so, log file in primary node showed more than 80% available free space.

dbcc opentran and sp_who2 didn't give that much details to track this issue. What helped was

select * from sys.sysprocesses where blocked<>0
dbcc inputbuffer(spid)

The first statement was to find out what was causing blocks, and second one to kill. I am just sharing if it helps others in similar situation.

TLog backup is being taken for all user-Databases in every hour only on primary node as mentioned in my post. So far TLog physical file-size was small, wouldn't increase and most of the time had more than 90% available free space - which is what its supposed to be. So after the issue is resolved, only difference is file-size is very big which I understand because it kept on growing. Now I need to do a TLog shrink (we did before) and it should go back to initial size.

TLog back doesn't need to be taken on all replicas. Technically you can, but if you are taking on all replicas, in case of Database restoration you have to gather all TLog backups from all Nodes and restore them which may be complicated.

Related Solutions

Sql-server – SQL availability groups log maintenance

When you run the log backups from a secondary server, when the log backup is complete the secondary replica will tell the primary replica that the log backup has been completed and what the last LSN to be backed up was. The primary replica will then mark the VLFs which were backed up as inactive. That change (which is logged) will then be transferred to the secondary replicas.

In other words, no you don't need to take backups from the primary replica as well as the secondary.

Sql-server – Transaction Log Maintanance While Using AlwaysOn Availability Group

You can backup the transaction log from either of the replicas. Doing the transaction log backup on either the primary replica or the secondary replica will mark both replicas' transaction logs as reusable (provided no other stoppers are in place like active transactions, etc.).

To do a test, in a non-production environment, setup an availability group just as you have it in your production system (asynchronous commit to the secondary replica).

In my test environment I have a test database, TestBackupDatabase, and I bloated it with logged transactions through a dummy table:

use TestBackupDatabase;
go

create table dbo.TestTable
(
    id int identity(1, 1) not null,
    some_int int not null
        default 1
);
go

insert into dbo.TestTable
default values;
go 1000

Now when I do a transaction log backup on my primary, using DBCC SQLPERF(LOGSPACE) I see on both transaction logs (primary and secondary) that space used has dropped due to log truncation. Bloating the transaction log back up with the same test on the primary:

insert into dbo.TestTable
default values;
go 1000

I now do a transaction log backup on the secondary async replica. Running DBCC SQLPERF(LOGSPACE) again on each replica I see the same behavior: transaction log reuse.

BOL Reference: Active Secondaries: Backup on Secondary Replicas (AlwaysOn Availability Groups)

Best Answer

Related Solutions

Sql-server – SQL availability groups log maintenance

Sql-server – Transaction Log Maintanance While Using AlwaysOn Availability Group

Related Question