SQL Server – Always On log files bigger than database files

availability-groups, sharepoint, sql-server, sql-server-2014, windows-server

We had a SQL RAP and got a warning that the transaction log files are larger than the database files.

They advised me to remove the databases from Always On, change the recovery model to SIMPLE, shrink the log files, put the databases back into Always On, and change the maintenance plans. The T-log backup runs every hour. But the next day the transaction log files are again larger than the database files.
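For reference, the advised procedure was roughly the following sketch (the AG name, database name, logical log file name, and backup path are all placeholders, and taking a database out of an AG also requires restoring it again on each secondary):

```sql
-- Sketch of the advised procedure; all names and paths are placeholders.
-- 1. Remove the database from the availability group (run on the primary).
ALTER AVAILABILITY GROUP [MyAG] REMOVE DATABASE [MyDB];

-- 2. Switch to SIMPLE recovery (breaks the log chain) and shrink the log.
ALTER DATABASE [MyDB] SET RECOVERY SIMPLE;
DBCC SHRINKFILE (MyDB_log, 1024);  -- target size in MB

-- 3. Switch back to FULL, restart the log chain with a full backup,
--    and re-add the database to the AG.
ALTER DATABASE [MyDB] SET RECOVERY FULL;
BACKUP DATABASE [MyDB] TO DISK = N'X:\Backups\MyDB.bak';
ALTER AVAILABILITY GROUP [MyAG] ADD DATABASE [MyDB];
```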

As this server is running SharePoint 2013 databases, they said it could be due to the migration of the SharePoint 2010 databases to 2013, but I see the same behavior on another Always On cluster, which is not SharePoint. I have also tried implementing Ola Hallengren's maintenance scripts, with the same results. On other machines without AO there are no such problems.

So I am not sure why this is happening; is it related to Always On? I know that Always On AG databases need bigger log files, so would this be normal behavior in AO?

I have looked at "Transaction log is growing very fast". I only see log_reuse_wait = 2 in between log backups; everything looks to be working as expected, but it is not.
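The check described above can be run against `sys.databases`; this is a minimal sketch, with `MyDB` as a placeholder database name:

```sql
-- Why can't the log be truncated? log_reuse_wait = 2 means LOG_BACKUP,
-- i.e. the database is simply waiting for the next log backup.
SELECT name,
       recovery_model_desc,
       log_reuse_wait,
       log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MyDB';  -- placeholder database name
```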

I checked the log send queue and redo queue size, and they look good; between log backups the database only shows log_reuse_wait = 2 (LOG_BACKUP, "waiting for a log backup to occur"). AOAG health is OK and bandwidth is more than enough. The log backup runs every 15 minutes.
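The queue sizes mentioned above come from the AG DMVs; a sketch of the kind of query used (sizes are reported in KB):

```sql
-- Per-database replica state: log send queue and redo queue sizes (KB).
SELECT ag.name                       AS ag_name,
       ar.replica_server_name,
       DB_NAME(drs.database_id)      AS database_name,
       drs.log_send_queue_size,
       drs.redo_queue_size,
       drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON drs.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag
  ON drs.group_id = ag.group_id;
```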

Best Answer

So I am not sure why is this happening, is it related to Always On?

It is and also isn't related to Always On. Always On Availability Groups requires that the database be in the full recovery model, which is why you need log management in the first place. As long as your secondary replicas stay very close to the primary (i.e. no secondary replicas lagging for hours or days), log management shouldn't be affected much by AGs.

I know that Always On AG Databases need bigger Log files, so would this be normal behavior in AO?

Always On Availability Groups does not inherently need larger log files. Generally it ends up looking like the cause because most databases are in the simple recovery model before the change to Availability Groups is made, and thus log management is a new concept for that database.

If the database was in the simple recovery model and the log grew to X GB, then the minimum log size will be X GB regardless of recovery model, Always On, etc. However, since log truncation is no longer automatic, an interval (generally 30 minutes or an hour) is arbitrarily chosen and log backups happen on that schedule. The log will then need to grow to accommodate everything that happens between those backups.
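To see how much of the log is actually used between backups (and therefore how large it genuinely needs to be), one simple option is:

```sql
-- Reports log file size and the percentage in use for every database;
-- watch this between log backups to right-size the log file.
DBCC SQLPERF (LOGSPACE);
```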

As @Shawn Melton and @RDFozz have stated in the comments: if you don't want the log growing too large and all of your replicas are properly in sync, then run your log backups more frequently.
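A more frequent log backup is just the usual BACKUP LOG statement on a tighter schedule; a sketch, with the database name and backup path as placeholders:

```sql
-- Schedule this (e.g. every 15 minutes via SQL Agent) instead of hourly;
-- more frequent truncation keeps the active portion of the log smaller.
BACKUP LOG [MyDB]
TO DISK = N'X:\Backups\MyDB_log.trn'
WITH COMPRESSION, CHECKSUM;
```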

As this is running SharePoint 2013 databases ...

SharePoint is known for having its own timer jobs, which run internal maintenance items, and for storing blob data. This, on its own, will cause poor log management, and there is simply nothing you can do about it. Thus it doesn't surprise me that the log for this database is larger than the data files.

It's never a good sign when the data files are smaller than the log, but in this case I believe that unless you change the log backup automation to shorter intervals (and even then, it depends on what SharePoint does to the database) it will most likely stay that way, and that might be normal. In that case we can mark it as normal and go about our day.

We had SQL RAP and we got warning that Transaction log files larger than the database files.

Having a risk and health check of your instance(s) is a great step in the right direction (I work for Microsoft). This specific check is something we use to look for databases that may have improper or no log management. In some cases I've personally dealt with, the DBA has said, "Oops, that shouldn't be in bulk-logged..." and we've changed it back to simple because the database didn't need point-in-time or extended recovery options. That's OK! The check does, however, give everyone an opportunity to look at the databases in question and make sure that nothing is wrong, and that's the key distinction. Like I said above, this might just be normal, and that's OK; however, we wouldn't be doing our jobs properly if we just left it alone and didn't say anything. It's important.

Anecdotally, when I was a DBA for a large clothing company, we had a third-party application that used a cursor to loop through all employees every 15 minutes and ran a stored procedure which deleted all of their history and then rebuilt it from scratch... in a single transaction. The database was 50 GB and its log file was... 280 GB. That's how the app worked; I logged complaints and possible code changes with the third-party vendor and was summarily dismissed. It happens, and some applications just run that way.