SQL Server – How to manage log file size in a large, heavy-use database

sql-server, sql-server-2012

We are a team of mathematicians (i.e., no DBA experience).

We have a large database in SQL Server 2012. It has more than 2 TB of data (hundreds of tables, each with millions of rows and hundreds of columns wide). Each month, we receive a bundle of additions and revisions to the data, requiring us to perform extensive updates to our database by deleting, replacing, or updating most or all of the tables.

Our work is mostly focused on crafting the SQL logic to calculate the results that we need. We're not running a real-time call center. We apply a few indexes as needed, and we're pretty happy with the performance.

The problem is the log file. Naturally, the log file grows and grows with so much data manipulation. Our log file currently stands at about 1 TB. We have a good amount of disk space, but it is not infinite.

From what we read on the Internet, we understand that the log file is necessary for transaction integrity, rollbacks, and recovery. But for our specific purposes, we don't care about any of that. We will probably never perform a rollback, nor would we ever attempt a recovery. If worst comes to worst, we would simply download the data files again and create a new database from scratch.

We really just want the log file to go away and never come back.

We set the database recovery model to SIMPLE, naively thinking that this meant "no recovery mode", but we were quickly cured of those illusions.

We also understand that there are plenty of wrong ways to handle this (detaching the log file, shrinking it constantly, etc.). We just don't know the right thing to do.

Maybe someone is going to suggest that we set a limit on log file growth. However, that leaves two issues:

  1. How do we get rid of the 1 TB that is already there?
  2. We previously tried that, and as we approached the specified limit we started getting Error 9002 (transaction log full) here, there, and everywhere. So now we are afraid of applying a size limit.

How do we tell the database "No log file, please" without any hurt feelings?

Best Answer

If you never anticipate doing a point-in-time restore, then you just need to do 2 things:

  1. Change the database to the SIMPLE recovery model (a one-line change; see below)
  2. Do large changes in small batches.
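
For step 1, the change is a one-liner. YourDatabase is a placeholder here; substitute your own database name:

ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE;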

In SIMPLE recovery mode, the log file is used only for transactions in progress. When a transaction completes, its space in the tran log is marked for re-use, and the next transaction will simply re-use that space*.

(* This isn't strictly true: "checkpoints" run occasionally and actually clear the space for re-use, but for the most part this happens without intervention; only finicky DBAs are likely to care about the inner details here.)
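
If the log doesn't seem to be getting reused even in SIMPLE mode, you can ask SQL Server what it is waiting on before it can reuse log space. A quick check using only standard catalog views:

-- log_reuse_wait_desc shows why log space can't be reused yet,
-- e.g. NOTHING, CHECKPOINT, LOG_BACKUP, or ACTIVE_TRANSACTION
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();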

So keep your transactions bite-sized:

  • If you need to delete all rows of a table, don't run a DELETE of 100M rows (that will blow up the log); do a TRUNCATE instead, or just DROP the table and recreate it.
  • If you can't do that (you need to delete only some rows), delete in smaller batches (see the sketch after this list).
  • When doing large data imports, learn about bulk loading (e.g., BULK INSERT or bcp with a BATCHSIZE option) or SSIS fast loads, which load the data quickly and commit every so often.
  • Same thing for updates: don't update every row of a massive table in one statement; use more selective WHERE clauses to limit your updates to ~100k rows at a time. Learn how to use cursors in SQL.
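
Here is a minimal sketch of the batched-delete idea; dbo.BigTable and its LoadDate column are made-up names for illustration:

-- Delete in batches of ~100k rows so each transaction stays small;
-- in SIMPLE recovery, the log space is reused between batches.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (100000) FROM dbo.BigTable
    WHERE LoadDate < '20120101';   -- hypothetical filter

    SET @rows = @@ROWCOUNT;        -- 0 when nothing is left to delete
END;

The same WHILE/@@ROWCOUNT pattern works for batched updates with UPDATE TOP (100000), as long as the WHERE clause excludes rows that have already been updated.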

The exact details will depend on your actual database and actual activity, of course, but a database with 1 TB of data shouldn't need a tran log file any larger than, say, 100 GB, depending on how big your batches are.

With regard to shrinking the existing file, you can use the following script to see the free/used space in your data and log files:

-- Total, used, and free space (in MB) for every file in the current database
SELECT DB_NAME() AS dbname, type_desc, name AS logical_name,
    CONVERT(decimal(12,1), size/128.0) AS TotalMB,
    CONVERT(decimal(12,1), FILEPROPERTY(name, 'SpaceUsed')/128.0) AS UsedMB,
    CONVERT(decimal(12,1), (size - FILEPROPERTY(name, 'SpaceUsed'))/128.0) AS FreeMB,
    physical_name
FROM sys.database_files WITH (NOLOCK)
ORDER BY type, file_id;

If the used space in the log is high, then you likely still have open transactions. Look for the third-party tool sp_WhoIsActive (by Adam Machanic) for digging into your current activity and open transactions.
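
Two quick checks, shown below: DBCC OPENTRAN is built in, while sp_WhoIsActive must be installed first. The @get_transaction_info parameter is taken from sp_WhoIsActive's documentation; treat the exact parameter name as an assumption for whatever version you install:

-- Built-in: reports the oldest active transaction in the current database
DBCC OPENTRAN;

-- Third-party: install sp_WhoIsActive first; this parameter adds
-- transaction log usage per session (name per its docs)
EXEC sp_WhoIsActive @get_transaction_info = 1;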

When you are ready to shrink, pick a reasonable target size, and execute a SHRINKFILE:

DBCC SHRINKFILE (N'mylogfile', 100000);

(That target size is in MB; 'mylogfile' is the logical_name of your log file from the query above.)
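
If the file barely shrinks on the first attempt, the active part of the log may still sit near the end of the file. In SIMPLE recovery, forcing a checkpoint and re-running the shrink usually releases the rest:

CHECKPOINT;                                -- truncate the inactive log
DBCC SHRINKFILE (N'mylogfile', 100000);    -- then retry the shrink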

If your log file still grows, go back to sp_WhoIsActive to see which transactions are running so long, and figure out how to do them in batches.