Transaction logs are, as far as I'm aware, not stored in what we would consider a very disk-space-friendly manner. This is because the logs have to be able to do several things. Almost all the neat tricks of ACID compliance come not from the database data files but from the transaction log files.
Think about some of the things the logs have to do:
- Allow a transaction to be rolled back to the state the database was in prior to the query running. This means the database has to know not just what was in place, but how to get back there.
- Keep the rest of the queries and transactions from interfering with, or being interfered with by, the query you're trying to run. Even if you're not explicitly using transactions, the database wraps your statements in them unless you specifically tell it not to. The database engine needs to be able to seamlessly treat data not yet written to the database as though it were written and as though it weren't, potentially at the exact same time.
- The database employs write-ahead logging, meaning it writes everything to the log before the transaction is considered complete. Note that committing a transaction doesn't write the data to the database; it means the write-ahead logging is finished! The data is first written to the log and only later copied to the actual database files. Sequentially appending to the log is fast, at the cost of not being space-efficient.
- Assuming you're using the full recovery model, the transaction logs must support point-in-time recovery: the system has to be able to recover the database to a specific moment, regardless of what the database has undergone since. So the logs have to show how to recover the database to how it was both before and after your massive query. This is different from the need to roll back a transaction; this is about recovery hours or days after the data has been committed.
- Save all of the above to disk as fast as possible, because performance really matters, too.
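The first point above can be sketched in a few lines. A rollback is only possible because the log has already recorded the before-images of the changed rows (the table and column names here are hypothetical):

```sql
BEGIN TRANSACTION;

UPDATE dbo.bigtable
SET int_field = int_field + 1;  -- before-images of every changed row go to the log first

-- The data pages may already be dirty in memory, but the log
-- contains everything needed to undo the change:
ROLLBACK TRANSACTION;           -- walks the log backwards to restore the rows
```

All of that undo information has to live somewhere until the transaction's fate is decided, which is one reason the log grows the way it does.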
You may find it useful to read about the transaction log architecture. It's really, really not surprising that transaction logs eat up so much disk space.
A database I work on is about 5 GB in size. We restored the DB to a testing area so I could test log growth. This was on a DB using the simple recovery model. One table in this database has about 2.5 million rows and is about 5% of the database. It has a clustered index for the primary key, but no other indexes, foreign keys, triggers, or check constraints. I ran two statements, essentially
update bigtable set int_field = int_field + 1
followed by
update bigtable set int_field = int_field - 1
This does nothing at all in the end but creates an awful lot of work for the database, especially because int_field is part of the clustered index. Each execution caused the transaction log to grow by about 1 GB.
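If you want to reproduce this kind of measurement yourself, here is a rough sketch (bigtable and int_field are the hypothetical names from my test above):

```sql
DBCC SQLPERF(LOGSPACE);  -- note the current log size and percent used

UPDATE dbo.bigtable SET int_field = int_field + 1;
UPDATE dbo.bigtable SET int_field = int_field - 1;

DBCC SQLPERF(LOGSPACE);  -- compare: in my test each update added roughly 1 GB
```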
I'm assuming you ran (essentially):
insert into production.table (id, field, ...)
select id, field, ...
from staging.table
That will work, but it means the database has to keep track of an awful lot by default. However, you can run it in a minimally logged manner, which will help a lot. If you really have to do this kind of thing, here's what I would do:
- Do a full backup of the DB and logs.
- Set the database to single-user mode.
- Disable any nonclustered indexes on the target table.
- Set the recovery mode to bulk logged.
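The preparation steps above would look roughly like this in T-SQL (database, index, and file names are placeholders):

```sql
-- 1. Full backup of the database and its log
BACKUP DATABASE MyDb TO DISK = N'X:\Backups\MyDb.bak';
BACKUP LOG MyDb TO DISK = N'X:\Backups\MyDb.trn';

-- 2. Single-user mode, kicking out other connections
ALTER DATABASE MyDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- 3. Disable each nonclustered index on the target table
ALTER INDEX IX_SomeIndex ON production.[table] DISABLE;

-- 4. Switch to the bulk-logged recovery model
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;
```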
From here, I'd follow MS's best practices for using INSERT INTO ... SELECT
for large operations with minimal logging:
Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging
You can use INSERT INTO…SELECT to efficiently transfer a large number of
rows from one table, such as a staging table, to another table with
minimal logging.
Minimal logging can improve the performance of the statement and
reduce the possibility of the operation filling the available
transaction log space during the transaction.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
Rows that are inserted into a heap as the result of an insert action
in a MERGE statement may also be minimally logged.
Unlike the BULK INSERT statement, which holds a less restrictive Bulk
Update lock, INSERT INTO…SELECT with the TABLOCK hint holds an
exclusive (X) lock on the table. This means that you cannot insert
rows using parallel insert operations.
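Putting those requirements together, the minimally logged version of the original statement would look roughly like this (the schema and column names are the placeholders from earlier):

```sql
-- TABLOCK is required for minimal logging; it takes an exclusive (X)
-- lock on the target table, so no parallel inserts are possible.
INSERT INTO production.[table] WITH (TABLOCK)
       (id, field)
SELECT id, field
FROM   staging.[table];
```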
Once I was done, I'd undo the changes above: revert the recovery model, take the database out of single-user mode, and do another backup. You'll also want to rebuild the disabled indexes, which re-enables them.
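The cleanup, again as a sketch with placeholder names:

```sql
-- Revert the recovery model (to whatever it was before)
ALTER DATABASE MyDb SET RECOVERY FULL;

-- Rebuilding a disabled index re-enables it
ALTER INDEX IX_SomeIndex ON production.[table] REBUILD;

-- Back to normal access
ALTER DATABASE MyDb SET MULTI_USER;

-- Another full backup, as in the first step
BACKUP DATABASE MyDb TO DISK = N'X:\Backups\MyDb_after.bak';
```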
I'm not sure if you actually need to change the recovery mode to bulk logged to get some advantage from the minimal logging. I've never had to do exactly what you describe, so I can't be sure.
•From my observation, this happens only with the INSERT statement, not with SELECT. So I presume that log entries are made only for INSERT/UPDATE statements and never for SELECT. Am I correct?
A SELECT statement as such is not logged, unlike DML statements such as INSERT, UPDATE, and DELETE. If you look at the output of fn_dblog for a SELECT statement, it won't have any entries, but there will be numerous entries for DML statements. Please note: the transaction log is maintained, and information about each transaction is written there, so that during crash recovery (or any recovery) SQL Server can determine which transactions were committed and which were not, and bring the database to a consistent state. This is why the log is so necessary; without the transaction log, recovery is not possible.
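You can verify this yourself with the undocumented fn_dblog function (unsupported, so use it only for inspection; bigtable is a hypothetical table):

```sql
-- Count the log records currently readable in this database
SELECT COUNT(*) AS log_records
FROM   fn_dblog(NULL, NULL);

SELECT * FROM dbo.bigtable;                          -- SELECT: count barely moves

UPDATE dbo.bigtable SET int_field = int_field + 1;   -- DML: count jumps

SELECT COUNT(*) AS log_records
FROM   fn_dblog(NULL, NULL);
```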
For all practical purposes, every transaction is logged in SQL Server, so you cannot have a scenario where you run a transaction and nothing is written to the transaction log. There is no option to disable logging in SQL Server. I suggest you read Logging and Recovery in SQL Server.
•How can I minimize logging while inserting and updating records? I already use the SIMPLE recovery model.
You are mistaken that logging does not happen in the simple recovery model; it does happen, but the moment the transaction commits, the log is truncated and the space used by that transaction's log records is reused. In fact, logging in simple recovery is almost the same as in full recovery; the difference lies in when log truncation happens. You can get minimal logging in the bulk-logged recovery model, but only for certain commands; for the rest, logging is full even in bulk-logged.
•I have found that average throughput during this period went from approximately 25 MB to 1.2 MB. What does this mean?
I cannot understand this can you please explain it.
•Is there any other way to troubleshoot this issue, other than increasing disk space?
As already suggested, you are likely to face issues if you keep @SET_SIZE INT = 500000000 (even though it's a dummy value); why such a high limit? Can you decrease it to 500000 and see how the log file behaves? Make sure autogrowth of the log file is specified in MB and set to some sensible value. Do the transaction in batches. Finally, if you read Why transaction log keeps growing or runs out of space, you can get some good ideas.
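Doing the work in batches might look roughly like this under the simple recovery model (the table name, batch size, and the processed flag column are all illustrative, not from your schema):

```sql
DECLARE @batch INT = 500000;       -- far smaller than 500000000

WHILE 1 = 1
BEGIN
    -- Touch only a bounded number of rows per transaction
    UPDATE TOP (@batch) dbo.bigtable
    SET    int_field = int_field + 1,
           processed = 1           -- hypothetical flag marking finished rows
    WHERE  processed = 0;

    IF @@ROWCOUNT = 0 BREAK;       -- nothing left to update

    CHECKPOINT;                    -- in simple recovery, lets the log space be reused
END;
```

Each batch commits on its own, so the log only has to hold one batch's worth of undo information at a time instead of the whole operation's.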
•If the only option is to shrink a file, when should I do it? Can I do it while active transactions are running? [Production environment] Thanks for guiding me, but I need to release space. I have a large data warehouse which deals with millions of rows every day.
Please don't shrink any file, data or log; I would not recommend it. If you really want to shrink once to reclaim space and have no option other than shrinking, you might as well try it. But remember, you have to rebuild all fragmented indexes after you are done shrinking. You can try the method Paul gives in this article, which advises against shrinking and offers an alternative.
You can shrink the log file once, but please remember the log file grew because you ran a transaction that forced it to grow, so it's basically your mistake, not the transaction log's. Instead of this whole cycle of growing and shrinking, why not preallocate some amount of space to the log file so that it can avoid autogrowth events?
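Preallocating is a one-line change (database and logical file names are placeholders; pick a size that covers your largest regular load):

```sql
-- Grow the log once, up front, with a fixed-MB growth increment
-- instead of relying on repeated small autogrowth events.
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 8192MB, FILEGROWTH = 512MB);
```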
Another thing
The configuration above is very bad: 1 MB autogrowth for a data file is bound to cause issues. To set a correct value, please read the article on autogrowth settings; it will help you calculate the right values.
Best Answer
The log file is internally divided into several virtual log files (VLFs), and the percent usage is based on these. If you have, say, 4 VLFs of equal size, you will never see less than 25% usage: the head of the log always sits in some VLF, and that VLF is in use.
The percentage can be different even when only one VLF is used, since VLFs in the same ldf file can be of different sizes.
DBCC LOGINFO will tell you which VLFs are in use; in the Status column, 2 means in use and 0 means not in use.
If you have more than one VLF in use, then the log_reuse_wait_desc column in sys.databases will tell you why SQL Server is keeping more than one in use. Is it waiting for a log backup? Is replication holding it up? Etc.
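Both checks together, as a sketch (the database name is a placeholder):

```sql
DBCC LOGINFO;   -- one row per VLF; Status = 2 means in use, 0 means reusable

-- Why can't the log be truncated right now?
SELECT name, log_reuse_wait_desc
FROM   sys.databases
WHERE  name = N'MyDb';   -- e.g. LOG_BACKUP, REPLICATION, ACTIVE_TRANSACTION
```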