The SQL Server log contains all the data that was written to the database (in some form). So, you are right to be concerned about it.
The log is a ring buffer made up of virtual log files (VLFs). Once a VLF is not needed anymore, it can be reused. However, it will not be reused until the "write head" comes around to that VLF again.
The only difference (with a few small exceptions) between the simple and full recovery models is how quickly a VLF becomes "not needed anymore". Under the full recovery model, a VLF is "needed" at least until a log backup has been taken. (A full backup never clears the "needed" flag of a VLF.)
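If you want to watch this happen, newer builds expose the VLF layout directly. A quick hedged peek, assuming SQL Server 2016 SP2 or later:

```sql
-- One row per VLF; vlf_active = 1 means the VLF is still "needed"
-- and cannot be overwritten yet. Run this in the database in question.
SELECT vlf_begin_offset,
       vlf_size_mb,
       vlf_active,
       vlf_status
FROM sys.dm_db_log_info(DB_ID());
```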
The only way to force a VLF to be overwritten is to first make sure that it is not needed anymore and then pump enough transactions into the database that the VLF gets reused. That, however, is probably not acceptable to you.
Another way to force the data out is to shrink the log file. If you can take downtime, follow this process:
- Set the database to READ_ONLY.
- Run the CHECKPOINT command.
- Take a log backup (if in full recovery).
- Make sure other log readers (replication, CDC, ...) have caught up.
- Set the database to READ_WRITE.
- Create a new log file.
- Empty the original file, migrating its contents to the new one with `DBCC SHRINKFILE` (there should not be any data left to migrate at this point, but you still need to execute this step).
- Drop the old log file.
- Take another log backup (if in full recovery).
You still need to test this, but it should get you to where you need to be; a sketch of the full sequence follows.
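A minimal sketch of that sequence, assuming a database named SensitiveDb with a single log file whose logical name is SensitiveDb_log; every name, path, and size below is a placeholder:

```sql
ALTER DATABASE SensitiveDb SET READ_ONLY WITH ROLLBACK IMMEDIATE;

USE SensitiveDb;
CHECKPOINT;

-- Only needed in the full recovery model.
BACKUP LOG SensitiveDb TO DISK = N'X:\backup\SensitiveDb_log_1.trn';

-- (Verify replication / CDC log readers have caught up before continuing.)

ALTER DATABASE SensitiveDb SET READ_WRITE;

-- New log file, ideally on a separate (clean) disk.
ALTER DATABASE SensitiveDb
    ADD LOG FILE (NAME = N'SensitiveDb_log2',
                  FILENAME = N'Y:\log\SensitiveDb_log2.ldf',
                  SIZE = 1GB);

-- EMPTYFILE migrates any remaining content and flags the file
-- so nothing new gets written to it.
DBCC SHRINKFILE (N'SensitiveDb_log', EMPTYFILE);

-- The old file must be empty before it can be removed.
ALTER DATABASE SensitiveDb REMOVE FILE SensitiveDb_log;

-- Only needed in the full recovery model.
BACKUP LOG SensitiveDb TO DISK = N'X:\backup\SensitiveDb_log_2.trn';
```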
If you can't take downtime, follow the same process without setting the database to READ_ONLY. However, you run the risk of not getting a completely clean cutover, but if you do this regularly you should still be OK.
Note that, as some of the commenters have mentioned, a DELETE does not actually remove the data from the MDF files either. However, that can be dealt with in a similar fashion:
- Rebuild all indexes on the offending table(s) (or better, all tables).
- Create a new file.
- Empty the old file, migrating the data to the new file using `DBCC SHRINKFILE`.
- Drop the old file.
The first step (the index rebuild) is necessary because the file migration does not change page contents, so individual deleted rows might survive the move. Rebuilding the indexes writes everything to new pages.
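Under the same placeholder assumptions (database SensitiveDb, an offending table dbo.OffendingTable, a secondary data file SensitiveDb_dat), a sketch; note that the primary .mdf file cannot itself be dropped, so for it you would only empty what you can and skip the final step:

```sql
-- 1. Rebuild all indexes on the offending table so rows land on new pages.
ALTER INDEX ALL ON dbo.OffendingTable REBUILD;

-- 2. New data file, ideally on a separate (clean) disk.
ALTER DATABASE SensitiveDb
    ADD FILE (NAME = N'SensitiveDb_dat2',
              FILENAME = N'Y:\data\SensitiveDb_dat2.ndf',
              SIZE = 10GB);

-- 3. Migrate all pages out of the old file.
DBCC SHRINKFILE (N'SensitiveDb_dat', EMPTYFILE);

-- 4. Drop the now-empty old file.
ALTER DATABASE SensitiveDb REMOVE FILE SensitiveDb_dat;
```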
Additional notes:
The log cleanup has to happen after the data file cleanup as there is a chance that the data file cleanup moves some of the data into the log file.
Index rebuild operations, and also a lot of data access operations, utilize tempdb, so you have to shred that too. Tempdb is recreated on server restart, so you can stop the SQL Server service, delete the tempdb files, and then start the service again.
Finally, this all causes a lot of blocks (~ database pages) to be unlinked from existing database files. This unlinking (due to file shrinkage or the deletion of files) does not force those blocks to be overwritten. So it is recommended to create the new files in the above processes on a separate disk and afterwards securely wipe the original disk.
Removing data will not release the space it used back to the filesystem unless you then perform a database (or per-file) shrink. The newly unallocated space will be used by new data as it comes in, before any more is claimed from the OS by growing the relevant files.
If you need to fully reclaim the space, then you need to look at `DBCC SHRINKDATABASE` and `DBCC SHRINKFILE`, though don't do this unless you have to, as it can cause index fragmentation that, if allowed to build up, can impact performance.
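For example, a hedged sketch with placeholder names (the target size is in MB):

```sql
-- Shrink the data file to roughly 1 GB, returning the rest to the OS.
DBCC SHRINKFILE (N'MyDb_dat', 1024);

-- Shrinking shuffles pages and fragments indexes, so rebuild afterwards
-- (which may grow the file again somewhat).
ALTER INDEX ALL ON dbo.SomeBigTable REBUILD;
```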
When you take a backup, only used pages are included, so you will see backup sizes drop even though the actual database size has not reduced.
Best Answer
It looks to me like just updating the columns to `NULL` will release pages for reuse. Here's a Very Scottish® demo, to celebrate it being almost 5PM, EST.

With rows inserted, let's check on our index pages.
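The original demo script isn't preserved here, so what follows is a stand-in sketch with the same shape; the table, column, and index names are all invented:

```sql
USE tempdb;  -- the demo runs in tempdb, per the narration

-- A table with VARBINARY data, plus a nonclustered index that carries it.
CREATE TABLE dbo.HighlandHash
(
    id          INT IDENTITY PRIMARY KEY CLUSTERED,
    clan_name   VARCHAR(50)     NOT NULL,
    tartan_blob VARBINARY(8000) NULL
);

CREATE INDEX ix_clan_tartan
    ON dbo.HighlandHash (clan_name)
    INCLUDE (tartan_blob);

-- Load rows with chunky VARBINARY values.
INSERT dbo.HighlandHash (clan_name, tartan_blob)
SELECT TOP (10000)
       'MacGuffin',
       CONVERT(VARBINARY(8000), REPLICATE(CAST(0xDE AS VARBINARY(MAX)), 4000))
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;

-- Page counts per index.
SELECT i.name AS index_name,
       ps.used_page_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.HighlandHash');
```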
After the insert, I get page counts for each index. Actual numbers may vary for you.
Let's `NULL` out some rows! And check back in on our pages:
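Continuing the stand-in sketch:

```sql
-- NULL out the VARBINARY column for half the rows (invented predicate).
UPDATE dbo.HighlandHash
SET tartan_blob = NULL
WHERE id % 2 = 0;

-- Rerun the page-count query above to compare.
```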
So page count was reduced. Huzzah! The two indexes that touch our `VARBINARY` data lost a buncha pages. That means those pages are back in circulation for other objects to use. Since I'm in tempdb, they probably get gobbled up pretty quickly by all the garbage things that go on here.

Now let's put some data back in:
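Still in the stand-in sketch:

```sql
-- Reload some rows with VARBINARY payloads.
INSERT dbo.HighlandHash (clan_name, tartan_blob)
SELECT TOP (5000)
       'MacReturn',
       CONVERT(VARBINARY(8000), REPLICATE(CAST(0xAB AS VARBINARY(MAX)), 4000))
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;
```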
And checking back in: page counts inched up a bit.
So, looks like you don't have to do anything too crazy, or even shrink your database, to get space reuse. I think you were conflating the behavior of dropping columns (and needing to run `DBCC CLEANTABLE`) with what you're actually doing.

Hope this helps!