SQL Server – Reducing the size of a database

disk-space, performance, shrink, sql-server, sql-server-2008

SQL Server newbie here. I'm a MySQL guy. I'm looking at something for a client on their SQL Server 2008 instance and need some advice. Whoever designed the database chose to log insane amounts of data and never flush those log tables.

The largest table stores complete XML documents from transactions between the app and the APIs of sites like eBay. I can only assume that the database being about 230 GB hurts performance. I'm guessing these tables are not queried by the app, but even so, I don't like the idea of such a huge database. After purging the log tables, I would anticipate a total remaining size of about 30 GB.
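For reference, a generic catalog query like this one can confirm which tables actually dominate the space (sizes come back in 8 KB pages, converted here to MB):

SELECT s.name AS schema_name,
       t.name AS table_name,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_mb DESC;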

I'd like some advice on how to go about this. From the little I've read on the subject, after deleting a bunch of data, the database file will not automatically shrink in size. I also read that shrinking and re-indexing are bad.

  • Is this large database hurting performance of other tables?
  • Should I do something about it?
  • How can I safely do something about it that will yield a performance increase?

Best Answer

If you are going to continue logging data in this database, the last thing on earth you want to do is shrink the database file (and then perform index maintenance that will require it to grow again). Never mind that these shrink and grow operations will interfere with performance, and that the end result is not going to leave you much better off than when you started.

Since the file is only going to grow again, this is an extremely futile operation - much like drying off while still in the shower. What are you going to do with all that disk space you free up temporarily? Lease it out to another application until the database needs to grow again? Of course not. If the database grew to that size once, it will grow to that size again, but it is going to be much more efficient to reuse the space within the file without all this unnecessary shrink-grow-shrink-grow roller coaster of the file itself.

Even if you move the logging table to another database, you should do what you can to pre-allocate the data file to a size that will accommodate the amount of logging you want to keep (be it a week, a month, what have you). Keep this database trim by purging data every day, and stop worrying about shrinking and re-indexing. If you size it right, there should always be some free space, but not excessive free space. And if you need to re-index (you shouldn't really need to, if your clustered index is on a datetime or another monotonically increasing key), do so after the purge (when you have the most free space), not after a shrink (when you have the least).
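Pre-sizing the data file once might look like this (a sketch; the database name, logical file name, and 40 GB figure are hypothetical placeholders for whatever fits your retention window):

-- Hypothetical names/size: pre-allocate the data file once, so daily
-- purging reuses space inside the file instead of shrinking and regrowing it.
ALTER DATABASE LoggingDB
  MODIFY FILE (NAME = N'LoggingDB_data', SIZE = 40GB);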

You can do what Mark suggests without introducing a new database to the application, or changing the application or its interface to the database at all (one important change, of course, would be the removal of any foreign keys or other database-dependent features). You can simply create your table(s) in the new database, then add an INSTEAD OF INSERT trigger to the table in the current database (I assume there are no updates to the logging table, but you may also need an INSTEAD OF DELETE trigger in case you don't directly control the process that performs the purging). That will help with writes, but you'll have to point reads elsewhere, since there are no INSTEAD OF SELECT triggers. Another alternative would be to rename the existing table and create a synonym or even a view that points to the new table.
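As a rough sketch of the trigger and synonym alternatives (all object and database names here are hypothetical, and the trigger assumes both tables have identical columns and no identity column):

-- Option 1: redirect writes on the original table into its copy
-- in the new logging database.
CREATE TRIGGER dbo.logs_table_redirect
ON dbo.logs_table
INSTEAD OF INSERT
AS
BEGIN
  SET NOCOUNT ON;
  INSERT LoggingDB.dbo.logs_table SELECT * FROM inserted;
END
GO

-- Option 2: move the old table aside, then point the existing name
-- at the new table so the application never notices.
EXEC sp_rename N'dbo.logs_table', N'logs_table_old', N'OBJECT';
CREATE SYNONYM dbo.logs_table FOR LoggingDB.dbo.logs_table;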


If you're going to need to clean up the logs table that has grown, I would avoid a single atomic transaction like:

DELETE dbo.logs_table WHERE [datetime] < '20121201';

This will cause massive log growth and will take a long time. Instead you can break the cleanup into chunks, e.g.

BEGIN TRANSACTION;

-- dummy statement so @@ROWCOUNT starts out > 0 and the loop body runs
SELECT 1;

WHILE @@ROWCOUNT > 0
BEGIN
  COMMIT TRANSACTION;

  -- if the database is in SIMPLE recovery: CHECKPOINT
  -- otherwise: BACKUP LOG, so the log space can be reused

  BEGIN TRANSACTION;

  -- each iteration deletes one small batch; the loop exits
  -- once a batch affects zero rows
  DELETE TOP (1000) FROM dbo.logs_table WHERE [datetime] < '20121201';
END

-- close the transaction left open by the final, empty batch
COMMIT TRANSACTION;

I picked 1000 and Dec. 1st arbitrarily; I don't know what fits your scenario best. The point is to keep transactions short and contained, and to prevent any long-term impact while you clean up the table. Another option I've used in the past: instead of deleting the 99% of the table that is junk, move the 1% you want to keep to a new table and drop the old one.

BEGIN TRANSACTION;

-- copy only the rows worth keeping into a new table
SELECT *
  INTO dbo.new_logs_table
  FROM dbo.logs_table
  WHERE [datetime] >= '20121201';

COMMIT TRANSACTION;

-- create indexes/constraints/triggers on dbo.new_logs_table

BEGIN TRANSACTION;
  DROP TABLE dbo.logs_table;
  EXEC sp_rename N'dbo.new_logs_table', N'logs_table', N'OBJECT';
COMMIT TRANSACTION;

If the logs really have never been purged, the file may well be far larger than you anticipate ever needing again (say, if you only intend to keep a week of logs at a time going forward). In that case I might go along with a shrink operation, but only if truly necessary (e.g. you really do need the space for other purposes). A bunch of empty pages isn't going to affect your backups or other operations, and the pages will eventually be fully deallocated and reused.
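If you do go that route, shrink the data file to a sensible target rather than to its minimum, so routine activity still has headroom (logical file name and target size below are hypothetical):

-- Hypothetical example: shrink to ~40 GB (target is given in MB), leaving
-- room for normal churn. Expect index fragmentation afterward, which is
-- one more reason to do this only when the space is genuinely needed.
DBCC SHRINKFILE (N'LoggingDB_data', 40960);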