SQL Server – Wasted space in an MDF file: should I just use SHRINKFILE or some other alternative?

shrink, sql-server, sql-server-2012

I've got a database with an 850 GB MDF file. Over the course of 20 months, a logging mechanism in the application created a variety of huge tables. The maintenance script, which was only supposed to retain a week's data at any one time, wasn't executing properly (it is now), so we had to clean things up manually.

We cleared about 600 GB of data. I want to free up some of this space, as the drive it sits on is near its limit. The data file will never grow this large again, so it's just wasted space.

I was planning to use DBCC SHRINKFILE to deal with this. I have tested it on a clone of the database and it takes around three hours. There doesn't seem to be any performance degradation on the database afterwards. Yes, the indexes are heavily fragmented, but I can sort those out.
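For anyone unfamiliar with the command, the shrink I'm describing looks roughly like this. The database and logical file names here are placeholders (check sys.database_files for the real logical name), and the target size is just an illustration:

```sql
USE MyDB;
GO

-- Shrink the data file down to roughly 250 GB, leaving headroom for growth.
-- The target size is specified in MB; pick one above your actual used space.
DBCC SHRINKFILE (N'MyDB_Data', 256000);
```

The target should comfortably exceed the space still in use, otherwise the shrink simply stops at the smallest size it can achieve.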

My question is this:

The application needs to remain up throughout this process, as it's critical. My understanding is that SHRINKFILE is a fully online operation, or am I wrong?

The database has a large amount of write activity occurring pretty much every second.

Will the increased I/O caused by the SHRINKFILE operation affect these writes?

Turning this on its head, will the continuous writes affect the SHRINKFILE operation, i.e. make it slower to complete?

Finally, is there a better way to do this?

Update

Just for anyone reading this: I ran SHRINKFILE and monitored I/O and CPU. There was a slight rise in both, but no latency issues at all with my high-writes-per-second database. It remained up, stable and trouble-free throughout the 1.5-hour shrink operation.
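For reference, per-file I/O stalls can be watched with the standard sys.dm_io_virtual_file_stats DMV; something like the query below, run before and during the shrink and compared, is enough to spot latency creeping up (nothing here is application-specific):

```sql
-- Cumulative I/O and stall figures per file for the current database.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS logical_file_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms,
       vfs.io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;
```

The counters are cumulative since the last restart, so it's the delta between two snapshots that matters.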

Indexes were horribly fragmented afterwards, which gave the reporting side of the application a little grief (graphs were slow to render in the client application), but after an online index rebuild everything was back to normal.
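The rebuild itself was nothing exotic. For a hypothetical logging index it looks something like this (table and index names are made up; note that ONLINE = ON requires Enterprise Edition on SQL Server 2012):

```sql
-- Rebuild a heavily fragmented index without taking it offline.
ALTER INDEX IX_EventLog_LoggedAt ON dbo.EventLog
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```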

Best Answer

It's impossible for us to know whether it will affect the application, and it will be extremely difficult for you to determine this either. Even with a 100% like-for-like system and Distributed Replay clients, you would not notice the millisecond-level impact of the disk I/O, nor would you be able to tell whether the application itself is sensitive to it.

The only way to be certain is if the vendor provides a means to prove it one way or the other (i.e. the application can do its own load testing), and IMHO very few vendors have such a thing.

Now onto the rest.

Shrinking a data file generates a lot of I/O, and this could affect an extremely latency-sensitive application, but I've never seen it happen. If your backups aren't killing you, then it's fairly unlikely that this will either. You'd have to already be living on a knife edge of tolerances.

As a DBA you should always cover yourself by letting people know what's going on (either formally through a change ticket or informally through other means). Let them know there's no known or expected impact, but that you're available to stop it on a dime on the off chance anything does occur. DBCC SHRINKFILE stops immediately if you cancel it.

But otherwise, for me it's a business-as-usual change with no impact. I'm not in finance or healthcare, so your standards may be higher.

Here's what you do need to know though:

  • You'll be generating a lot of log records. This means you had better be in the simple recovery model, or in full with very frequent (every few minutes) log backups, to ensure you don't start filling your disk.

  • If you're in the full recovery model you'll have a massive set of DIFF backups (if you take them) up until the next full backup.

  • On the weekend, or whenever your index maintenance window is, you're going to hit the same massive log and DIFF backups. This might be a problem if your maintenance does rebuilds and not just reorganisations; rebuilds could push you outside your maintenance window, will take a massive amount of log or tempdb space while running, and can blow up to hundreds of GB. If it's just doing reorganisations (i.e. no outage), then you'll only see the large log and DIFF backups.
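On the log backup point above: a frequent log backup is just an ordinary BACKUP LOG scheduled from a SQL Agent job every few minutes. A sketch, with placeholder database name and path (a real job would generate a unique file name per run rather than overwrite with INIT):

```sql
-- One log backup iteration; schedule this from an Agent job while
-- the shrink is running so the log can't fill the disk.
BACKUP LOG MyDB
TO DISK = N'X:\Backups\MyDB_log.trn'
WITH INIT, COMPRESSION;
```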

All of those latter things definitely can cause outages. But if you have plenty of backup space and disk space for your data and log files, take log backups every few minutes, only do reorgs, and have sensible, small, non-percentage-based log growth sizes set, then you're unlikely to run into any trouble.
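Setting a fixed, non-percentage growth increment is a one-liner per file. A sketch with placeholder database and logical file names (sys.database_files shows the real ones):

```sql
-- Replace percentage-based autogrowth with a fixed, modest increment
-- so a single growth event can't balloon the log file.
ALTER DATABASE MyDB
MODIFY FILE (NAME = N'MyDB_Log', FILEGROWTH = 512MB);
```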