SQL Server – Determining what objects SHRINKDATABASE will free space from

Tags: disk-space, fragmentation, shrink, sql-server

First off, yes, shrink is bad. Don't use it, ever. I understand. Unfortunately, it's kind of SOP and I'm not in a position to argue since it's usually a stopgap until we can get a maintenance window to allocate more space and rebuild/reorganize indexes.

We recently had an issue where we were switching over from traditional backups to a third-party tool. Due to the size of the database and some large objects we'd recently removed, it made sense to shrink it prior to taking the clean backup with the new tool before the cutover. I had some reservations, but we went ahead with it since it rarely causes problems. Obviously, shrinking the log isn't really an issue, but we were shrinking the mdf. This caused a lot of blocking on just a few large, highly transactional objects and, rightly so, brought up some pretty serious questions about how we use this.

I've been doing a lot of research, but haven't yet found enough detail to my satisfaction.

  • Is there anywhere that actually describes the internals of how SHRINK is operating?

  • Is there a way to determine which objects are more fragmented than others and how it's freeing up space? Alternatively, does it not matter how fragmented a table is and it's more about the overall size of the object compared to others in the db?

Paul Randal talks a bit about it here, but it's not enough detail.

I know you could sort of base it on sys.dm_db_index_physical_stats, but it's been pretty difficult to run on some of our larger databases due to their size and over-indexing. I'm also not sure that's all I'd need to be concerned with. Main tables are stored as heaps and only the indexes are really fragmented, but I'm not sure how different partitions affect things or whether other parts of the underlying DB could also be getting freed up.
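For reference, this is roughly the kind of query I mean (a sketch; LIMITED mode keeps the scan cost down, which matters on databases like ours):

-- LIMITED mode reads only the upper index levels (and IAM pages for heaps),
-- which keeps this affordable on large, over-indexed databases.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
    AND i.index_id = ips.index_id
ORDER BY ips.page_count DESC;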

I did also see this question, but I'm not sure if I'd be able to configure it to the kind of granularity we're looking for.

Best Answer

A few thoughts:

1. Never use SHRINKDATABASE; always use SHRINKFILE.

You want to be in control of which file it's taking space from, and exactly how much, so you can leave some space in reserve. Run a query like this to see the actual used/free space in each file:

-- size and SpaceUsed are reported in 8 KB pages; dividing by 128.0 gives MB
SELECT DB_NAME() as dbname, type_desc, name as logical_name, 
    CONVERT(decimal(12,1),size/128.0) as TotalMB,
    CONVERT(decimal(12,1),FILEPROPERTY(name,'SpaceUsed')/128.0) as UsedMB,
    CONVERT(decimal(12,1),(size - FILEPROPERTY(name,'SpaceUsed'))/128.0) as FreeMB,
    physical_name
FROM sys.database_files WITH (NOLOCK)
ORDER BY type, file_id;

Let's say your primary data file is 100GB but only 20GB is actually used; you might pick a final target size of 25GB or 30GB to leave some internal free space. So you'd do a shrink something like:

DBCC SHRINKFILE (N'logical_name', 30000);  -- target size is given in MB
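
Worth knowing: if all you want is to release the free space past the last allocated extent, TRUNCATEONLY does that without moving any pages at all, so it can't cause the kind of blocking described in the next point:

DBCC SHRINKFILE (N'logical_name', TRUNCATEONLY);  -- trims the tail only, moves no pages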

2. Internal SHRINKFILE Behavior

Internally, shrinking a data file simply moves used pages one at a time from the end of the file to an open spot near the beginning until it has enough space at the end of the file to meet your target size.

It's a fairly simple brute-force method that pays no attention to the contents of the pages, which is why it frequently results in index fragmentation. From the Paul Randal article you linked:

A data file shrink operation works on a single file at a time, and uses the GAM bitmaps to find the highest page allocated in the file. It then moves it as far towards the front of the file as it can, and so on, and so on.

If the page it wants to move is locked by another process, it will be blocked.

3. Shrinks keep their progress

If you start a shrink, and then have to stop it because of blocking or other conflicting activity, it will retain its progress. All the pages it successfully moved will stay moved. You can resume later to finish the work.
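
Because progress is kept, one pattern is to shrink in small increments, so each DBCC SHRINKFILE call does less work before releasing its locks. A sketch, with a hypothetical logical file name and sizes:

DECLARE @target int = 30000;   -- final target size in MB (hypothetical)
DECLARE @step   int = 1000;    -- release roughly 1GB per pass
DECLARE @size   int, @sql nvarchar(400);

SELECT @size = size / 128      -- size is stored in 8 KB pages; /128 gives MB
FROM sys.database_files
WHERE name = N'logical_name';

WHILE @size > @target
BEGIN
    SET @size = CASE WHEN @size - @step > @target
                     THEN @size - @step ELSE @target END;
    -- DBCC SHRINKFILE won't accept variables as arguments, hence dynamic SQL
    SET @sql = N'DBCC SHRINKFILE (N''logical_name'', '
             + CAST(@size AS nvarchar(12)) + N');';
    EXEC sys.sp_executesql @sql;
END;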

4. Finding out what tables will be impacted by a shrink

There might actually be some ways to do this, using queries on sys.allocation_units or sys.partitions, or maybe using DBCC PAGE, but my recommendation is: don't bother; large databases will have the scattered remains of thousands of objects across the data file, most of which you don't care about.

If you're insistent, there are some advanced articles out there that go into this (and a rough sketch of the idea follows below), but you can already find out what tables/indexes are at play when it starts blocking; everything else is noise.
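
For example, a sketch using the undocumented sys.dm_db_database_page_allocations DMF (SQL Server 2012 and later; unsupported, and it walks the allocation structures, so it's slow on exactly the large databases in question):

-- Which objects own pages nearest the end of the file, i.e. the pages a
-- shrink would have to move first. Undocumented; treat results as approximate.
SELECT TOP (20)
    OBJECT_NAME(pa.object_id) AS object_name,
    pa.index_id,
    MAX(pa.allocated_page_page_id) AS highest_page_id,
    COUNT(*) AS allocated_pages
FROM sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS pa
WHERE pa.allocated_page_file_id = 1   -- file_id of the file you plan to shrink
    AND pa.object_id > 0              -- skip internal allocation entries
GROUP BY pa.object_id, pa.index_id
ORDER BY highest_page_id DESC;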

If a blocking lock never clears to give the shrink a chance to move the page, then you might have to schedule some downtime with no other activity.
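
While the shrink runs, you can watch its progress and see who is blocking it; a minimal sketch:

-- DBCC SHRINKFILE shows up in sys.dm_exec_requests as DbccFilesCompact,
-- and it is one of the commands that populates percent_complete.
SELECT r.session_id,
    r.command,
    r.percent_complete,
    r.wait_type,
    r.blocking_session_id   -- nonzero: the session holding the lock you're waiting on
FROM sys.dm_exec_requests AS r
WHERE r.command LIKE N'Dbcc%';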

5. When you are done, do a reindex to fix fragmentation

The data file might grow a bit again, but unless your database is one massive table, you'll still likely be left with a smaller end result than when you started.
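
For example (hypothetical table and index names): REORGANIZE compacts pages in place within the space the file already has, while REBUILD builds a fresh copy of the index and therefore regrows the file more.

-- Reorganize works within existing free space; rebuild needs room for a
-- full new copy of the index, so it grows the file more.
ALTER INDEX ALL ON dbo.BigTable REORGANIZE;

-- Or, for the worst offenders, accepting some regrowth:
ALTER INDEX IX_BigTable_SomeKey ON dbo.BigTable REBUILD;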