Sql-server – Extracted data WAY bigger than deficit left from deleted rows

disk-spacesql serversql-server-2008-r2

I recently undertook an exercise to free some space up in a production DB (SQL Server 2008 R2) due to disk constraints. Just one table in particular takes up 98% of the DB. Records that were not needed were identified and represent 1/3rd of the data (Roughly 160GB of 450GB).

So I created a DB on a different disk, created a table with the same schema, only difference being an ID column was created without IDENTITY. I'll refer to this as Archive and the Original as Source. Original had PK index and two other small ones amounting to 19MB according to SSMS. Archive just has PK index as it wont be queried.

I transferred the data out in batches of 1000 rows at a time to minimize the impact on production. After a few hours the disk space on the Archive was running out too so I stopped the transfer when the DB reached 145GB, this represented 800,000 rows ish.

Then I began a delete on the Source table, making sure to only delete what was Archived. This ran overnight and was stopped before trading began, this ended up deleting 550,000 rows from the original table.

According to archive, this should be about (145GB / 800000) * 550000 ~= 100GB. But free space in the Source database went to just 45GB!

I've done the obvious googling and checking DMVs but i couldn't find anything useful, then i undertook the task of comparing the data lengths of all the data to see if the issue lied there. As we're using XML columns my thinking was the deleted columns represented a smaller proportion of archived data due to their contents. This got more confusing, here are the results:

  • In BOTH Archive And Source, (ARCHIVE FIGURE) = 69.24362996 GB
  • In BOTH Archive And Source, (SOURCE FIGURE) = 69.24362996 GB
  • In Archive but not Source (Deleted from source) 61.09401641 GB
  • Archive Total 130.3376464 = GB

So this proved that the Archive's data size matches the Source (Top Two), And more importantly what I suspected that 550000 rows deleted represented a smaller portion of disk space than the remaining 250000.

But the deleted records have added up to 16GB more than the 45GB that was free'd in the DB. And also the Archive is calculated as having 130GB despite being 145GB in size so again a 15GB deficit.

This is what i'm struggling to figure out. My current theory, though I don't know how to prove it, is that the data removed (which date wise is spread across the table) has been removed from pages leaving some data behind that didn't meet the criteria, so if these pages were full at 8kb and we removed some data that represented 2kb of that, is this why the figure doesn't match, only 45GB worth of pages were released and the remaining 16GB are tied up in pages that were shared with other rows not matching criteria.

Sorry about the massive post, I've pretty much exhausted my current knowledge on the intricacies of how SQL Server stores things.

If you have any idea why either the Archive row sizes add up to 15GB less than the total DB size (it is the only table in the DB btw).
OR
Most importantly Why only 45GB of space was release in the Source when the same data takes up 61GB in the Archive.

Additional notes:

  • After the delete, all indexes were rebuilt so that figure is after rebuild.
  • The table in question does have a clustered index (on the PK) – it's not a heap.
  • FillFactor is 80% on both tables, and page fullness is 76% on Source and 78% on archive.
  • I altered the fill factor on archive (by rebuilding the index) to 99% to see if it would free any space and it had no effect.
  • I've ran reorg on the Source with Compact large object data on and its also made 0 difference to free space.
  • The number of rows per page varies because of the xml column, it could be anywhere from 1 to 18.
  • Fillfactor is 80% on the archive because the schema script generated the fill factor with it. Its only a temporary holding space, we'll be setting up a proper archive or datawarehouse soon. Regardless it doesnt seem to be an index issue as altering rebuilding and reorging hasn't impacted either archive or the source DB.
  • Allocation Information
    enter image description here

-LOB_DATA USED: 47740971 DATA:0 TOTAL:49987517
it is clear that lob data pages are being held and not used.
I will attempt to use this command found on MSDN when i get get approval
DBCC CLEANTABLE
Though this is for dropped columns, i'm running out of options.

-DBCC CLEANTABLE did not work, admittedly it was a long shot. So back to the drawing board, the only other suggestion that might work i migrating into a new table but given that that is 400GB+ there is no way i'd have a maintenance window or enough disk space to try that. Happy to hear other theories.

Best Answer

For the first thing I'd control the FillFactor for both indexes (you said your tables have PK defined on them, so I assume the tables are clustered).

It could be set to 50% in the first table or to smth like that, even if I do not see much sense in setting FF to smth different from 100 for an index on identity column.

In this case when you rebuilt the first table even if it should compact the table it could instead leave free space according to FF defined.

.................

The second option is your XML that did not release the space.

After you rebuilt your table all the index pages and data pages with in-row or row-overflow data are full again (of course if the FF was not changed to some lower value than it was before)

But the LOB pages were not compacted by rebuild. In this case you can try ALTER INDEX..REORGANIZE to compact your XML