Sql-server – Best way to shrink a DB after nulling data from varbinary(max)

blobdisk-spacesql serversql server 2014varbinary

We have a database with a large amount of data stored in a field of type varbinary(max). At some point we can purge that data for most rows, but not all. Our plan is to make that field nullable, and simply null out the data when it is no longer needed. Once we do that we would like to reduce the size of the DB. What is the best way to accomplish this?

If there isn't a good way to reclaim space with the current setup, one idea I have is to move that data field to a separate table with just two columns: the key to the main table and the data field. Then we could simply delete the rows when they are no longer needed. (And then do some sort of shrink.) However, this would be a much more difficult change to make than simply making the existing field nullable.

Note: I actually don't care so much about making the database file smaller, but I do care about the newly freed up space becoming re-usable.

Over 90% of the DB size is this one field. I'm at 3TB already.

Best Answer

It looks to me like just updating the columns to NULL will release pages for reuse. Here's a Very Scottish® demo, to celebrate it being almost 5PM, EST.

USE tempdb;

DROP TABLE IF EXISTS dbo.RobertBurns;

CREATE TABLE dbo.RobertBurns
(
    Id INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    Scotch VARCHAR(50),
    HaggisAddress VARBINARY(MAX)
);

DECLARE @AddressToAVarbinaryHaggis VARBINARY(MAX); 
DECLARE @AddressToAHaggis NVARCHAR(MAX) = N'
Good luck to you and your honest, plump face,
Great chieftain of the pudding race!
Above them all you take your place,
        gut, stomach-lining, or intestine,
You''re well worth a grace
        as long as my arm.

The overloaded serving tray there you fill,
Your buttocks shaped like a distant hilltop,
Your wooden skewer could be used to fix a mill
         if need be,
While through your pores your juices drip
         like liquid gold.

His knife see the serving-man clean,
And then cut you up with great skill,
Making a trench in your bright, gushing guts
        To form a ditch,
And then, 0h! What a glorious sight!
        Warm, steaming, and rich!

Then, spoonful after spoonful, they eagerly eat,
The devil will get the last bit, on they go,
Until all their well-stretched stomachs, by-and-by,
        are bent like drums,
Then the head of the family, about to burst,
        murmurs “Thank the Lord".

Is there a pretentious soul who, over his French ragout,
Or Italian cuisine that would make a pig sick,
Or French stew that would make that same pig ill
        with complete and utter disgust,
Looks down with a sneering, scornful attitude,
        on such a meal? (as Haggis)

Poor devil! See him over his trash!
As feeble as a withered bullrush,
His skinny leg no thicker than a thin rope,
        His fist the size of a nut,
Through a river or field to travel,
        Completely unfit!

But look at the healthy, Haggis-fed person!
The trembling earth respects him as a man!
Put a knife in his fist,
        He''ll make it work!
And legs, and arms, and heads will come off,
        Like the tops of thistle.

You Powers who look after mankind,
And dish out his bill of fare,
Old Scotland wants no watery, wimpy stuff
        That splashes about in little wooden bowls!
But, if You will grant her a grateful prayer,
        Give her a Haggis!';


INSERT dbo.RobertBurns (Scotch, HaggisAddress )
SELECT TOP 1000 
CASE WHEN x.c % 15 = 0 THEN 'Laphroaig'
     WHEN x.c % 5 = 0 THEN 'Lagavulin'
     WHEN x.c % 3 = 0 THEN 'Port Ellen'
     ELSE 'Ardbeg'
END AS Scotch, 
CONVERT(VARBINARY(MAX), REPLICATE(@AddressToAHaggis, x.c % 20 + 1))
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS c
FROM sys.messages AS m
) AS x;

CREATE INDEX ix_novarbinary ON  dbo.RobertBurns (Scotch, Id);
CREATE INDEX ix_yesvarbinary ON dbo.RobertBurns (Scotch, Id) INCLUDE (HaggisAddress);

With rows inserted, let's check on our index pages.

SELECT   OBJECT_NAME(i.object_id) AS table_name,
         i.name AS index_name,
         MAX(a.used_pages) AS leaf_me_alone
FROM     sys.indexes AS i
JOIN     sys.partitions AS p
ON p.object_id = i.object_id
   AND p.index_id = i.index_id
JOIN     sys.allocation_units AS a
ON a.container_id = p.partition_id
WHERE OBJECT_NAME(i.object_id) = 'RobertBurns'
GROUP BY i.object_id, i.index_id, i.name
ORDER BY OBJECT_NAME(i.object_id), i.index_id;

After the insert, I get this. Actual pages may vary for you.

table_name  index_name                      leaf_me_alone
RobertBurns PK__RobertBu__3214EC074BE633A2  5587
RobertBurns ix_novarbinary                  10
RobertBurns ix_yesvarbinary                 5581

Let's NULL out some rows!

UPDATE rb
    SET rb.HaggisAddress = NULL
FROM dbo.RobertBurns AS rb
WHERE rb.Id % 15 = 0;

And check back in on our pages:

table_name  index_name                      leaf_me_alone
RobertBurns PK__RobertBu__3214EC074BE633A2  5300
RobertBurns ix_novarbinary                  10
RobertBurns ix_yesvarbinary                 5273

So page count was reduced. Huzzah! For the two indexes that touch our VARBINARY data, they lost a buncha pages. That means they're back in circulation for other objects to use. Since I'm in tempdb, they probably get gobbled up pretty quickly by all the garbage things that go on here.

Now let's put some data back in:

INSERT dbo.RobertBurns (Scotch, HaggisAddress )
SELECT TOP 10 rb.Scotch, rb.HaggisAddress
FROM dbo.RobertBurns AS rb;

And checking back in:

table_name  index_name                      leaf_me_alone
RobertBurns PK__RobertBu__3214EC074BE633A2  5330
RobertBurns ix_novarbinary                  11
RobertBurns ix_yesvarbinary                 5305

Page counts inched up a bit.

So, looks like you don't have to do anything too crazy, or even shrink your database to get space reuse. I think you were conflating the behavior of dropping columns and needing to run DBCC CLEANTABLE with what you're actually doing.

Hope this helps!