Sql-server – How to free the unused space for a table

database-sizedbccshrinksql server

This question is asked like tens of times, and to my surprise such a simple requirement comes so difficult. Yet I can't solve this problem.

I use SQL Server 2014 Express edition with 10GB limit of database size (not a filegroup size, database size).

I crawled news, and inserted HTML into a table. Table's schema is:

Id bigint identity(1, 1) primary key,
Url varchar(250) not null,
OriginalHtml nvarchar(max),
...

Database ran out of size and I received insufficient disk space

Of course shrinking database and filegroup didn't help. DBCC SHRINKDATABASE didn't help. So I wrote a simple application to read each record, strip out some unwanted parts of the OriginalHtml like head section and aside and footer to keep the main body only and I now see this image when getting report of disk usage by top tables:

enter image description here

As I understand this picture the unused space is now like 50 percent of total size. That is, now I have 5GBs unused space. But I can't reclaim it.
Rebuilding indexes didn't help. The truncateonly option won't help because as I understood no record is deleted, only the size of each record is reduced.

I'm stuck at this point. Please help, what should I do?

Clustered Index is on column Id.

This is the result of EXECUTE sys.sp_spaceused @objname = N'dbo.Articles', @updateusage = 'true';

name        rows     reserved     data        index_size   unused
----------- -------- ------------ ----------- ------------ -----------
Articles    112258   8079784 KB   5199840 KB  13360 KB     2866584 KB 

Best Answer

All things being equal, it should be enough to compact the large object (LOB) column OriginalHTML. You don't specify the clustered index name in the question, so:

ALTER INDEX ALL
ON dbo.Articles
REORGANIZE 
WITH (LOB_COMPACTION = ON);

See ALTER INDEX (Transact-SQL)

If you have the clustered index name (not just the clustered column(s)), replace the ALL above with that name.

The LOB_COMPACTION option defaults to ON, but there's no harm in being explicit. You may need to run the REORGANIZE repeatedly to finish reclaiming all the unused space.

Unfortunately, the way LOB data is organized and the way LOB compaction is implemented means this method may not always be able to reclaim all the unused space, no matter how many times you run it. It can also be very slow.

You could also try the method in the related Q & A Freeing Unused Space SQL Server Table

If, for whatever reason, the above does not work for you, export the data to a file, truncate the table, then reload it. There are several methods to achieve that, for example the bcp utility.

Example

The following creates a table with 10,000 wide rows:

CREATE TABLE dbo.Test 
(
    c1 bigint IDENTITY NOT NULL, 
    c2 nvarchar(max) NOT NULL,

    CONSTRAINT PK_dbo_Test
        PRIMARY KEY CLUSTERED (c1)
);

-- Load 10,000 wide rows
INSERT dbo.Test WITH (TABLOCKX)
    (c2)
SELECT TOP (10000)
    REPLICATE(CONVERT(nvarchar(max), 'X'), 50000)
FROM master.sys.columns AS C1
CROSS JOIN master.sys.columns AS C2;

We can see the space usage using the sys.dm_db_index_physical_stats DMV:

SELECT
    DDIPS.index_id,
    DDIPS.partition_number,
    DDIPS.index_type_desc,
    DDIPS.index_depth,
    DDIPS.index_level,
    DDIPS.page_count,
    DDIPS.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'dbo.Test', N'U'),
    1,
    NULL,
    'DETAILED'
) AS DDIPS
WHERE 
    DDIPS.alloc_unit_type_desc = N'LOB_DATA';

DMV output

We now update the LOB content to a smaller size (but one that still requires off-row storage):

-- Change LOB data to a smaller value (that will not move in-row)
UPDATE dbo.Test WITH (TABLOCKX)
SET c2 = REPLICATE(CONVERT(nvarchar(max), 'Y'), 5000);

DMV output

Notice that some space has been reclaimed, but the remaining pages are much less full than they were.

We can compact the LOB space using:

ALTER INDEX PK_dbo_Test ON dbo.Test 
REORGANIZE 
WITH (LOB_COMPACTION = ON);

DMV output

This results in some compaction and space savings, but it is not perfect. Running the compaction again may or may not improve the situation. In my test, it did not, no matter how many times I re-ran it.

Export, truncate, reload

One way to do this entirely from Management Studio involves using xp_cmdshell to export the table data to a file. If xp_cmdshell is not currently enabled, the following will do that:

-- Enable xp_cmdshell if necessary
EXECUTE sys.sp_configure
    @configname = 'show advanced options',
    @configvalue = 1;

RECONFIGURE;

EXECUTE sys.sp_configure
    @configname = 'xp_cmdshell',
    @configvalue = 1;

RECONFIGURE;

Now we can perform the export:

-- Export table
EXECUTE sys.xp_cmdshell
    'bcp Sandpit.dbo.Test out c:\temp\Test.bcp -n -S .\SQL2017 -T';

Note you will need to change the path and -S server name, and possibly provide login credentials.

How we can truncate the table, and reload it using BULK INSERT:

-- Truncate
TRUNCATE TABLE dbo.Test;

-- Switch to BULK_LOGGED recovery model if currently set to FULL
-- Bulk load
BULK INSERT dbo.Test
FROM 'c:\temp\Test.bcp' 
WITH 
(
    DATAFILETYPE = 'widenative', 
    ORDER (c1), 
    TABLOCK,
    KEEPIDENTITY
);

The final step is to reset the identity seed:

-- Check and reseed identity
DBCC CHECKIDENT('dbo.Test', RESEED);

This sequence of operations is typically faster than LOB compaction, and should always produce optimal results:

DMV output

The above is not quite as efficient as it could be due to a long-standing bug: BULK INSERT with IDENTITY column creates query plan with SORT. The workaround listed there is effective, but I would only bother with it if the table is very large.

Don't forget to delete the temporary file used to hold the exported data.

You are of course free to use whichever bulk export/import approach is most convenient to you. It is not required to use xp_cmdshell or bcp.

Additional notes:

  • FILLFACTOR only applies to index pages. It does not affect off-row LOB storage (which is not stored on index pages).

  • Row and page compression are not available for off-row storage.

  • As an alternative, you can compress and decompress data explicitly using the COMPRESS and DECOMPRESS functions available from SQL Server 2016.

    An option for those using SQL Server 2014 (which is the case here) or older (down to SQL Server 2005) to get the same compression functionality provided by the COMPRESS and DECOMPRESS built-in functions is to use SQLCLR. Pre-built functions that do just this are available in the Free version of SQL# written by Solomon Rutzky. The Util_GZip and Util_GUnzip functions should be equivalent to COMPRESS and DECOMPRESS, respectively. And, anyone using SQL Server 2012 or newer should make sure that the server running SQL Server is updated with .NET Framework version 4.5 or newer so that the much improved compression algorithm will be used.