Sql-server – SQL Server Frozen Ghost Cleanup workaround needed

ghost-cleanupsql serversql-server-2008-r2

I have several tables with amount of rows between 5M and 1.5G

Each table has its BLOB field, which size varies from 100 bytes to 30 MBytes and which is stored as 'large value types out of row' = ON

Tables are stored in different filegroups with 3-4 files each on different disk @ different LUNs @ very fast SAN

Every day these tables grow for 5-100 Gb in size and with 600k – 1.5M rows

After certain amount of time, which varies from 2 weeks to 6 months some of the rows are deleted or moved to archive DB, so – there is no any rows in worktables that older than 6 months.

Current configuration of server:

  • SQL server engine is 2008 R2 SP1 Enterprise @ 24 cores, @ 64Gb RAM
  • SQL Server runs with extra startup flags:

-T 3640; (Eliminates sending DONE_IN_PROC messages to client for each statement in stored procedure. This is similar to the session setting
of SET NOCOUNT ON, but when set as a trace flag, every client session
is handled this way)

-T 1118;(Switches allocations in tempDB from 1pg at a time (for first 8 pages) to one extent.)

-T 2301;(Enables advanced optimizations that are specific to decision support queries. This option applies to decision support processing of
large data sets)

-T 1117;(Grows all data files at once, else it goes in turns.)

-E; (Increases the number of extents that are allocated for each file in a filegroup. This option may be helpful for data warehouse
applications that have a limited number of users running index or data
scans)

-T 834; (Causes SQL Server to use Windows large-page allocations for the memory that is allocated for the buffer pool,
http://msdn2.microsoft.com/en-us/library/aa366720.aspx,
http://support.microsoft.com/kb/920093)

  • SQL Server uses Large Page Extensions
  • SQL Server utilizes fast file initialization option
  • AUTOSHRINK is OFF for all the databases

The problem is – that starting from some point of server's uptime (from couple of days to months) GHOST CLEANUP process refuses to work out forced cleanups and simply do its usual job – cleans up several pages in several seconds (which is seen thru Extended Events), which is not suitable, because it is not able to clean up all the deleted rows

The problem persists from the times of SQL Server 2005 RTM Enterprise

How I was tried to solve the issue:

  • Tried to force SCAN operations on clustered indexes of the tables
  • Tried to force SCAN operations, which involving all the contents of BLOB column on clustered indexes of the tables
  • system sp_clean_db_free_space & sp_clean_db_file_free_space
  • manually dbcc cleanpage(@dbid , @fileid, @page) for all the files and pages in DB
  • clustered index rebuilds and reorganizing
  • recreating database
  • DBCC FORCEGHOSTCLEANUP

  • When I run the query:

    select * 
    from sys.dm_db_index_physical_stats(db_id(), object_id('ProblemTable'), 1, 0, 'detailed')
    

    I see millions and tens of millions ghost records, but only for
    allocation unit type of LOB_DATA

The only things, that help:

  • stopping the server with SHUTDOWN command or restarting the whole
    host – it helps, after restart GHOST CLEANUP process runs some hours
    and actually cleans all the ghosted records
  • DBCC SHRINKFILE with EMPTYFILE option – moving all the data from one file to other or newly created files cleans up ghost records in this file only – the problem is that I really hate shrink operations. And this takes 3-4 days for ONE file

the question – is there exists any programmatic (preferable) or maintenance way to force GHOST CLEANUP without server downtime at all, because server downtime costs too much, even unacceptable – its from thousands to tens of thousands $ per hour

Problems were noticed alike mine are here:

And just the same is here:

Best Answer

Finally, MS has recognized the issue as a bug: http://support.microsoft.com/kb/2622823

Briefly: It is fixed in

  • Sql Server 2008 SP3 CU4
  • Sql Server 2008 R2 CU10
  • Sql Server 2008 R2 SP1 CU4

In Sql Server 2012 SP1 I'm not experiencing the issue for more than year of runtime.