Sql-server – How to Resolve Corruption Detected by BACKUP WITH CHECKSUM, But NOT DBCC CHECKDB

checksumdbcc-checkdbsql-server-2008-r2

Recently, I was asked to refresh a couple of databases on a test server from production. Always wanting to follow best practices, I performed the backup using the WITH CHECKSUM option, and eventually got the following error:

Msg 3043, Level 16, State 1, Line 32
BACKUP 'DBName' detected an error on page (1:428321) in file 'F:\path\DBName.mdf'.
Msg 3013, Level 16, State 1, Line 32
BACKUP DATABASE is terminating abnormally.

Surprised at the error, I ran DBCC CHECKDB to see what's going on, but unfortunately (or perhaps fortunately), it comes back clean.

All of the information I have found so far encourages the use of DBCC CHECKDB's findings as a guide on where to go next, so I'm not sure how to fix this?

Some quick info:

  1. SQL Server is 2008 R2.

  2. I found confirmation of the error in the msdb.dbo.suspect_pages table. It contains event types 1 and 3.

  3. The DBCC command I used was DBCC CHECKDB (DBName) WITH NO_INFOMSGS, ALL_ERRORMSGS;

  4. I am not the official DBA, but since I have some experience, I was permitted to perform the backup and restore.

  5. I did read the similar post What types of corruption can DBCC CheckDB miss?. However, while I believe I understand why (the point of that post), I don't know how to resolve it.

  6. I was able to successfully perform the backup by omitting the WITH CHECKSUM option.

Thanks in advance for any guidance!

UPDATE

The comment provided by "usr" below got me on the right track. Just what exactly does page (1:428321) contain? I did some more research and found the DBCC PAGE command. However, I noticed something odd. When I executed the following…

DBCC TRACEON (3604);
DBCC PAGE ('DBName', 1, 428321, 3);

I got the following back (note the page number on the first line)…

PAGE: (1:2021305)

BUFFER:

BUF @0x0000000CFEFCC580

bpage = 0x0000000CFE764000           bhash = 0x0000000000000000           bpageno = (1:428321)
bdbid = 18                           breferences = 0                      bcputicks = 0
bsampleCount = 0                     bUse1 = 20204                        bstat = 0xc00809
blog = 0xbcca2159                    bnext = 0x0000000000000000           

PAGE HEADER:

Page @0x0000000CFE764000

m_pageId = (1:2021305)               m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x0                 m_level = 0                          m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 5313   m_indexId (AllocUnitId.idInd) = 256  
Metadata: AllocUnitId = 72057594386120704                                 Metadata: PartitionId = 0
Metadata: IndexId = -1               Metadata: ObjectId = 0               m_prevPage = (1:2021304)
m_nextPage = (1:2021306)             pminlen = 194                        m_slotCnt = 10
m_freeCnt = 228                      m_freeData = 7944                    m_reservedCnt = 0
m_lsn = (121329:238172:13)           m_xactReserved = 0                   m_xdesId = (0:0)
m_ghostRecCnt = 0                    m_tornBits = 297819981               

Allocation Status

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED           
PFS (1:420576) = 0x40 ALLOCATED   0_PCT_FULL                              DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED            

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Page 2021305??? Didn't I enter 428321? If I simply run the command with the page before it (428320), I get that page. So I guess I'm getting closer, but still don't know what that page contains.

Best Answer

I'm not sure how to fix this

In the comments we investigated but never found out conclusively whether the page is allocated or not.

If it's allocated this is definitely a corruption that CHECKDB missed. A bug in CHECKDB. Please report that to Microsoft.

If it's not allocated here's how I would fix it: Fill the file with dummy data e.g. CONVERT(binary(8000), 0x). Now the page should have been consumed by the dummy data and overwritten with something clean.

Actually, you can use this idea to test whether SQL Server thinks the page is allocated. If this works it's definitely not allocated.

Alternatively, you could use the write page feature to zero fill the page but that leaves an audit trace in the database boot page. Microsoft support will not like finding this, I guess, when they help you in the future with something. They might think your page write caused damage. Also, it steals all the fun from this investigation by terminating it without resolution :)