Sql-server – Backup detects corruption, but CHECKDB does not

corruptiondbcc-checkdbsql server

I have a database where when I run the backup command

BACKUP DATABASE [MyDatabase] TO     
DISK =  'G:\Backup\MyDatabase_01_01_2018.bak'   
WITH    NOFORMAT, NOSKIP, COMPRESSION, INIT, BUFFERCOUNT = 100

I get the error message

Msg 3043, Level 16, State 1, Line 8
BACKUP 'MyDatabase' detected an error on page (1:745345) in file 'F:\Data\MyDatabase_1.ndf'.
Msg 3013, Level 16, State 1, Line 8
BACKUP DATABASE is terminating abnormally.

I ran a full CHECKDB but it comes back clean. I did notice that the Page Verify option had been set to NONE (not my doing) so I changed it to CHECKSUM and rebuilt all the indexes in the DB to get it to write to all pages and generate checksums. After this the backup still fails and the checkdb still shows clean (so no change).

DBCC CHECKDB('MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS,
DATA_PURITY, EXTENDED_LOGICAL_CHECKS;

from the SQL log:

DBCC CHECKDB (MyDatabase) WITH all_errormsgs, no_infomsgs, data_purity executed by xxx found 0 errors and repaired 0 errors. Elapsed time: 0 hours 21 minutes 46 seconds. Internal database snapshot has split point LSN = 000ab776:0000112f:0001 and first LSN = 000ab776:0000112d:0001.

I ran DBCC PAGE but it errors (doesn't even seem to be returning the right page in the first place). I CAN run it with print option 2 and it returns but honestly I don't know what I'm looking for there.

DBCC PAGE ('MyDatabase',1,745345,3)
PAGE: (3:513793)

BUFFER:


BUF @0x00000003811F8280

bpage = 0x00000000F2D70000          bhash = 0x0000000000000000          bpageno = (1:745345)
bdbid = 5                           breferences = 0                     bcputicks = 0
bsampleCount = 0                    bUse1 = 44283                       bstat = 0x809
blog = 0x5adb215a                   bnext = 0x0000000000000000          

PAGE HEADER:


Page @0x00000000F2D70000

m_pageId = (3:513793)               m_headerVersion = 1                 m_type = 2
m_typeFlagBits = 0x4                m_level = 0                         m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 1075937538                                 m_indexId (AllocUnitId.idInd) = 2
Metadata: AllocUnitId = 633462595911680                                  Metadata: PartitionId = 0
Metadata: IndexId = -1              Metadata: ObjectId = 0              m_prevPage = (3:513795)
m_nextPage = (3:513820)             pminlen = 17                        m_slotCnt = 426
m_freeCnt = 2                       m_freeData = 7338                   m_reservedCnt = 0
m_lsn = (608841:643611:411)         m_xactReserved = 0                  m_xdesId = (0:0)
m_ghostRecCnt = 0                   m_tornBits = 0                      DB Frag ID = 1

Allocation Status

GAM (1:511232) = ALLOCATED          SGAM (1:511233) = NOT ALLOCATED     
PFS (1:744096) = 0x40 ALLOCATED   0_PCT_FULL                             DIFF (1:511238) = NOT CHANGED
ML (1:511239) = NOT MIN_LOGGED      

Msg 2514, Level 16, State 8, Line 20
A DBCC PAGE error has occurred: Invalid page metadata – dump style 3 not possible.

Any ideas what I could try next? Server version is

select @@version
Microsoft SQL Server 2014 (SP2-CU11) (KB4077063) - 12.0.5579.0 (X64) 
    Feb 21 2018 12:19:47 
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.3  (Build 9600: ) (Hypervisor)

Compatibility level of the DB is 100 (SQL 2008).

Best Answer

This answer is taken from an issue of the SQLskills.com newsletter written by Paul Randal, about "a database which would fail a backup with page checksum errors, but passed a DBCC CHECKDB".

The only time this can happen is when an extent is a mixed extent (where the 8 pages in the extent can be allocated to potentially 8 different allocations units – see here) and some pages are erroneously marked as allocated by the relevant PFS page.

When that happens, DBCC CHECKDB will not attempt to read those pages, as it derives what pages to read from an allocation unit's IAM pages (the first of which lists the pages allocated from a mixed extent). This case is a gap in DBCC CHECKDB's corruption-detection logic.

[Because] DBCC CHECKDB couldn't detect the corruption, it wasn't possible for it to make the repairs needed to fix them. So using DBCC WRITEPAGE, I worked out the changes needed in the allocation status for the erroneously-allocated pages, directly in the PFS page, and it worked!

This was an extremely rare case – it's much more common that a DBCC CHECKDB fails but a backup would succeed.

In my opinion, Paul's resolution is way above and beyond exporting and importing the data like you did, so I think you did the right thing.