A server I administer recently experienced a power outage that caused issues with the SSRS ReportServerTempDB
. After the outage, the nightly job that runs Ola's DatabaseIntegriyCheck
started reporting problems:
sqlcmd -E -S $(ESCAPE_SQUOTE(SRVR)) -d Master
-Q "EXECUTE [dbo].[DatabaseIntegrityCheck]
@Databases = 'USER_DATABASES,SYSTEM_DATABASES',
@LogToTable = 'Y'" -b
Command: DBCC CHECKDB ([ReportServerTempDB]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY
Msg 8979, Level 16, State 1, Server XXXXXXX, Line 1
Table error: Object ID 133575514, index ID 1, partition ID 72057594040156160, alloc unit ID 72057594041466880 (type In-row data). Page (1:130168) is missing references from parent (unknown) and previous (page (1:123239)) nodes. Possible bad root entry in system catalog.
Msg 8979, Level 16, State 1, Server XXXXXXX, Line 1
Table error: Object ID 133575514, index ID 1, partition ID 72057594040156160, alloc unit ID 72057594041466880 (type In-row data). Page (1:130169) is missing references from parent (unknown) and previous (page (1:123239)) nodes. Possible bad root entry in system catalog.
Msg 8979, Level 16, State 1, Server XXXXXXX, Line 1
Table error: Object ID 133575514, index ID 1, partition ID 72057594040156160, alloc unit ID 72057594041466880 (type In-row data). Page (1:130170) is missing references from parent (unkn…
Process Exit Code 1. The step failed.
However, our nightly backup job continued to report success every time it ran. SSRS didn't seem to have any problems either. The backup job does have multiple backup commands rolled up into one step:
BACKUP DATABASE [ReportServer] TO DISK = N'C:\Backup-SQLServer\ReportServer.bak' WITH NOFORMAT, COMPRESSION, CHECKSUM, INIT, NAME = N'ReportServer-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
BACKUP DATABASE [ReportServerTempDB] TO DISK = N'C:\Backup-SQLServer\ReportServerTempDB.bak' WITH NOFORMAT, COMPRESSION, CHECKSUM, INIT, NAME = N'ReportServerTempDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
BACKUP DATABASE [TEST-PH] TO DISK = N'C:\Backup-SQLServer\TEST-PH.bak' WITH NOFORMAT, COMPRESSION, CHECKSUM, INIT, NAME = N'TEST-PH-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO
Step two of our backup job checks the files. It ran every night and reported success each time:
RESTORE VERIFYONLY FROM DISK= N'C:\Backup-SQLServer\ReportServer.bak'
GO
RESTORE VERIFYONLY FROM DISK= N'C:\Backup-SQLServer\ReportServerTempDB.bak'
GO
RESTORE VERIFYONLY FROM DISK= N'C:\Backup-SQLServer\TEST-PH.bak'
GO
This went on for a few days until a convenient downtime. I resolved the issue by restoring/overwriting ReportServerTempDB
with a nightly backup from prior to the outage.
Our databases' PAGE_VERIFY_OPTION
is set to CHECKSUM
.
What scenarios would explain this behavior?
Best Answer
sp_BlitzErik wrote in a comment (since deleted):
And it will only throw errors, even with the
CHECKSUM
option in use for the page verify option, if the page became corrupted after writing to disk. And I don't suppose they're necessarily asking the question just because of that post - my guess is they have the option set toCHECKSUM
already.The scenario described in the question can and does happen when a page corruption happens in memory (either because of bad memory, a memory scribbler, or a SQL Server corruption bug), and then the corrupt page is written to disk with a valid checksum.
Subsequent reads of the page, including testing of the page checksum, will succeed. However, when
DBCC CHECKDB
(or a hapless query) runs, the corruption will become evident. In this case, it doesn't matter whether theBACKUP
andRESTORE
statements useCHECKSUM
; they won't detect that corruption.This is why you cannot just rely on using
CHECKSUM
everywhere and not run consistency checks.Hope this helps explain what you saw.
From the errors you posted, it looks like the root page of that clustered index got whacked somehow.