SQL Server – How to Handle Database Corruption in QueryStore Internal Table

corruptionquery-storesql serversql-server-2016

This morning, the following email alert was received:

DATE/TIME: 2/28/2018 9:26:42 AM

DESCRIPTION: Attempt to fetch logical page (1:3948712) in database 9
failed. It belongs to allocation unit 72057594045857792 not to
72059184917512192.

COMMENT: (None)

JOB RUN: SQL Sentry 2.0 Alert Trap

Looking in the event log of the secondary replica there are three occurrences of the same message:

Source spid138

Message Attempt to fetch logical page (1:3948712) in database 9
failed. It belongs to allocation unit 72057594045857792 not to
72059184917512192.

Running the following on the secondary replica (2 node synchronous Availability Group):

DBCC TRACEON(3604)
dbcc page (9, 1,3948712,3)
go
DBCC TRACEOff(3604)

Snippet of the results from either replica:

Page @0x00000070DAB8C000

m_pageId = (1:3948712)              m_headerVersion = 1               
m_type = 3 m_typeFlagBits = 0x0                m_level = 0            
m_flagBits = 0x8200 m_objId (AllocUnitId.idObj) = 129   m_indexId
(AllocUnitId.idInd) = 256  Metadata: AllocUnitId = 72057594046382080  
Metadata: PartitionId = 72057594040811520                             
Metadata: IndexId = 1 Metadata: ObjectId = 197575742      
m_prevPage = 0:0)                  m_nextPage = (0:0) pminlen = 0                 
m_slotCnt = 2                       m_freeCnt = 1634 m_freeData = 6568
m_reservedCnt = 0                   m_lsn = (46041:1506360:18)
m_xactReserved = 0                  m_xdesId = (0:0)                  
m_ghostRecCnt = 0 m_tornBits = -99702035              DB Frag ID = 1

Running the following on the primary replica:

select OBJECT_NAME (197575742)
plan_persist_plan

Questions

  1. Am I right in saying that I have a clustered index corruption of the plan_persist_plan table which is part of Query Store?
  2. Is the best/only fix to run the following:

    ALTER DATABASE MyDatabase SET QUERY_STORE CLEAR; 
    
  3. If #2 is the best fix, is there any good way of preserving the data in Query Store that would be deleted?

  4. Does this kind of corruption indicate a problem with the IO subsystem?

Other info

  • I have QueryStore enabled obviously, it has a capacity of 350MB, is in Read-Write mode currently, flush interval 15 minutes, stats collection hourly, Capture mode ALL, Auto size based cleanup, 5 day stale query threshold.
  • DB id 9 is a business critical user database
  • The error details are Error: 605, Severity: 21, State: 3.

I have checked the Windows System Event log as per the guidance. This has yielded only "Informational" events, no errors.

DBCC CHECKTABLE ('sys.plan_persist_plan');  

results:

DBCC results for 'sys.plan_persist_plan'.
There are 12562 rows in 240 pages for object "sys.plan_persist_plan". 
DBCC execution completed. 
If DBCC printed error messages, contact your system administrator.

I cannot establish the correct command to rebuild the index, the following does not work:

ALTER INDEX PK_plan_persist_plan_cidx ON sys.plan_persist_plan REBUILD;

Best Answer

As noted in my comment above I had a similar corruption issue with a query store internal table.

As you yourself have suggested I used ALTER DATABASE MyDatabase SET QUERY_STORE CLEAR; to attempt to fix the issue and that did work fine. In SQL Server 2017, Microsoft added a repair procedure that can be attempted prior to clearing the data: sp_query_store_consistency_check (source)

If you want to preserve the data then probably the only method is to copy the tables - I can't find anyone who has created a script for that.

Usually with corruption I too would be worried about my disks, but in this case I'm a little suspicious that the issue is with query store itself.