Sql-server – Dumps after CHECKDB error

I have a little problem with a two-node mirror and customer requirements:

There is a maintenance window every night from 01:30 to 03:00, within which I have to run rebuild/reorg and checkdb (physical_only). If Murphy happens both jobs run on the same index/table and SQL Server creates a (mini-)dump. Due to the dump, the principal doesn't respond within the set timeout period and the mirror roles change, leading to app problems.

We just have that 90 minute window during the night and customer/PFE wants us to run both jobs in this window. DBCC runs for ~80min and rebuild/reorg for ~30min.

Any suggestions to get around those Murphy-moments or to avoid that dump?

SQL Server 2012 SP3 CU7 used by an older Axapta Installation DB-Size
900GB-1TB (data files on SSDs)

Index Maintenance with Ola Hallengren:

EXECUTE dbo.IndexOptimize @Databases = 'PROD',
@FragmentationLow = NULL,
@FragmentationMedium = NULL,
@FragmentationHigh = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 = 50,
@FragmentationLevel2 = 80,
@SortInTempdb = 'Y',
@MaxDOP = 0,
@LogToTable = 'Y',
@TimeLimit = 3600

DBCC CHECKDB also from Ola Hallengren with Physical_only option.

I don't know if there is a "real" need to run those rebuild/reorgs every night but the customer told us to do so (also suggested by MSFT PFE).

We're in communication with the customer about expanding the maintenance window, but I want to know if there are other suggestions/tips.

The actual DBCC CHECKDB error message is:

DBCC CHECKDB (XXXX) WITH all_errormsgs, no_infomsgs, physical_only executed by Username found 2 errors and repaired 0 errors.
Table error: Object ID 111391516, index ID 2, partition ID 720 57595795734528, alloc unit ID 72057595826864128 (type In-row data), page (7:14130686).
Test (IS_OFF (BUF_IOERR, p BUF->bstat)) failed.
Values are 133129 and -4

Best Answer

Defragmenting indexes when those indexes are stored on Flash/SSD is close to pointless. I'd recommend performing the reindex far less frequently. See Paul's answer here regarding page density and fill factor for mitigating the effects of index fragmentation.

Updating statistics, on the other hand, will provide important clues to the query optimizer. If auto-update-stats is disabled for the given database, ensure you are updating statistics via a nightly job.

Ensure you run DBCC CHECKDB as often as possible to ensure corruption does not negatively affect your RPO. You may consider offloading DBCC CHECKDB to a non-production instance. This consists of automated backup-and-restore to the non-production instance, then automatically running DBCC CHECKDB on that non-production instance. You'll probably need to license the machine where you run DBCC CHECKDB since it would be dealing with production data, however I'd check with your Microsoft rep to be certain.

If you're seeing SQL Server creating memory dumps, that indicates a problem with SQL Server that should be brought to the attention of Microsoft Tech Support - they may either have a hotfix available to resolve the issue, or they may offer some further advice about how to resolve that issue. The problem may indeed be a bug that needs to be fixed.

Best Answer

Related Solutions

DBCC CHECKDB Consistency Error

Voluntarily produce an error with DBCC CheckDb

Related Question