If the sequence is:
- DBCC CHECKDB reported errors.
- REPAIR_REBUILD fixed the database; no errors reported.
- DBCC CHECKDB is reporting errors again.
You probably have failed or failing disks in an array, or some other component of the I/O subsystem is broken or breaking. Personally, I'd want off the problem server ASAP!
- Take a tail-log backup.
- Restore last known good full backup and diff/log sequence to a different server.
- Switch service to the second server.
- Get a fresh cuppa and start diagnosis on the problem server.
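The failover steps above might look something like this in T-SQL. The database name and backup paths are placeholders; adapt them to your environment:

```sql
-- On the problem server: capture the tail of the log before moving off.
-- NO_TRUNCATE lets this work even if the database is damaged.
BACKUP LOG [ProblemDB]
    TO DISK = N'\\SafeShare\Backups\ProblemDB_tail.trn'
    WITH NO_TRUNCATE, NORECOVERY;

-- On the standby server: restore the last known good chain.
RESTORE DATABASE [ProblemDB]
    FROM DISK = N'\\SafeShare\Backups\ProblemDB_full.bak'
    WITH NORECOVERY;

RESTORE DATABASE [ProblemDB]
    FROM DISK = N'\\SafeShare\Backups\ProblemDB_diff.bak'
    WITH NORECOVERY;

-- Finish with the tail-log backup and bring the database online.
RESTORE LOG [ProblemDB]
    FROM DISK = N'\\SafeShare\Backups\ProblemDB_tail.trn'
    WITH RECOVERY;
```

Any intermediate log backups between the diff and the tail would be restored WITH NORECOVERY before the final RESTORE LOG ... WITH RECOVERY.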
Addendum: root cause of the problem:
As it turned out, we are using Diskeeper and (for corporate reasons) are still on an old version: 14.0.896. With this version we ran into messages like:
The operating system returned error 1784(failed to retrieve text for this error. Reason: 15100) to SQL Server during a write at offset 0x000000010a4000 in file 'D:\Application-Data\MSSQL\Data\<Database>.mdf:MSSQL_DBCC17'. Additional messages in the SQL Server error log and system event log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Msg 3313, Level 21, State 2, Server <Server>, Line 1
During redoing of a logged operation in database '<Database>', an error occurred at log record ID (2349664:1503:4). Typically, the specific failure is previously logged as an error in the Windows Event Log service. Restore the database from a full backup, or repair the database.
Msg 1823, Level 16, State 1, Server <Server>, Line 1
A database snapshot cannot be created because it failed to start.
Msg 1823, Level 16, State 2, Server <Server>, Line 1
A database snapshot cannot be created because it failed to start.
Msg 7928, Level 16, State 1, Server <Server>, Line 1
The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.
Msg 5030, Level 16, State 12, Server <Server>, Line 1
The database could not be exclusively locked to perform the operation.
Msg 7926, Level 16, State 1, Server <Server>, Line 1
Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details
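When the internal database snapshot cannot be created (Msg 7928, 5030, and 7926 above), CHECKDB can be told to skip the snapshot and take locks instead. A minimal sketch, with the database name as a placeholder; note this blocks concurrent activity, so it belongs in a maintenance window:

```sql
-- Force CHECKDB to use table locks rather than an internal snapshot.
-- Avoids the sparse-file snapshot entirely, at the cost of blocking.
DBCC CHECKDB (N'ProblemDB') WITH TABLOCK, NO_INFOMSGS;
```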
Found an explanation on Paul Randal's blog here
Link to the Diskeeper fix found here
Interesting Diskeeper white paper on database defragmentation here
DBCC CHECKDB makes extensive use of tempdb. It relies on a mechanism similar to snapshot isolation to make sure it always sees a transactionally consistent image of the database, independent of what else might be going on.
The error you receive is most likely because there is not enough free space on the drive that tempdb is on. However, the error also says to check any prior errors. Were there any?
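You can get a rough idea of whether tempdb space is the culprit before running the real checks. A sketch, with the database name as a placeholder:

```sql
-- Estimate the tempdb space CHECKDB would need, without running the checks.
DBCC CHECKDB (N'ProblemDB') WITH ESTIMATEONLY;

-- Compare against free space on the drive hosting tempdb
-- (xp_fixeddrives is undocumented but widely used; lists free MB per drive).
EXEC master.dbo.xp_fixeddrives;
```

If the estimate exceeds the free space on tempdb's drive, that explains the failure.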
Best Answer
First, find the person who put that job in place and have him/her tied to a chair listening to cats scratching a chalkboard while you go for lunch.
When done with that, compare your resource utilization and performance data for extended periods before and after the job runs. I suspect you'll see perf dips and a number of resources getting really busy after the job executes. You can temporarily suspend the job and monitor closely for a week to see what happens. I doubt you'll run into problems but as with all changes to a prod system, still keep a close eye for a while and make sure you capture perf, resource usage and waits data.
These sprocs are typically run in a test environment, especially for performance testing, to see how specific queries or the overall workload behave on a cold-start equivalent. They can also be used to see how specific parts of the system, especially storage, handle unexpected load spikes.
We once encountered a support engineer whose standard perf-tuning method involved doing these things on a production system; we couldn't get rid of him fast enough. Running all of them regularly on a production system is pretty close to rebooting your server daily to keep it healthy, and doing it in the middle of a busy day for best results. Really, really don't.
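For reference, and purely as an assumption about what such a "cold start" job contains (the thread doesn't list the exact commands), it is typically built from cache-flushing DBCC statements like these. Shown only to illustrate why running them on production hurts, not as a recommendation:

```sql
-- Typical cache-flushing commands such a job might run.
-- On production these discard compiled plans and the buffer pool,
-- forcing recompiles and physical reads until the caches warm back up.
DBCC FREEPROCCACHE;      -- clear the plan cache
DBCC DROPCLEANBUFFERS;   -- drop clean pages from the buffer pool
```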