SQL Server Error 825 – How to Resolve

Accidental DBA here (DB developer/jack of all trades), and first-time poster, so please be gentle. I have an alert on our SQL Server for error 825, and it fired once in February, then twice in June, and now three times in the past three weeks. The databases are for a vendor application. I ran DBCC CHECKDB in production after the errors, and I also restored to a test server and ran it again, with no errors. The errors occurred on two of the databases, one of which holds 99% of the application's data. There are no rows in suspect_pages.

I also wrote to the software/database vendor, and they pretty much told me to do what I already did, and sent a bunch of cut-n-paste research that I'd already found.

Everything I found says hardware or drivers. Our network/hardware is maintained by a third-party provider, and they say they don't see anything wrong with the disks. We run VMware on EMC – that's about all I know about hardware. SQL is Standard Edition 2008R2 SP3, running on Server 2008R2 Standard.

Oh, I changed DBCC CHECKDB to run daily.

We've been told that, because our disks are healthy, we should hire a DBA.

Any advice?

Thanks,
Pat

Best Answer

Here you can find the explanation of this error by Paul Randal, in "A little-known sign of impending doom: error 825":

From SQL Server 2005 onwards, if you ever see an 823 or 824, SQL Server has actually tried that I/O a total of 4 times before it finally declares a lost cause and surfaces the high-severity I/O error to the connection’s console, killing the connection into the bargain. The idea behind this read-retry logic came from Exchange, where adding the logic reduced the amount of immediate downtime that customers experienced. While in concept this was something I agreed with at the time, I didn’t agree with the way it was implemented.

If the I/O continues to fail, then the 823/824 is surfaced – that’s fine. But what if the I/O succeeds on one of the retries? No high-severity error is raised, and the query completes, blissfully unaware that anything untoward happened. However, something did go badly wrong – the I/O subsystem failed to read 8KB of data correctly until the read was attempted again. Basically, the I/O subsystem had a problem, which luckily wasn’t fatal this time. And that’s what I don’t like – the I/O subsystem went wrong but there are no flashing lights and alarm bells that fire for the DBA, as with an 823 or 824. If read-retry is required to get a read to complete, the only notification of this is a severity-10 informational message in the error log – error 825.
Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
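
As a rough illustration of the read-retry behaviour described above, here is a toy Python model. It is not SQL Server's actual implementation: the names `read_page_with_retry`, `flaky_reader`, `ErrorLog`, `ReadFailure` and `FatalIOError` are made up for this sketch, and the log text is a placeholder. Only the limit of four attempts and the 823/824 versus 825 outcomes are taken from the quoted post.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ATTEMPTS = 4  # "tried that I/O a total of 4 times", per the quoted post


class ReadFailure(Exception):
    """Hypothetical stand-in for one failed or corrupt page read."""


class FatalIOError(Exception):
    """Hypothetical stand-in for the high-severity 823/824 error."""


@dataclass
class ErrorLog:
    """Minimal in-memory substitute for the SQL Server error log."""
    entries: List[str] = field(default_factory=list)

    def info(self, message: str) -> None:
        self.entries.append(message)


def read_page_with_retry(read_once: Callable[[], bytes], log: ErrorLog) -> bytes:
    """Try the read up to MAX_ATTEMPTS times.

    If a retry succeeds, only an informational (825-style) entry is logged
    and the caller gets its data. If every attempt fails, a fatal error is
    raised instead.
    """
    failures = 0
    for _ in range(MAX_ATTEMPTS):
        try:
            data = read_once()
        except ReadFailure:
            failures += 1
            continue
        if failures:
            # The caller never sees this; it exists only in the log.
            log.info(f"825: read succeeded after failing {failures} time(s)")
        return data
    raise FatalIOError("823/824: read failed on all attempts")


def flaky_reader(fail_times: int) -> Callable[[], bytes]:
    """Build a reader that fails `fail_times` times, then returns an 8KB page."""
    remaining = [fail_times]

    def read_once() -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ReadFailure()
        return b"\x00" * 8192

    return read_once
```

With `flaky_reader(2)` the call returns a page and leaves a single 825-style entry in the log. With `flaky_reader(4)` all four attempts fail and the fatal error is raised. The first case is the one the post warns about, because the query itself completes normally.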

What this message is really saying is that your I/O subsystem is going wrong and you must do something about it. And unless you’re regularly scanning the error log looking for these, you’ll be none-the-wiser.
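
One way to act on the "regularly scanning the error log" advice is a small script run against a copy of the log text. The sketch below is only a starting point. The regular expression assumes the 825 message begins with "A read of the file '...' at offset 0x... succeeded after failing N time(s)", which is not shown in the excerpt above, so check it against a real entry from your own log. The function name `find_read_retries` and the sample lines, including the file path, are invented for illustration.

```python
import re
from typing import Iterable, List, NamedTuple


class ReadRetry(NamedTuple):
    file: str
    offset: str
    failures: int


# Assumed wording of the start of the 825 message; adjust if your log differs.
READ_RETRY = re.compile(
    r"A read of the file '(?P<file>[^']+)' at offset (?P<offset>0x[0-9A-Fa-f]+) "
    r"succeeded after failing (?P<failures>\d+) time\(s\)"
)


def find_read_retries(lines: Iterable[str]) -> List[ReadRetry]:
    """Return one ReadRetry per log line that matches the assumed 825 wording."""
    hits = []
    for line in lines:
        match = READ_RETRY.search(line)
        if match:
            hits.append(
                ReadRetry(match["file"], match["offset"], int(match["failures"]))
            )
    return hits


# Made-up sample lines, for demonstration only.
sample = [
    "2015-06-01 02:13:44.10 spid57 Error: 825, Severity: 10, State: 2.",
    r"2015-06-01 02:13:44.10 spid57 A read of the file 'E:\Data\VendorDb.mdf' "
    r"at offset 0x0000000a7f2000 succeeded after failing 1 time(s) "
    r"with error: incorrect checksum.",
    "2015-06-01 02:14:00.00 spid57 Some unrelated message",
]

for hit in find_read_retries(sample):
    print(hit.file, hit.offset, hit.failures)
```

The function takes any iterable of strings, so the caller decides how the log file is opened and decoded. Grouping the hits by file and offset may help show whether the retries cluster on one data file, which is useful evidence to hand to whoever maintains the storage.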

