Sql-server – Detecting and fixing database corruption on SQL Server mirrored database

corruptionmirroringsql serversql-server-2005

I’m new to my current company and we have a System Administrator but no DBA. In the past I’ve taken care of DBA related issues at other companies. I’ve run into and corrected database corruption in other databases (Foxpro and Access) and was able to at least document the data corrupted so it could be reentered. I’ve never encountered or suspected corruption with a SQL Server database. I suspect that I’m running into this with a SQL Server 2005 database that is being mirrored.

The situation is that simple delete commands passed to the database are not deleting data in certain tables. I literally have to go in through SQL Server Management Studio and issue the delete command to clear out these tables. There are anywhere from 144 thousand to 32 Million records in temp tables and this data is taking up more than 90% of the database (as determined after deleting the records and shrinking the database). After I do this clean up, then the delete commands passed to the database on the test server work.

My first line of attack would be to run DBCC CHECKDB against copies of the database. I ran it against a backup and there was no corruption found. We can’t get exact copies of the database as my System administrator has had issues in the past where the database server has taken up to 6 hours to get online and he does not want to disable both the live and mirrored SQL servers.

My first question is if I should trust a database backup to include corrupted data that would be found by DBCC CHECKDB. If the backup does not store the corruption, what is the best way to stop and then bring back both the live and mirrored database to get copies of the LDF and MDF files?

Finally if I do find corruption in testing how would I determine the contents of the page that could show the corruption?

Best Answer

It is highly unlikely what you are seeing is corruption.

The far more likely cause of this issue is the "app" is not issuing the DELETE FROM command correctly, or in the way that you think it is. Perhaps the DELETE is being rolled back for some reason, or perhaps the WHERE statement is preventing rows being deleted that you think should be deleted. If you can view the code in the app, add the code that is performing (or not performing) the DELETE to your question via the edit link. If you don't have access to the code, you could use SQL Server Profiler to run a trace so you can "see" the statements your app is issuing to the server to determine for certain what the app is doing.

It's possible there are some application-level data problems preventing the application from performing the delete. For instance, if there are no primary/foreign key relationships enforcing referential integrity, it's possible a DELETE FROM statement is using a join in the delete that is effectively excluding rows from being deleted as a result of those rows being orphaned. By way of an example, take a look at this:

USE tempdb;

IF OBJECT_ID(N'dbo.Child', N'U') IS NOT NULL
DROP TABLE dbo.Child;
IF OBJECT_ID(N'dbo.Parent', N'U') IS NOT NULL
DROP TABLE dbo.Parent;

CREATE TABLE dbo.Parent
(
    parent_id int NOT NULL
        CONSTRAINT Parent_pk
        PRIMARY KEY CLUSTERED
    , parent_data varchar(50) NOT NULL
);

CREATE TABLE dbo.Child
(
    child_id int NOT NULL
        CONSTRAINT Child_pk
        PRIMARY KEY CLUSTERED
    , parent_id int NOT NULL
    , child_data varchar(10) NOT NULL
);

INSERT INTO dbo.Parent (parent_id, parent_data)
VALUES (1, 'A')
    , (2, 'B');

INSERT INTO dbo.Child (child_id, parent_id, child_data)
VALUES (1, 1, 'C')
    , (2, 1, 'D')
    , (3, 2, 'E')
    , (4, 2, 'F')
    , (5, 3, 'G')
    , (6, 3, 'H');

SELECT *
FROM dbo.Child;
╔══════════╦═══════════╦════════════╗
║ child_id ║ parent_id ║ child_data ║
╠══════════╬═══════════╬════════════╣
║        1 ║         1 ║ C          ║
║        2 ║         1 ║ D          ║
║        3 ║         2 ║ E          ║
║        4 ║         2 ║ F          ║
║        5 ║         3 ║ G          ║
║        6 ║         3 ║ H          ║
╚══════════╩═══════════╩════════════╝

Now, if we want to delete all rows from dbo.Child, we might use this delete statement that deletes all child rows that have a parent in the parent table:

DELETE FROM dbo.Child
FROM dbo.Child c
INNER JOIN dbo.Parent p ON c.parent_id = p.parent_id;

If we had a properly defined foreign-key in the dbo.Child table, the 5th and 6th rows would never be able to exist, however since we don't have good relational integrity, rows can exist in the child table that have no parent, as you can see from this SELECT statement ran after the above delete runs:

SELECT *
FROM dbo.Child;
╔══════════╦═══════════╦════════════╗
║ child_id ║ parent_id ║ child_data ║
╠══════════╬═══════════╬════════════╣
║        5 ║         3 ║ G          ║
║        6 ║         3 ║ H          ║
╚══════════╩═══════════╩════════════╝

When SQL Server detects corruption in the primary database (the one that is currently on-line and serving clients), it will transfer the page that is corrupted from the mirror database, via Automatic Page Repair, in an attempt to fix the database automatically. Automatic Page Repair has been available since SQL Server 2005 Service Pack 1, when mirroring officially became a supported option.

Since you've ran DBCC CHECKDB with no corruption reported, I'd say with near 100% certainty that you're not experiencing corruption.