Sql-server – Corrupt DB for a noob

corruptionsql server

Please help me! I am not a DB admin, and I have been thrust into the role after taking a new job. I need as much step by step as you can give me…

I've already done the dbcc checkdb(‘DB-NAME’,REPAIR_REBUILD) and the dbcc checkdb(‘DB-NAME’,REPAIR_ALLOW_DATA_LOSS) to no avail.

I'm in a real pickle here.

This is the dbcc checkdb(‘DB-NAME’) with no_infomsgs result.

Msg 8966, Level 16, State 2, Line 1 Unable to read and latch page
(1:1379395) with latch type SH. 1(Incorrect function.) failed. Msg
8966, Level 16, State 2, Line 1 Unable to read and latch page
(1:1379938) with latch type SH. 1(Incorrect function.) failed. Msg
8966, Level 16, State 2, Line 1 Unable to read and latch page
(1:1379954) with latch type SH. 1(Incorrect function.) failed. Msg
8966, Level 16, State 2, Line 1 Unable to read and latch page
(1:1380221) with latch type SH. 1(Incorrect function.) failed. Msg
8966, Level 16, State 2, Line 1 Unable to read and latch page
(1:1380237) with latch type SH. 1(Incorrect function.) failed. CHECKDB
found 0 allocation errors and 5 consistency errors not associated with
any single object. Msg 2533, Level 16, State 1, Line 1 Table error:
page (1:1380237) allocated to object ID 1345738380, index ID 2,
partition ID 72057603738501120, alloc unit ID 72057603729653760 (type
In-row data) was not seen. The page may be invalid or may have an
incorrect alloc unit ID in its header. Msg 8976, Level 16, State 1,
Line 1 Table error: Object ID 1345738380, index ID 2, partition ID
72057603738501120, alloc unit ID 72057603729653760 (type In-row data).
Page (1:1380237) was not seen in the scan although its parent
(1:822100) and previous (1:1380236) refer to it. Check any previous
errors. Msg 8978, Level 16, State 1, Line 1 Table error: Object ID
1345738380, index ID 2, partition ID 72057603738501120, alloc unit ID
72057603729653760 (type In-row data). Page (1:1380238) is missing a
reference from previous page (1:1380237). Possible chain linkage
problem. CHECKDB found 0 allocation errors and 3 consistency errors in
table 'ttfgld482500' (object ID 1345738380). Msg 2533, Level 16, State
1, Line 1 Table error: page (1:1379395) allocated to object ID
1761739862, index ID 1, partition ID 72057603740991488, alloc unit ID
72057603732144128 (type In-row data) was not seen. The page may be
invalid or may have an incorrect alloc unit ID in its header. Msg
8976, Level 16, State 1, Line 1 Table error: Object ID 1761739862,
index ID 1, partition ID 72057603740991488, alloc unit ID
72057603732144128 (type In-row data). Page (1:1379395) was not seen in
the scan although its parent (1:2223267) and previous (1:1379394)
refer to it. Check any previous errors. Msg 8978, Level 16, State 1,
Line 1 Table error: Object ID 1761739862, index ID 1, partition ID
72057603740991488, alloc unit ID 72057603732144128 (type In-row data).
Page (1:1379396) is missing a reference from previous page
(1:1379395). Possible chain linkage problem. Msg 2533, Level 16, State
1, Line 1 Table error: page (1:1380221) allocated to object ID
1761739862, index ID 9, partition ID 72057603741515776, alloc unit ID
72057603732668416 (type In-row data) was not seen. The page may be
invalid or may have an incorrect alloc unit ID in its header. Msg
8976, Level 16, State 1, Line 1 Table error: Object ID 1761739862,
index ID 9, partition ID 72057603741515776, alloc unit ID
72057603732668416 (type In-row data). Page (1:1380221) was not seen in
the scan although its parent (1:1402332) and previous (1:1380220)
refer to it. Check any previous errors. Msg 8978, Level 16, State 1,
Line 1 Table error: Object ID 1761739862, index ID 9, partition ID
72057603741515776, alloc unit ID 72057603732668416 (type In-row data).
Page (1:1380222) is missing a reference from previous page
(1:1380221). Possible chain linkage problem. CHECKDB found 0
allocation errors and 6 consistency errors in table 'ttfgld495500'
(object ID 1761739862). Msg 2533, Level 16, State 1, Line 1 Table
error: page (1:1379954) allocated to object ID 1987172708, index ID 1,
partition ID 72057604003069952, alloc unit ID 72057603994222592 (type
In-row data) was not seen. The page may be invalid or may have an
incorrect alloc unit ID in its header. Msg 8976, Level 16, State 1,
Line 1 Table error: Object ID 1987172708, index ID 1, partition ID
72057604003069952, alloc unit ID 72057603994222592 (type In-row data).
Page (1:1379954) was not seen in the scan although its parent
(1:7233319) and previous (1:1379953) refer to it. Check any previous
errors. Msg 8978, Level 16, State 1, Line 1 Table error: Object ID
1987172708, index ID 1, partition ID 72057604003069952, alloc unit ID
72057603994222592 (type In-row data). Page (1:1379955) is missing a
reference from previous page (1:1379954). Possible chain linkage
problem. CHECKDB found 0 allocation errors and 3 consistency errors in
table 'twhltc300500' (object ID 1987172708). Msg 2533, Level 16, State
1, Line 1 Table error: page (1:1379938) allocated to object ID
2056692908, index ID 6, partition ID 72057603715563520, alloc unit ID
72057603706716160 (type In-row data) was not seen. The page may be
invalid or may have an incorrect alloc unit ID in its header. Msg
8976, Level 16, State 1, Line 1 Table error: Object ID 2056692908,
index ID 6, partition ID 72057603715563520, alloc unit ID
72057603706716160 (type In-row data). Page (1:1379938) was not seen in
the scan although its parent (1:1014679) and previous (1:1379937)
refer to it. Check any previous errors. Msg 8978, Level 16, State 1,
Line 1 Table error: Object ID 2056692908, index ID 6, partition ID
72057603715563520, alloc unit ID 72057603706716160 (type In-row data).
Page (1:1379939) is missing a reference from previous page
(1:1379938). Possible chain linkage problem. CHECKDB found 0
allocation errors and 3 consistency errors in table 'ttfgld106500'
(object ID 2056692908). CHECKDB found 0 allocation errors and 20
consistency errors in database 'infordb'. repair_allow_data_loss is
the minimum repair level for the errors found by DBCC CHECKDB
(infordb).

Best Answer

It looks as though you have a problem with your storage subsystem (somewhere from the drivers down to the actual disks, but it could be anywhere in that stack).

The good news:

  1. Most of the corruption is involved in non-clustered indexes. This means that if the underlying table is clean, the indexes can be rebuilt to fix the corruption. Object IDs 1345738380, 1761739862 (this has clustered index corruption as well), 2056692908
  2. It doesn't seem like any PFS/GAM/SGAM/etc system pages were damaged.

The bad news:

  1. There are some pages corrupt in the clustered index. This means there could (most likely will) be data loss without a recent backup. Object ID 1761739862
  2. You don't have a recent backup.
  3. You already ran checkdb with repair_allow_data_loss so there is no way to know what you've already lost or what has already been done which takes you an extra step back.
  4. It looks like there is an issue with the disk subsystem.

Where to go from here:

I'd start by taking a backup with continue_after_error to make sure that you have something of a record. Then restore that backup to an instance on the same patch level as the one it was taken from so that all testing can be completed on a copy of the database and not the database itself.

Use the copy and the older backup of the database you have on a second instance to see what data is lost and what may be able to be manually salvageable. This is time consuming and lengthy, but may be needed. If you're really stuck, call in a consultant to help you with this as corruption is a great thing to cut your teeth on and gives you some semblance of liability.

Check the objects associated with the corruption. If they are tables that don't have any super important data (say dictionary tables that can be rebuilt or don't change much) you might be able to get away with manually fixing the table by using the older backup to script out data.

Check your storage subsystem, make sure you have the latest drivers, firmware, etc. Check for any failed drives in the array/san/nas/etc. Double check Ethernet cables, fibre cables, switches, etc. Find the root cause of the corruption or this may happen again. Run a health check on your storage subsystem/motherboard to make sure something isn't faulty with the hardware or controllers.

Lastly, update your resume.