Sql-server – Unable to restore (error 3456)

recoveryrestoresql serversql-server-2008-r2

I have a situation that is not easy to figure out, and thought I'd ask on this forum if others might have suggestions.

I'm running SQL Server 2008 R2 Standard SP3 on Windows Server 2008R2 Enterprise.

A database needed some maintenance, and after the fact I needed to restore on another server. I have a full db backup done with COPY_ONLY plus a set of 4 tlog backups.

  1. before starting, create tlogbackup1
  2. change from FULL to BULK_LOGGED recovery model
  3. add new filegroup
  4. add file to newfilegroup
  5. set newfilegroup to be default
  6. select into table (on newfilegroup)
  7. drop original table
  8. delete original file
  9. delete original filegroup
  10. change name of new table to match original table
  11. change file name of newfilegroup to match original filegroup
  12. change file name in catalog to match original file name
  13. change file name at OS level to match original file name
  14. set default filegroup to be the original
  15. bring db online
  16. change from BULK_LOGGED to FULL recovery model
  17. After all steps have completed, create tlogbackup2

The restore of all backups must use WITH MOVE, due to drive letter changes on the restore server.

Recovery steps:

RESTORE database SomeDB FROM DISK = 'D:\REPRO\SomeDB.bak'   
WITH 
MOVE 'SystemData' TO 'D:\SQLDATA\SomeDB.mdf'
,MOVE 'SystemDataPDS' TO 'D:\SqlData\SomeDB.ndf'
,MOVE 'SystemData_log' TO 'D:\SQLLogs\SomeDB.LDF'
,NORECOVERY
,stats = 1

RESTORE LOG SomeDB FROM DISK = 'D:\REPRO\tlogbackup1.trn'   
WITH 
MOVE 'SystemData' TO 'D:\SQLDATA\SomeDB.mdf'
,MOVE 'SystemDataPDS' TO 'D:\SqlData\SomeDB.ndf'
,MOVE 'SystemData_log' TO 'D:\SQLLogs\SomeDB.LDF'
,NORECOVERY
,stats = 1

RESTORE LOG SomeDB FROM DISK = 'D:\REPRO\tlogbackup2.trn'   
WITH 
MOVE 'SystemData' TO 'D:\SQLDATA\SomeDB.mdf'
,MOVE 'SystemDataPDS' TO 'D:\SqlData\SomeDB.ndf'
,MOVE 'SystemData_log' TO 'D:\SQLLogs\SomeDB.LDF'
,NORECOVERY
,stats = 1

The final tlog restore gets to 100% and then fails with error 3456:

Processed 368 pages for database 'SomeDB', file 'SystemData' on file 1.

Processed 7656520 pages for database 'SomeDB', file 'SystemDataPDS' on file 1.

Processed 172430 pages for database 'SomeDB', file 'SystemData_log' on file 1.

Msg 3456, Level 16, State 1, Line 1
Could not redo log record (210388:123648:232), for transaction ID (0:1016710921), on page (4:8088), database 'SomeDB' (database ID 6). Page: LSN = (0:0:1), type = 11. Log: OpCode = 4, context 11, PrevPageLSN: (210388:122007:1). Restore from a backup of the database, or repair the database. Msg 3013, Level 16, State 1, Line 1 RESTORE LOG is terminating abnormally.

Just to verify that the full db backup was ok, I restored it ran CHECKDB, and there were no errors.

All feedback welcomed.

Thanks in advance,

Ned Otter

Best Answer

In order to understanding why error 3456 would be thrown, we need to take a little step back and understand how SQL Server handles this corner of recovery.

When SQL Server is redoing an operation, and that redo is a page modification, it makes a quick check. In the page header there is ultimately going to be a PageLSN, which is an indication of the last LSN that has modified that page, recorded by the page. Think about it like this, the page keeps track of the last LSN that has made modifications to it. This is the PageLSN.

Every time there is a logged page modification operation, that log record includes a few LSNs. Namely, the log record's LSN (think... Current LSN), and then it has what's called the Previous Page LSN (PrevPageLSN going forward). So when we modify a page, one of the pieces of data that is put into the log record is what the page indicates as being the last LSN before you to have modified the page.

Think about it like this... Your car needs to have work done on it. Mechanic John works on your car, and in the engine bay it has a little tag and Mechanic John writes "John worked on this car last". Then the next time you take your car in to another shop, Mechanic Mark looks in the engine bay and sees that Mechanic John worked on this car last. On his data sheet he writes this information. Same idea with SQL Server.

This can be somewhat confusing, so take a look at this image below on sequential page modifications, and how the PageLSN and PrevPageLSN relate:

enter image description here

Let's loop back around, as this all comes into play when you need to redo an operation on a page (restores, recovery, HA, etc.). When SQL Server needs to redo a page operation, it makes a sanity check to see if the PageLSN on the page matches the PrevPageLSN that the log record includes. If that is not equal, then you will see error 3456 get thrown.

Does PageLSN equal PrevPageLSN? No??? Stop and raise error 3456...

Let's analyze your error message, which includes the how:

Could not redo log record (210388:123648:232), for transaction ID (0:1016710921), on page (4:8088), database 'SomeDB' (database ID 6). Page: LSN = (0:0:1), type = 11. Log: OpCode = 4, context 11, PrevPageLSN: (210388:122007:1). Restore from a backup of the database, or repair the database. Msg 3013, Level 16, State 1, Line 1 RESTORE LOG is terminating abnormally.

I have bold'd the two pieces of data that have an inequality causing the error. You can see that our PageLSN is 0:0:1 (this was found in the page's header), and our PrevPageLSN is 210388:122007:1 (this was found in the data on the log record that was attempting to be redone). These are obviously not equal, hence err3456.

So in order to find out the why of this event, would be to find out why there is a disparity here. We really need to trace the lifecycle of page 4:8088 and see where the disconnect is. Unfortunately without further information, or hands-on troubleshooting there isn't much else I can do besides give you the background of this recovery operation and what causes the error.