Sql-server – Online page restore hitting 1000 limit

recoveryrestoresql serversql server 2014t-sql

I've been tasked to try to recover a database which suffered from corruption (due to I/O failure, which has been fixed since). I'm not familiar with the database or what it contains.

I've been given an old (~3 weeks) full backup and a series of transaction logs… however there are missing transaction logs, so I can only recover up to a certain date. There's like 2.5 weeks of data missing (and there's a lot of data being added to this database constantly).

I also have been given a copy of the corrupt database (which is accessible, but with a lot of pages corrupt/missing).

I've tried the typical DBCC CHECKDB commands (still no repair_allow_data_loss, that will be my last resort if nothing else works).

After many comes and goes to the database (the db is a 1.5 terabyte little monster and everything I do is slow and takes a while), I've tried to do an online page restore from the last known good backup for the corrupt pages.

To do that, I've done an script that creates many RESTORE DATABASE <foo> PAGE='pages' FROM DISK='<bar.bak>' commands from the DBCC CHECKDB output (básically a regex and a distinct)… so far so good, this worked up to a point where it said I had reached a limit of 1000 pages per file (there are 8 files on this db) per restore command.

So it asks me to "complete the online restore", but I'm at a loss at how to do that… I don't have a tail log or anything more complete than the full backup I'm starting with, so I basically don't know how to complete the restore to keep trying with the rest of pages.

I've tried a RESTORE DATABASE <foo> WITH RECOVERY but that didn't work either, it asks me for a log which I don't have.

Does anyone have any tips on how I could try to recover anything from here? Or how to "complete" the online restore so I can keep trying to recover more pages? Would I have the same problem if I try an offline restore (basically adding WITH NORECOVERY to everything and then try to bring it back at the end?)

Working out the database by hand is basically undoable… there are hundreds of tables with millions of rows and there's no clear meaning of what any of it are. The corrupt DB will fail on SELECT queries after some million of rows but I'm not sure I can work out where. I've tried rebuilding all non-clustered indexes, but there are corrupt pages with row data, so that didn't work either.

Some data loss would be acceptable, but consistency on the DB should at least try to be achieved.

The corrupt database is -still- online and clients are working on it (so it keeps getting new data), so any process I do on the lab bench should be reproducible on the production database afterwards (downtime will be hard for it).

This is SQL Server 2014 Enterprise

PS: I'm no DBA… I'm a programmer, but the client has tried some "expert" sql disaster recovery services and they have given up, so I've been asked to look at it and see if I could do anything.

Update: after many tests, the page by page restoring was a no-go, so we've ditched the idea. We are going for a manual recovery (manually selecting missing records from the corrupt tables and inserting them in the last known good backup), doing some automated tools for it (again, there are hundreds and hundreds of tables).

Best Answer

The standard procedure would be to:

Obtain the page IDs that have to be restored.
Start a page restore with a full database.
Apply the most recent differential backup.
Apply subsequent log backups.
Create new log backup.
Restore the new lob backup.

After the new log backup has been applied, the page restore is completed and the pages are then usable.

Example Restore

RESTORE DATABASE <database> PAGE='1:57, 1:202, 1:916, 1:1016'  
   FROM <file_backup_of_file_B>   
   WITH NORECOVERY;  
RESTORE LOG <database> FROM <log_backup>   
   WITH NORECOVERY;  
RESTORE LOG <database> FROM <log_backup>   
   WITH NORECOVERY;   
BACKUP LOG <database> TO <new_log_backup>;   
RESTORE LOG <database> FROM <new_log_backup> WITH RECOVERY;  
GO

Reference: Restore Pages (SQL Server) (Microsoft Docs)
Reference: RESTORE Statements (Transact-SQL) (Microsoft Docs)

However, you have holes in your TLOG backups, and restoring with above procedure might bring your database back into a state in time that you do not desire.

You are in a complicated situation.

Your database has corrupt pages and your company is constantly adding new data to a database with issues. This could result in a total downtime of the database. Do you want to risk that?
Somebody is going to be held responsible and the more you try to fix it, the more management might be inclined to decide that you might be that person in the end. Do you want to risk that?
You are putting yourself in a difficult situation by taking on a role that you were not employed for. You are trying to achieve something that neither your company DBAs nor your external consultant were capable of. While it may seem to be a noble gesture, you are putting yourself at risk. You might have "implicitly promised" something that you will never be able to fulfil. Do you want to risk that?
When somebody working with the database queries data that is corrupt, they are possibly going to receive an error message. Daily work is already being impacted. The longer you wait with the inevitable the more productivity will be impacted. Do you want to risk that? (This question could also be raised with management)
Your company's backup procedure seems to be faulty (otherwise how would TLOG backups be missing?) and you are still running your production database as if there were no issues. Do you want to risk that?

The best recommendation I can give you is to halt production and call Microsoft! Or at least call Microsoft and possibly halt production.

While my writing may seem overly cautious and slightly dramatised from your perspective, I can personally relate to an experience as DBA where data was lost in a similar situation. We only lost half a days data, but we had to re-synchronise a lot of data with surrounding systems.

The longer you wait the more expensive recovery could become.

As for the limitation on page restores, here a quote from the official documentation:

The maximum number of pages that can be restored into any single file in a restore sequence is 1000. However, if you have more than a small number of damaged pages in a file, consider restoring the whole file instead of the pages.

^{(emphasis mine)}

Reference: RESTORE Statements - Arguments (Transact-SQL) (Microsoft Docs)

When all is back to normal, the DBAs and/or external consultants might want to consider implementing a different backup/restore policy/procedure for your database. As it has to be up 7x24 you can't risk having a backup procedure that doesn't provide adequate restore capabilities for any situation.

Best Answer

Example Restore

Related Solutions

Sql-server – What data will be lost with DBCC CHECKDB (REPAIR_ALLOW_DATA_LOSS)

Sql-server – SQL Server 2005 Restore Backup Status Recovery Pending

Related Question