Sql-server – Recover Instance after Log File disk error

This is mainly theoretical, but I would like to have a documented list of options if this happens in the future.

Today we had a critical disk error on the SAN, meaning the disk holding the Transaction Log files for one of our production instances crashed and initially looked like it was dead. Obviously, down go the instance, databases, and therefore the applications running on it.

Our data center guys were busy working on the what, why and how's of the disk failure, meanwhile I was quickly coming up with a list of options for database recovery.

Ok, so the Data Center guys recovered the disk. It was a VPLEX error, rather than a physical hardware fault.

But what I found meanwhile was that I didn't have a great deal of options. The instance would not start, given that all log files for both Sys and User databases were out of reach. Would the instance have restarted if the Sys database log files were on a separate disk that was 'Up'

I could access the .mdf files, so I had the option of copying them over to another server, then attaching them with new log files on another volume. Either that or restoring the databases to another server\instance using our fairly resilient backups. Either option meant work for the App guys, because all the applications and associated services would need to re pointed.

I had another option of trashing the instance on the server and reinstalling it with the same instance name, then restoring all the databases from full ad log backups. Theoretically, this meant no work for the App team but had a serious time overhead for me (the only DBA).

Am I missing any options here? I only started this job recently, and its fair to say documentation here is limited. I've been busy this past few months putting together inventories of our SQL Estate, looking at patching/upgrade gaps etc, and getting involved in several projects. I think its fair to say that a documented disaster recovery plan for just this type of scenario is now at the top of our hierarchy's agenda.

Any help appreciated.

Best Answer

Disaster Recovery Plan depends on

1.Acceptable Downtime
2.Acceptable Dataloss

Any Disaster recovery plan you devise will revolve around the above two points only

Either option meant work for the App guys, because all the applications and associated services would need to re pointed

Theoretically, this meant no work for the App team but had a serious time overhead for me (the only DBA).

your Disaster Recovery option doesn't depend on the amount of work involved and who has what share in that..

You will have to sit together with Business Guys and get Clarity on what is the downtime and dataloss they can afford .Based on this you have to devise plans(using High availability options) and test it(Mock Drills) from time to time.

Best Answer

Related Solutions

Sql-server – Log File Management – Space on Disk

SQL Server – Single vs Multiple Virtual Drives for Database Files

Related Question