Sql-server – SQL Server on VMware restart loses five days of data

sql-server-2008-r2 vmware

I am in the midst of a strange issue. A client moved their SQL database server from Hyper-V to VMware on Saturday. Yesterday they called me complaining about performance. In the brief time and with the limited access I had, I saw many indicators of poor I/O (and many other problems), so I made a few recommendations.

They restarted their SQL Server to install VMware Tools and take care of a few other tasks. Upon restart, their data reverted (from their perspective) to the moment they brought it up on the new VM. They found that the new database was pointed at the old iSCSI drives instead of the new vmdk files and wasn't writing anything at all to disk. They tried restoring their backup, but the restore fails every time before it completes.

Did the database just fill up the dirty cache, never writing to disk? How in the world did the database continue to function for five days without presenting anything more than slow I/O to the end users? A complicating factor is that they didn't have any alerts set up on the database that might have warned them of these kinds of problems. The event logs on the server itself show numerous write errors.

Best Answer

The problem was never definitively resolved. What they found was that at some point the database had been completely truncated by an unknown command. Every time they restored the database and rolled forward, the roll-forward replayed that command and truncated the database again.

They started rolling forward a little at a time until they found the moment when it went bad. At that point they cut me out of the troubleshooting loop, so I was not able to participate in the post-mortem. My guess is that someone did something really stupid on the database and they didn't want me to dig too deeply.
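
I never saw how they actually stepped through the log, so the following is only a sketch of the idea, not their procedure. Rolling forward "a little at a time" can be treated as a bisection between a restore point known to be good and one known to be bad. The function name `find_damage_window` and the callback `is_intact_at` are hypothetical. The callback is assumed to restore a scratch copy of the database up to the given time (for example with RESTORE LOG ... WITH STOPAT) and report whether the data is still there. The sketch also assumes the damage is permanent once it happens, so every point before it looks good and every point after it looks bad.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple


def find_damage_window(
    last_known_good: datetime,
    first_known_bad: datetime,
    is_intact_at: Callable[[datetime], bool],
    resolution: timedelta = timedelta(minutes=1),
) -> Tuple[datetime, datetime]:
    """Narrow down when the data was lost by bisecting restore points.

    is_intact_at(t) is a hypothetical callback: it should restore a
    scratch copy up to time t and return True if the data looks intact.
    Returns (good, bad) where the data was intact at `good`, damaged at
    `bad`, and the two are at most `resolution` apart.
    """
    if resolution <= timedelta(0):
        raise ValueError("resolution must be positive")
    if last_known_good >= first_known_bad:
        raise ValueError("last_known_good must be earlier than first_known_bad")

    good, bad = last_known_good, first_known_bad
    while bad - good > resolution:
        # Probe the middle of the remaining window.
        midpoint = good + (bad - good) / 2
        if is_intact_at(midpoint):
            good = midpoint  # damage happened after the midpoint
        else:
            bad = midpoint   # damage happened at or before the midpoint
    return good, bad
```

Over a five-day span with one-minute resolution this needs about 13 probes instead of thousands of one-minute steps. Each probe is assumed to start from a fresh restore, because a database cannot be rolled backward once the log has been applied past a given point. The final `good` time is the latest safe value to stop at when restoring for real.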