Sql-server – Databases in Suspect state after enabling TDE

sql serversql-server-2008-r2transparent-data-encryption

A few of my databases are in a suspect state. How can I get them online? I don’t know what info is relevant and what isn’t, but I’ll give as much as I can. Here is what happened:

I enabled TDE in about 100 databases by the below. (The cert tde_certificate was created on another SQL server, same version, if it matters.)

use master
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'StrongPass'
--the key/cert was generated on another sql server (same version), if it matters
CREATE CERTIFICATE tde_certificate
FROM FILE = '\\dir\cert.bak'
WITH PRIVATE KEY (
file = '\\dir\key.pvk'
,DECRYPTION BY PASSWORD = 'StrongPass')

--then for each database:
use [SomeDb];
create database encryption key with algorithm = aes_256
encryption by server certificate tde_certificate

alter database [SomeDb] set encryption on

Then TDE appeared to be “stuck” at 0 percent, for all TDE enabled databases. It was in this state for a while, at least a couple hours (some dbs are really small too).

The error message in log showed:

Please create a master key in the database or open the master key in the session before performing this operation. Database encryption scan for database ‘dbname’ was aborted. Reissue ALTER DB to resume the scan.

After there was a failover to the other node. All databases, except for the system and non-tde enabled ones were stuck in "RECOVERY_PENDING”.

I was able to get most of the databases online by opening the master encryption key and setting the db’s online by:

use master
open master key decryption by password = 'ReallyStrongPassword'
alter database [SomeDb] set online

This worked for all of the databases except for 3. Those db’s are now in a SUSPECT state.

I tried a repair with no luck:

use master
open master key decryption by password = 'ReallyStrongPassword'

ALTER DATABASE [SomeDb] set emergency
ALTER DATABASE [SomeDb] set single_user
DBCC CHECKDB ([SomeDb], REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS;
alter DATABASE [SomeDb] SET multi_user
alter DATABASE [SomeDb] SET online

Results:
Msg 926, Level 14, State 1, Line 30
Database 'SomeDb' cannot be opened. It has been marked SUSPECT by recovery. See the SQL Server errorlog for more information.
DBCC results for 'SomeDb'.
CHECKDB found 0 allocation errors and 0 consistency errors in database 'SomeDb'.

Also, I tried to restart the encryption scan, no luck:

use master
open master key decryption by password = 'StrongPass'
DBCC TRACEON(5004)
DBCC TRACEOFF(5004)

exec sys.sp_resetstatus
    @DBName = 'SomeDb'

ALTER DATABASE SomeDb
SET ENCRYPTION on

Result:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Warning: You must recover this database prior to access.

Msg 5069, Level 16, State 1, Line 131
ALTER DATABASE statement failed.
Msg 9001, Level 21, State 1, Line 131
The log for database 'SomeDb' is not available. Check the event log for related error messages. Resolve any errors and restart the database.

I noticed that whenever I fail over to the other node, all databases (except system) are all back in a RECOVERY_PENDING state. I need to open the encryption key and explicitly set the database online to get them online (all except for the 3 SUSPECT dbs). This happens every time.

I am on SQL Server 2008 R2 Enterprise, clustered with 2 nodes. I was able to enable TDE using the same cert in 4 other environments without any issues.

Any ideas on what I can do to get these databases online?

Edit: Here are the results from the error log. I filtered it down to "one of each" databases. One that was suspect and another that was in a recovery_pending state.

http://pastebin.com/wasNk7Xw

Edit: The files referenced within sys.master_files physically exist. I noticed that the STATE = ONLINE as well, sys.databases.state_desc = SUSPECT.

Best Answer

For the 3 suspect databases can you check the log files referred to in sys.master_files physically exist on disk and have the same access permissions as the other working database log files? Just that one of the error messages you posted said the log file is missing, and unless it's a red herring, that would make their problem separate to your TDE issues and explain the suspect status.

I've noticed this in your log: 04/13/2016 15:18:23,spid61,Unknown,Database SuspectDb was shutdown due to error 9001 in routine 'XdesRMReadWrite::RollbackToLsn'.

This can happen if there are transactions in the log file which cannot be rolled back due to a lack of space, forcing the db into suspect mode. Can you confirm the underlying disk has free space, and also that you have not set and hit a size limit on that database's files (e.g. that auto-grow is enabled and set to grow by a sane fixed amount)?