Sql-server – Shrink database operation in maintenance plan failed

maintenancesql server

See updates below

I have a nightly DB maintenance plan and it's causing some strange behaviour.

About 1 out of 10 runs it causes the DB to not response to queries. It takes about 5 minutes for the DBM plan to complete at which point it starts responding.

The error logs don't seem to point to any problem at all:

2013-01-13 00:00:56.73 spid23s     This instance of SQL Server has been using a process ID of 1488 since 12/28/2012 10:06:36 AM (local) 12/28/2012 6:06:36 PM (UTC). This is an informational message only; no user action is required.
2013-01-13 22:00:06.07 spid60      Configuration option 'user options' changed from 0 to 0. Run the RECONFIGURE statement to install.
2013-01-13 22:00:06.07 spid60      FILESTREAM: effective level = 0, configured level = 0, file system access share name = 'MSSQLSERVER'.
2013-01-13 22:01:03.78 spid64      DBCC CHECKDB (RackAttackNorthAmerica) WITH no_infomsgs executed by sa found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 57 seconds.  Internal database snapshot has split point LSN = 0000942b:000000d9:0001 and first LSN = 0000942b:000000d8:0001.
2013-01-13 22:02:33.13 spid80      I/O is frozen on database model. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-13 22:02:33.13 spid81      I/O is frozen on database msdb. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-13 22:02:33.13 spid82      I/O is frozen on database RackAttackNorthAmerica. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-13 22:02:34.13 spid83      I/O is frozen on database master. No user action is required. However, if I/O is not resumed promptly, you could cancel the backup.
2013-01-13 22:02:34.48 spid83      I/O was resumed on database master. No user action is required.
2013-01-13 22:02:34.48 spid80      I/O was resumed on database model. No user action is required.
2013-01-13 22:02:34.48 spid81      I/O was resumed on database msdb. No user action is required.
2013-01-13 22:02:34.70 spid82      I/O was resumed on database RackAttackNorthAmerica. No user action is required.
2013-01-13 22:02:35.07 Backup      Database backed up. Database: master, creation date(time): 2012/12/28(10:05:56), pages dumped: 379, first LSN: 685:336:83, last LSN: 685:384:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{6E5AF82F-6BBD-4343-9DA3-286FD7EA5C0E}4'}). This is an informational message only. No user action is required.
2013-01-13 22:02:35.56 Backup      Database backed up. Database: model, creation date(time): 2003/04/08(09:13:36), pages dumped: 170, first LSN: 101:1440:37, last LSN: 101:1464:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{6E5AF82F-6BBD-4343-9DA3-286FD7EA5C0E}1'}). This is an informational message only. No user action is required.
2013-01-13 22:02:35.69 Backup      Database backed up. Database: msdb, creation date(time): 2008/07/09(16:46:27), pages dumped: 1948, first LSN: 1647:152:131, last LSN: 1647:216:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{6E5AF82F-6BBD-4343-9DA3-286FD7EA5C0E}2'}). This is an informational message only. No user action is required.
2013-01-13 22:02:35.77 Backup      Database backed up. Database: RackAttackNorthAmerica, creation date(time): 2011/01/21(10:37:38), pages dumped: 193713, first LSN: 37940:102:1, last LSN: 37946:493:1, number of dump devices: 1, device information: (FILE=1, TYPE=VIRTUAL_DEVICE: {'{6E5AF82F-6BBD-4343-9DA3-286FD7EA5C0E}3'}). This is an informational message only. No user action is required.
2013-01-13 22:06:48.94 Backup      Database backed up. Database: RackAttackNorthAmerica, creation date(time): 2011/01/21(10:37:38), pages dumped: 168202, first LSN: 37974:59:36, last LSN: 37974:83:1, number of dump devices: 1, device information: (FILE=1, TYPE=DISK: {'D:\DB_Backups\RackAttackNorthAmerica_backup_2013_01_13_220612_2311062.bak'}). This is an informational message only. No user action is required.

I'm not quite sure what info is needed to diagnose the problem.

Please let me know what you need and I'll do my best to get it.

I managed to find this error, not sure if it helps.

Executing the query "DBCC SHRINKDATABASE(N'RackAttackNorthAmerica', 10,..." failed with the following error: "Backup, file manipulation operations (such as ALTER DATABASE ADD FILE) and encryption changes on a database must be serialized. Reissue the statement after the current backup or file manipulation operation is completed.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

dm_os_waiting_tasks

waiting_task_address    session_id  exec_context_id wait_duration_ms    wait_type   resource_address    blocking_task_address   blocking_session_id blocking_exec_context_id    resource_description
0x0000000003FAE508  3   0   4907732 XE_DISPATCHER_WAIT  NULL    NULL    NULL    NULL    NULL
0x0000000003FAE988  15  0   11555964    BROKER_EVENTHANDLER NULL    NULL    NULL    NULL    NULL
0x00000000040E8748  5   0   33  LAZYWRITER_SLEEP    NULL    NULL    NULL    NULL    NULL
0x0000000004102508  6   0   9962    LOGMGR_QUEUE    0x0000000003413528  NULL    NULL    NULL    NULL
0x0000000004102748  17  0   1480291018  BROKER_TRANSMITTER  NULL    NULL    NULL    NULL    NULL
0x000000000411C508  4   0   199 REQUEST_FOR_DEADLOCK_SEARCH 0x00000000089A01F8  NULL    NULL    NULL    NULL
0x000000000411C748  10  0   741 SLEEP_TASK  NULL    NULL    NULL    NULL    NULL
0x0000000004136508  7   0   1480284374  KSOURCE_WAKEUP  NULL    NULL    NULL    NULL    NULL
0x0000000004150508  12  0   55058   CHECKPOINT_QUEUE    0x00000000034135A0  NULL    NULL    NULL    NULL
0x0000000004150BC8  NULL    NULL    12814   FT_IFTS_SCHEDULER_IDLE_WAIT NULL    NULL    NULL    NULL    NULL
0x000000000416A508  11  0   1480314263  ONDEMAND_TASK_QUEUE 0x00000000033F3BE0  NULL    NULL    NULL    NULL
0x000000000416A988  16  0   1480291026  BROKER_TRANSMITTER  NULL    NULL    NULL    NULL    NULL
0x0000000004904748  2   0   17621   XE_TIMER_EVENT  NULL    NULL    NULL    NULL    NULL
0x00000000804222C8  9   0   705 SQLTRACE_BUFFER_FLUSH   NULL    NULL    NULL    NULL    NULL

Update:

DB Shrink removed. Still hanging. I'm 99% it's the rebuild index task. Here are a few example lines from the generated SQL

ALTER INDEX [childCategories] ON [dbo].[category] REBUILD PARTITION = ALL WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF ) 
ALTER INDEX [mfgID] ON [dbo].[category] REBUILD PARTITION = ALL WITH ( PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF ) 

Update 2013-07-09: So I'm continuing to have problems with this maintenance plan. The shrink operation was removed a few months ago and the rebuild index task was modified. I'm continuing to have failures and the web site that is driven by this DB continues to go down for a few minutes on each failure. I managed to dig up the job history task and it appears that the update statistics task is now failing. See log below:

Executed as user: WORKGROUP\WNN1106$. ...
Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 31% complete  End Progress  Progress: 2013-07-08 23:04:37.03
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[news]
WITH FULLSCAN  ".: 32% complete  End Progress  Progress: 2013-07-08 23:04:37.03
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 33% complete  End Progress  Progress: 2013-07-08 23:04:42.65
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orderItems]
WITH FULLSC...".: 33% complete  End Progress  Progress: 2013-07-08 23:04:42.65
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 34% complete  End Progress  Progress: 2013-07-08 23:04:42.74
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orderItemType]
WITH FUL...".: 35% complete  End Progress  Progress: 2013-07-08 23:04:42.74
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 36% complete  End Progress  Progress: 2013-07-08 23:04:42.74
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orderPaymentType]
WITH ...".: 37% complete  End Progress  Progress: 2013-07-08 23:04:42.74
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 37% complete  End Progress  Progress: 2013-07-08 23:05:43.25
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orders]
WITH FULLSCAN  ".: 38% complete  End Progress  Progress: 2013-07-08 23:05:43.25
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 39% complete  End Progress  Progress: 2013-07-08 23:05:43.30
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orders_deleted]
WITH FU...".: 40% complete  End Progress  Progress: 2013-07-08 23:05:43.30
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 41% complete  End Progress  Progress: 2013-07-08 23:05:43.30
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orderSource]
WITH FULLS...".: 41% complete  End Progress  Progress: 2013-07-08 23:05:43.30
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 42% complete  End Progress  Progress: 2013-07-08 23:05:43.55
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[orderSplit]
WITH FULLSC...".: 43% complete  End Progress  Progress: 2013-07-08 23:05:43.55
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 44% complete  End Progress  Progress: 2013-07-08 23:05:43.60
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[paymentProcessType]
WIT...".: 45% complete  End Progress  Progress: 2013-07-08 23:05:43.60
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 45% complete  End Progress  Progress: 2013-07-08 23:05:43.64
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[priceHistory]
WITH FULL...".: 46% complete  End Progress  Progress: 2013-07-08 23:05:43.64
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 47% complete  End Progress  Progress: 2013-07-08 23:05:43.71
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[priceUpdate]
WITH FULLS...".: 48% complete  End Progress  Progress: 2013-07-08 23:05:43.71
  Source: Update Statistics Task

Executing query "use [RackAttackNorthAmerica]  ".: 49% complete  End Progress  Progress: 2013-07-08 23:05:43.75
  Source: Update Statistics Task

Executing query "UPDATE STATISTICS [dbo].[priceUpdateCAD]
WITH FU...".: 50% complete  End Progress  Progress: 2013-07-08 23:0...  The package execution fa...  The step failed.

Should I not be doing an update statistics task in my nightly backup? Or is this pointing to a different issue?

Update 2013-07-09:

The maintenance plan logs to file.

Here is an entry from 2013-07-08:

Execute T-SQL Statement Task (WNN1106)
Execute TSQL on Local server connection
Execution time out: 120
Task start: 2013-07-08T23:01:10.
Task end: 2013-07-08T23:03:11.
Failed:(-1073548784) Executing the query "ALTER INDEX [email] ON [dbo].[customerFeedback] RE..." failed with the following error: "Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Related SQL:

ALTER INDEX [email] ON [dbo].[customerFeedback] REBUILD PARTITION = ALL WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF )

Here is an entry from 2013-07-04:

Failed:(-1073548784) Executing the query "ALTER INDEX [cEmail] ON [dbo].[orders] REBUILD PAR..." failed with the following error: "Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Related SQL:

ALTER INDEX [cEmail] ON [dbo].[orders] REBUILD PARTITION = ALL WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF )

Here is an entry from 2013-07-03:

Failed:(-1073548784) Executing the query "ALTER INDEX [email] ON [dbo].[customerFeedback] RE..." failed with the following error: "Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Related SQL:

ALTER INDEX [email] ON [dbo].[customerFeedback] REBUILD PARTITION = ALL WITH ( PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = OFF, ONLINE = OFF, SORT_IN_TEMPDB = OFF )

All of the maintenance plan tasks are generated by the GUI wizard tool. As you can see they fail on different steps everytime.

What should I do here?

Should I increase the time out? Or will that just take longer to fail?

Should I modify the queries? ALLOW_ROW_LOCKS = OFF ? ONLINE = ON?

Should I remove the rebuild index task completely?

Should I remove the update statistics task completely?

Thanks for your input,

Tomas

Best Answer

What you're seeing is blocking as a result of the DBCC SHRINKDB operation. That is causing your database not to respond. Shrink operations move your data pages around on disk in order to create space within the data files to shrink. As a result, the engine must take either page or table locks in order to move the appropriate data pages. While these should be short term, they can block queries.

Additionally, shrink operations are file manipulations (just as adding, modifying, or removing data files) and can not run while a backup of any sort is executing. This is why you see your other error regarding file commands must be serialized.

To observe the blocking, you can run sp_who2 (or better yet, sp_whoisactive) while your DBCC SHRINKDB is running. Identify the spid the command is running under and you should see it causing blocking (if there's enough activity).

Unfortunately, there's no way to avoid the blocking. This, along with other reasons, is why you should not perform any sort of automated shrink operation. The best plan for resolution is to remove any "Shrink Database" tasks from your maintenance plans.

Best practices are to avoid shrinking databases whenever possible, per Paul Randall's excellent article that was linked by JohnS. If you must shrink your files, you should use the following rules of thumb:

  1. Perform shrinking manually so you can monitor it's progress and manage as necessary. If it is interrupting user processes, you can kill the shrink with minimal impact. Interrupting a shrink will not "rollback" any of the page movement and you will retain any progress the shrink has already made.
  2. Keep your shrink increments to small values so as to minimize page movement and keep your maintenance/shrink period manageable.