Sql-server – Long running backups on small databases

backupperformancesql serversql-server-2012

We normally perform backups across the network to a file share on a different SAN. This works very well for all but 1 of our database servers. The problem machine is a new server running Windows Server 2012R2 as the OS, and SQL Server 2012 Standard Edition for the database instance. The network latency between the server and backup share is sub-MS.

The actual backup of the data and log file happen quickly (sub-second), however, it seems to get stuck at 100% completed (when viewed through sp_WhoIsActive) with a BACKUPTHREAD wait type. If I perform the same backup but to a local drive, it does not experience this problem.

Here is a sample backup command we are running for the MODEL (only 5MBs) database:

USE MASTER
GO

BACKUP DATABASE MODEL
TO  DISK = N'\\<FileShareServer>\SQL_Backup\MODEL.BAK'     
WITH  NAME = N'MODEL-Full Database Backup'
    ,NOFORMAT, INIT,  SKIP, NOREWIND, NOUNLOAD,  STATS = 10
    ,CHECKSUM, COPY_ONLY, COMPRESSION;

GO

Here are the stats for the backup, with Trace Flag 3604, 3213 turned on:

Backup/Restore buffer configuration parameters

Memory limit: 36855MB

BufferCount:                7

Sets Of Buffers:            3

MaxTransferSize:            64 KB

Min MaxTransferSize:        64 KB

Total buffer space:         1 MB

Tabular data device count:  1

Fulltext data device count: 0

Filestream device count:    0

TXF device count:           0

Filesystem i/o alignment:   512

Media Buffer count:            7

Media Buffer size:            64KB

Encode Buffer count:           7

10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
51 percent processed.
61 percent processed.
71 percent processed.
81 percent processed.
91 percent processed.

Processed 312 pages for database 'MODEL', file 'modeldev' on file 1.
100 percent processed.
Processed 2 pages for database 'MODEL', file 'modellog' on file 1.
-- Up to here happens sub-second as expected
BACKUP DATABASE successfully processed 314 pages in 252.372 seconds (0.009 MB/sec).

I've tried playing around with stripped backups as well as setting the BUFFERCOUNT and MAXTRANSFERSIZE to higher numbers, to no avail.

Thanks in advanced for your input!

Based on Aaron Bertrand's suggestion, I turned on trace flags 3605, 3004, and 3014 and got the following additional information (only pasting in the portion from where it gets to 100% and then sits waiting):

慂正灵慄慴慢敳退邐邐邐邐坖呁啁荈⃬튅襌⑴䱘粉怤ཁ쒔譌䣹솁̘ : Log files done

Backup: Trailing config done

Backup: MBC done

BackupDatabase: Writing history records

Writing backup history records

BACKUP DATABASE successfully processed 314 pages in 251.106 seconds (0.009 MB/sec).
BackupDatabase: Finished

Just to clarify the entries above, starting with, what I'm guessing are Chinese characters, only display once the backup completes. However, looking in the log file, up to the "Backup: MBC done" entry has the same timestamp as all the previous entries. "BackupDatabase: Writing history records" has a timestamp 4 minutes later.

Best Answer

Since the additional trace flags allowed you to narrow the delay down to the stage where the process tries to write to msdb, I'm willing to bet one or both of these returns high numbers:

SELECT MAX(row_count), SUM(in_row_reserved_page_count)
  FROM msdb.sys.dm_db_partition_stats 
  WHERE [object_id] = OBJECT_ID('msdb.dbo.backupset');

SELECT database_name, [type], c = COUNT(*)
  FROM msdb.dbo.backupset
  GROUP BY database_name, [type]
  ORDER BY c DESC;

You can consider whittling away at some of the data in that table. Of course, only you can know if you have reasons to keep backup history that otherwise should probably be cleaned up in most cases (for example, the first rule I declare to identify junk history data is for databases that no longer exist - but what if you have renamed a database recently?). So please think about each of these rules and only delete backups where the rules make sense to you based on these scenarios. For example:

CREATE TABLE #bs(id INT, ms INT);

-- Databases that no longer exist on this server:
INSERT #bs(id,ms) SELECT backup_set_id, media_set_id
FROM msdb.dbo.backupset 
WHERE database_name NOT IN (SELECT name FROM sys.databases);

-- All copy_only backups (because who cares?):
INSERT #bs(id,ms) SELECT backup_set_id, media_set_id
FROM msdb.dbo.backupset 
WHERE is_copy_only = 1;

-- All diffs except the most recent:
;WITH x AS 
(
  SELECT backup_set_id, media_set_id, 
    rn = ROW_NUMBER() OVER (PARTITION BY database_name 
      ORDER BY backup_set_id DESC)
  FROM msdb.dbo.backupset
  WHERE [type] = 'I'
  AND backup_finish_date IS NOT NULL
)
INSERT #bs(id,ms) SELECT backup_set_id, media_set_id
FROM x WHERE rn > 1;

...

-- All log backups prior to the most recent full:
INSERT #bs(id,ms) SELECT backup_set_id, media_set_id
FROM msdb.dbo.backupset AS l
WHERE [type] = 'L' AND EXISTS
(
  SELECT 1 
    FROM msdb.dbo.backupset AS bs2
    WHERE database_name = l.database_name
    AND [type] = 'D' 
    AND backup_finish_date > l.backup_finish_date
);

-- All fulls except the most recent successful non-copy-only full backup:
;WITH x AS 
(
  SELECT backup_set_id, media_set_id, 
    rn = ROW_NUMBER() OVER (PARTITION BY database_name 
      ORDER BY backup_set_id DESC)
  FROM msdb.dbo.backupset
  WHERE [type] = 'D'
  AND is_copy_only = 0
  AND backup_finish_date IS NOT NULL
)
INSERT #bs(id,ms) SELECT backup_set_id, media_set_id
FROM x WHERE rn > 1;

Need to also delete some file and filegroup information:

DELETE msdb.dbo.backupfile
WHERE backup_set_id IN (SELECT id FROM #bs);

DELETE msdb.dbo.backupfilegroup
WHERE backup_set_id IN (SELECT id FROM #bs);

DELETE mf
FROM msdb.dbo.backupmediafamily AS mf
WHERE media_set_id IN (SELECT ms FROM #bs)
AND NOT EXISTS (SELECT 1 FROM msdb.dbo.backupset
  WHERE media_set_id = mf.media_set_id);

DELETE ms
FROM msdb.dbo.backupmediaset AS ms
WHERE media_set_id IN (SELECT ms FROM #bs)
AND NOT EXISTS (SELECT 1 FROM msdb.dbo.backupset
  WHERE media_set_id = ms.media_set_id);

You also need to delete associated restore history:

DELETE msdb.dbo.restorefilegroup WHERE restore_history_id IN
(SELECT restore_history_id FROM msdb.dbo.restorehistory
  WHERE backup_set_id IN (SELECT id FROM #bs));

DELETE msdb.dbo.restorefile WHERE restore_history_id IN
(SELECT restore_history_id FROM msdb.dbo.restorehistory
  WHERE backup_set_id IN (SELECT id FROM #bs));

DELETE msdb.dbo.restorehistory
  WHERE backup_set_id IN (SELECT id FROM #bs);

And then finally the main rows can be deleted without violating any constraints:

DELETE msdb.dbo.backupset 
WHERE backup_set_id IN (SELECT id FROM #bs);

DROP TABLE #bs;

This whole process will not likely be fast - I highly recommend doing it during a maintenance window or slow period, and most certainly suspend all backup jobs while doing this.

On Standard Edition there's no real clean way to do this as an online operation, but after you've deleted all those rows, you'll likely benefit most from a full rebuild:

ALTER INDEX ALL ON msdb.dbo.backupset REBUILD;

The above was based partially on the logic included in the system procedure dbo.sp_delete_backuphistory, which leaves out some of the above criteria, has a reputation for locking up the entire set of backup- and restore-related msdb tables, and only allows a singular filter on date (which may not be very helpful depending on a variety of factors, including the percentage of bloat attributed to log backups prior to their most recent fulls, never mind the fact that different databases may be on completely different schedules). You can see where most of my logic came from, though, by looking at the definition for yourself:

EXEC msdb.sys.sp_helptext @objname = N'dbo.sp_delete_backuphistory';