We normally perform backups across the network to a file share on a different SAN. This works very well for all but 1 of our database servers. The problem machine is a new server running Windows Server 2012R2 as the OS, and SQL Server 2012 Standard Edition for the database instance. The network latency between the server and backup share is sub-MS.
The actual backup of the data and log file happen quickly (sub-second), however, it seems to get stuck at 100% completed (when viewed through sp_WhoIsActive) with a BACKUPTHREAD wait type. If I perform the same backup but to a local drive, it does not experience this problem.
Here is a sample backup command we are running for the MODEL (only 5MBs) database:
USE MASTER
GO
BACKUP DATABASE MODEL
TO DISK = N'\\<FileShareServer>\SQL_Backup\MODEL.BAK'
WITH NAME = N'MODEL-Full Database Backup'
,NOFORMAT, INIT, SKIP, NOREWIND, NOUNLOAD, STATS = 10
,CHECKSUM, COPY_ONLY, COMPRESSION;
GO
Here are the stats for the backup, with Trace Flag 3604, 3213 turned on:
Backup/Restore buffer configuration parameters
Memory limit: 36855MB
BufferCount: 7
Sets Of Buffers: 3
MaxTransferSize: 64 KB
Min MaxTransferSize: 64 KB
Total buffer space: 1 MB
Tabular data device count: 1
Fulltext data device count: 0
Filestream device count: 0
TXF device count: 0
Filesystem i/o alignment: 512
Media Buffer count: 7
Media Buffer size: 64KB
Encode Buffer count: 7
10 percent processed.
20 percent processed.
30 percent processed.
40 percent processed.
51 percent processed.
61 percent processed.
71 percent processed.
81 percent processed.
91 percent processed.
Processed 312 pages for database 'MODEL', file 'modeldev' on file 1.
100 percent processed.
Processed 2 pages for database 'MODEL', file 'modellog' on file 1.
-- Up to here happens sub-second as expected
BACKUP DATABASE successfully processed 314 pages in 252.372 seconds (0.009 MB/sec).
I've tried playing around with stripped backups as well as setting the BUFFERCOUNT and MAXTRANSFERSIZE to higher numbers, to no avail.
Thanks in advanced for your input!
Based on Aaron Bertrand's suggestion, I turned on trace flags 3605, 3004, and 3014 and got the following additional information (only pasting in the portion from where it gets to 100% and then sits waiting):
慂正灵慄慴慢敳退邐邐邐邐坖呁啁荈⃬튅襌⑴䱘粉怤ཁ쒔譌䣹솁̘ : Log files done
Backup: Trailing config done
Backup: MBC done
BackupDatabase: Writing history records
Writing backup history records
BACKUP DATABASE successfully processed 314 pages in 251.106 seconds (0.009 MB/sec).
BackupDatabase: Finished
Just to clarify the entries above, starting with, what I'm guessing are Chinese characters, only display once the backup completes. However, looking in the log file, up to the "Backup: MBC done" entry has the same timestamp as all the previous entries. "BackupDatabase: Writing history records" has a timestamp 4 minutes later.
Best Answer
Since the additional trace flags allowed you to narrow the delay down to the stage where the process tries to write to
msdb
, I'm willing to bet one or both of these returns high numbers:You can consider whittling away at some of the data in that table. Of course, only you can know if you have reasons to keep backup history that otherwise should probably be cleaned up in most cases (for example, the first rule I declare to identify junk history data is for databases that no longer exist - but what if you have renamed a database recently?). So please think about each of these rules and only delete backups where the rules make sense to you based on these scenarios. For example:
...
Need to also delete some file and filegroup information:
You also need to delete associated restore history:
And then finally the main rows can be deleted without violating any constraints:
This whole process will not likely be fast - I highly recommend doing it during a maintenance window or slow period, and most certainly suspend all backup jobs while doing this.
On Standard Edition there's no real clean way to do this as an online operation, but after you've deleted all those rows, you'll likely benefit most from a full rebuild:
The above was based partially on the logic included in the system procedure
dbo.sp_delete_backuphistory
, which leaves out some of the above criteria, has a reputation for locking up the entire set of backup- and restore-relatedmsdb
tables, and only allows a singular filter on date (which may not be very helpful depending on a variety of factors, including the percentage of bloat attributed to log backups prior to their most recent fulls, never mind the fact that different databases may be on completely different schedules). You can see where most of my logic came from, though, by looking at the definition for yourself: