This is a known issue and expected behavior (it is by design). Essentially, the backup reserves the total amount of space that it thinks it might need, then shrinks the file as the last step. To change this behavior you can experiment with trace flag 3042. From http://msdn.microsoft.com/en-us/library/bb964719.aspx#Allocation :
For compressed backups, the size of the final backup file depends on how compressible the data is, and this is unknown before the backup operation finishes. Therefore, by default, when backing up a database using compression, the Database Engine uses a pre-allocation algorithm for the backup file. This algorithm pre-allocates a predefined percentage of the size of the database for the backup file. If more space is needed during the backup operation, the Database Engine grows the file. If the final size is less than the allocated space, at the end of the backup operation, the Database Engine shrinks the file to the actual final size of the backup.
To allow the backup file to grow only as needed to reach its final size, use trace flag 3042. Trace flag 3042 causes the backup operation to bypass the default backup compression pre-allocation algorithm. This trace flag is useful if you need to save on space by allocating only the actual size required for the compressed backup. However, using this trace flag might cause a slight performance penalty (a possible increase in the duration of the backup operation).
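As a sketch of how you might try it (database name and backup path are placeholders, not from the original question; the flag is enabled globally here with -1, then turned back off):

```sql
-- Enable trace flag 3042 globally, take a compressed backup that grows
-- only as needed instead of pre-allocating, then turn the flag back off.
-- Database name and backup path are placeholders.
DBCC TRACEON (3042, -1);

BACKUP DATABASE [YourDatabase]
TO DISK = N'X:\Backups\YourDatabase.bak'
WITH COMPRESSION;

DBCC TRACEOFF (3042, -1);
```

As the article notes, expect a possible slight increase in backup duration in exchange for not reserving the pre-allocated space up front.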
If I recall correctly, PostgreSQL now zeroes the end of the WAL segment after an archive timeout, before archiving the file. (Update: this is true for 9.4; on prior versions you still need to run the pg_clearxlogtail utility to zero the end of each WAL segment as part of your archive script, before you gzip your archive files.)
In 9.4 and above you just need to gzip the file; pg_clearxlogtail is no longer needed.
The change was made in commit 9a20a9b, but it's only in tag REL9_4_BETA1 and later, so the change will first appear in 9.4:
commit 9a20a9b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Mon Jul 8 11:23:56 2013 +0300
Improve scalability of WAL insertions.
This patch replaces WALInsertLock with a number of WAL insertion slots,
allowing multiple backends to insert WAL records to the WAL buffers
concurrently. This is particularly useful for parallel loading large amounts
of data on a system with many CPUs.
This has one user-visible change: switching to a new WAL segment with
pg_switch_xlog() now fills the remaining unused portion of the segment with
zeros. This potentially adds some overhead, but it has been a very common
practice by DBA's to clear the "tail" of the segment with an external
pg_clearxlogtail utility anyway, to make the WAL files compress better.
With this patch, it's no longer necessary to do that.
BTW, you should really be using an archive timeout, rather than manually switching xlogs. Consider having a streaming replica for additional protection.
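To see why the zeroed tail matters for compression, here is a small standalone sketch (not from the original answer; segment size and paths are assumptions) that fakes a 16 MB WAL segment two ways and compares the gzipped sizes:

```shell
#!/bin/sh
# Demonstration: a WAL segment whose unused tail is zeroed compresses far
# better than one whose tail still contains stale, random-looking data.
set -e
tmp=$(mktemp -d)

# "Dirty" segment: 16 MB of random-looking bytes, as if the unused tail
# were still full of stale WAL data from earlier segment recycling.
dd if=/dev/urandom of="$tmp/dirty" bs=1048576 count=16 2>/dev/null

# "Zeroed" segment: 1 MB of real data followed by 15 MB of zeros, which is
# what pg_switch_xlog() produces on 9.4+ (or pg_clearxlogtail on older
# versions).
dd if=/dev/urandom of="$tmp/zeroed" bs=1048576 count=1 2>/dev/null
dd if=/dev/zero of="$tmp/zeros" bs=1048576 count=15 2>/dev/null
cat "$tmp/zeros" >> "$tmp/zeroed"

gzip -c "$tmp/dirty"  > "$tmp/dirty.gz"
gzip -c "$tmp/zeroed" > "$tmp/zeroed.gz"

# On a pre-9.4 server the same effect comes from an archive_command along
# the lines of (paths are examples only):
#   archive_command = 'pg_clearxlogtail < %p | gzip > /wal_archive/%f.gz'

ls -l "$tmp/dirty.gz" "$tmp/zeroed.gz"
```

The zeroed variant should compress to a small fraction of the dirty one, which is exactly the saving pg_clearxlogtail (and the 9.4 behavior) buys you.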
Update: As Jeff noted, pg_clearxlogtail hasn't been updated for 9.3. The incompatibility is very simple, though; the definition of XLogRecPtr changed from
typedef struct XLogRecPtr
{
uint32 xlogid; /* log file #, 0 based */
uint32 xrecoff; /* byte offset of location in log file */
} XLogRecPtr;
to
typedef uint64 XLogRecPtr;
when PostgreSQL 9.3 dropped support for platforms whose compilers lack a 64-bit integer type. So all we need to do is add a macro or function that tests equality in a way that works for both representations.
The following patch adds support for 9.3, but it is completely untested, so use it entirely at your own risk. If it eats your data then sets fire to your house, well, that's how things are. Patch here, as an attachment to the linked pgfoundry issue.
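The shape of such a compatibility shim looks roughly like this (a sketch with hypothetical names, not the actual patch; PG_VERSION_NUM is pinned to 9.3 here only so the snippet stands alone):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compatibility shim (names are illustrative, not from the
 * real patch): pick an equality test that matches how XLogRecPtr is
 * defined. On 9.3+ it is a plain 64-bit integer; before that it is a
 * two-field struct. */
#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 90300    /* assume a 9.3 build for this sketch */
#endif

#if PG_VERSION_NUM >= 90300
typedef uint64_t XLogRecPtr;
#define XLREC_PTR_EQ(a, b) ((a) == (b))
#else
typedef struct XLogRecPtr
{
    uint32_t xlogid;    /* log file #, 0 based */
    uint32_t xrecoff;   /* byte offset of location in log file */
} XLogRecPtr;
#define XLREC_PTR_EQ(a, b) \
    ((a).xlogid == (b).xlogid && (a).xrecoff == (b).xrecoff)
#endif
```

Call sites then use XLREC_PTR_EQ(a, b) instead of comparing fields or values directly, and the same code builds against either definition.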
Best Answer
If you delete 40GB of indexes, then certain backup types will be smaller, while others won't generally be.
Differential backups (full, file, or filegroup) may or may not be much smaller: it depends on whether the objects those non-clustered indexes were created on are actually seeing data modifications.
Some backup sizes might not go down by the full 40GB, if for example compression is already being used, or those indexes are on a read_only filegroup that isn't generally backed up via filegroup backups. However, overall, the answer is "Yes, it should be smaller".