I have a large PostgreSQL database, over 500GB in size, which is too large. Is there any way to compress the database down to a more manageable size? I attempted this with SquashFS, and the database compressed down to 177GB, but PostgreSQL requires write access to its data directory and SquashFS filesystems are read-only. Do more experienced database users have any suggestions for accomplishing this goal?
The database holds GIS data for the planet and will be used locally on a deployed system. Currently it sits on a 1TB SSD, but I am trying to avoid slapping in an additional hard drive simply to accommodate a large database. The database performs as desired with no issues; I would simply like to compress it down to a more manageable size rather than place it on a separate drive.
Best Answer
File system
A very popular way of doing this is at the file system level. Btrfs and ZFS both work underneath the database and support transparent compression. Either can be set up on a loopback device, so you can provide a compressed tablespace without adding another partition. There are caveats, though: if that tablespace fails, it may take your cluster down with it.
ZFS
ZFS is the big one here; it's what I would go for. It offers transparent per-dataset compression (LZ4, or Zstandard in newer OpenZFS releases).
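As a concrete illustration, here is a minimal sketch of the loopback-style approach with ZFS: a pool built on a plain backing file, a compressed dataset, and a tablespace on top. This assumes OpenZFS is installed and you have root; the pool name, paths, sizes, and the example table name are all placeholders.

```shell
# 1. Create a sparse backing file on the existing filesystem.
truncate -s 600G /srv/pgzpool.img

# 2. Build a pool on that file (ZFS supports file-backed vdevs)
#    and create a dataset with transparent LZ4 compression.
zpool create pgpool /srv/pgzpool.img
zfs create -o compression=lz4 -o mountpoint=/srv/pg_compressed pgpool/tablespace

# 3. Hand the mountpoint to PostgreSQL as a tablespace.
chown postgres:postgres /srv/pg_compressed
psql -U postgres -c "CREATE TABLESPACE compressed LOCATION '/srv/pg_compressed';"
# Move a table onto it (planet_osm_polygon is a hypothetical example):
psql -U postgres -c "ALTER TABLE planet_osm_polygon SET TABLESPACE compressed;"
```

Tuning the dataset's recordsize toward PostgreSQL's 8kB block size is often recommended, but test with your own workload before committing.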
Btrfs
Btrfs is a strong contender, but it has been in active development for a very long time, and the reluctance of major distros to adopt it as a default has many people questioning whether it's ready for prime time.
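The same loopback idea sketched for Btrfs, using a loop mount with zstd compression. Again assumes root; paths and sizes are placeholders.

```shell
# Backing file on the existing filesystem, formatted as Btrfs.
truncate -s 600G /srv/pgbtrfs.img
mkfs.btrfs /srv/pgbtrfs.img

# Mount via a loop device; compress=zstd enables transparent
# zstd compression for newly written data.
mkdir -p /srv/pg_compressed
mount -o loop,compress=zstd /srv/pgbtrfs.img /srv/pg_compressed

# Expose it to PostgreSQL as a tablespace.
chown postgres:postgres /srv/pg_compressed
psql -U postgres -c "CREATE TABLESPACE compressed LOCATION '/srv/pg_compressed';"
```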
PostgreSQL
GIS Methods (PostGIS)
For polygons, one method is to simplify geometries by dropping vertices with ST_Simplify. For points, one method is spatial clustering.
Both of these result in a loss of information. PostGIS, like most of the features of the database, doesn't have a transparent "magic compression" option.
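To illustrate both lossy approaches, here is a hedged sketch; the planet_osm_* table names, geometry column, simplification tolerance, and cluster count are assumptions standing in for whatever your actual schema uses:

```sql
-- Polygons: drop vertices within a tolerance of 10 (units follow the SRID).
-- Writing to a new table lets you compare sizes before dropping the original.
CREATE TABLE polygons_simplified AS
SELECT osm_id, ST_Simplify(way, 10) AS way
FROM planet_osm_polygon;

-- Points: collapse dense point sets into clusters, keeping one centroid each.
-- ST_ClusterKMeans is a window function; 1000 is an arbitrary cluster count.
CREATE TABLE points_clustered AS
SELECT cid, ST_Centroid(ST_Collect(way)) AS way
FROM (
  SELECT way, ST_ClusterKMeans(way, 1000) OVER () AS cid
  FROM planet_osm_point
) sub
GROUP BY cid;
```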
cstore_fdw
There is also cstore_fdw, a columnar store that offers compression. It has a different performance profile than regular heap tables, so your mileage may vary.
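A minimal sketch of setting it up, assuming the extension is already built and installed; the column list is a placeholder for your own schema, and you would load data with COPY or INSERT ... SELECT:

```sql
-- cstore_fdw stores rows column-by-column and compresses them (pglz).
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Hypothetical table layout; foreign tables on this server live in
-- compressed columnar files rather than the normal heap.
CREATE FOREIGN TABLE planet_osm_cstore (
    osm_id  bigint,
    way     geometry
) SERVER cstore_server
  OPTIONS (compression 'pglz');
```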