Postgresql – Store large compressed documents in Postgresql

compressionperformancepostgresqlpostgresql-performance

Where I work we keep approximately 500,000 image files related to processing for our customer in a file system attached to our Redhat 5.11 production server. We are planning an upgrade to Redhat 7.4 and have already done this for our development database servers. The number of documents is increasing and while we do need to archive older documents, we are talking about moving storage of the documents into a database where we have access to auditing and database level security. If we store the objects in Oracle, they won't be compressed unless we pay for the advanced compression option. We would like top avoid that cost.

Does Postgresql have similar compression for large binary objects? Does it make sense to try using Postgresql for this? I would use Database Heterogeneous Connectivity from Oracle to read and write the large objects. Are there other tools that people have used to store large objects in databases? Thanks

Best Answer

Postgres compresses automatically large objects chunks, as well as bytea through the TOAST mechanism.

But most PDFs are already compressed. As an example I have 1000 dedup-ed pdfs in my mailbox stored as large objets totalling 281MB, and the compression ratio brought by postgres on these is only 11%.

For large objects you may query the compressed size, as a superuser, with

SELECT sum(pg_column_size(data)-4) FROM pg_largeobject WHERE loid = :id

Compression of BYTEA might be a bit better because it compresses the entire contents as a whole (limited to 1GB), whereas large objects are compressed as individual chunks of 2048 bytes or less (in fact, pagesize/4)

The compression ratio of bytea contents can be measured by comparing octet_length(column) with pg_column_size(column)-4

Does it make sense to try using Postgresql for this?

You should probably import a few hundred of your files in Postgres and see how they compress. For large objects it's a simple as \lo_import filename in psql.