Sql-server – PostgreSQL Initial Database Size

database-sizefragmentationpostgresqlsql server

There are 2 parts to my question.

Is there a way of specifying the initial size of a database in PostgreSQL?
If there isn't, how do you deal with fragmentation when the database grows over time?

I've recently migrated from MSSQL to Postgres, and one of the things we did in the MSSQL world when creating a database was to specify the initial size of the database and transaction log. This reduced fragmentation and increased performance, especially if the "normal" size of the database is known beforehand.

The performance of my database drops as the size grows. For example, the workload I'm putting it through normally takes 10 minutes. As the database grows, this time increases. Doing a VACUUM, VACUUM FULL and VACUUM FULL ANALYSE do not appear to solve the issue. What does solve the performance problem is stopping the database, de-fragmenting the drive and then doing a VACUUM FULL ANALYSE takes the performance of my test back to the original 10 minutes. This leads me to suspect that fragmentation is what's causing me pain.

I've not been able to find any reference to reserving tablespace/database space in Postgres. Either I'm using the wrong terminology and thus finding nothing, or there is a different way of mitigating filesystem fragmentation in Postgres.

Any pointers?

The Solution

The supplied answers helped confirm what I'd begun to suspect. PostgreSQL stores the database across multiple files and this is what allows the database to grow without worry of fragmentation. The default behaviour is to pack these files to the brim with table data, which is good for tables that rarely change but is bad for tables that a frequently updated.

PostgreSQL utilizes MVCC to provide concurrent access to table data. Under this scheme, each update creates a new version of the row that was updated (this could be via time stamp or version number, who knows?). The old data is not immediately deleted, but marked for deletion. The actual deletion occurs when a VACUUM operation is performed.

How does this relate to the fill factor? The table default fill factor of 100 fully packs the table pages, which in turn means that there is no space within the table page to hold updated rows, i.e. updated rows will be placed in a different table page from the original row. This is bad for performance, as my experience shows. As my summary tables get updated very frequently (up to 1500 rows/sec), I opted to set a fill factor of 20, i.e. 20% of the table will be for inserted row data and 80% for update data. While this may seem excessive, the large amount of space reserved for updated rows means that the updated rows stay within the same page as the original and there's a the table page isn't full by the time the autovacuum daemon runs to remove obsolete rows.

To "fix" my database, I did the following.

Set the fill factor of my summary tables to 20. You can do this at creation time by passing a parameter to CREATE TABLE, or after the fact via ALTER TABLE. I issued the following plpgsql command: ALTER TABLE "my_summary_table" SET (fillfactor = 20);
Issued a VACUUM FULL, as this writes a completely new version of the
table file and thus by implication writes a new table file with the
new fill factor.

Rerunning my tests, I see no performance degradation even when the database is as large as I need it to be with many millions of rows.

TL;DR – File fragmentation wasn't the cause, it was table space fragmentation. This is mitigated by tweaking the table's fill factor to suit your particular use case.

Best Answer

No the only thing close to that is when you compile the server with the --with-segsize switch, this might help if your table is taking up more space than a gig and your file system can handle a single file being over a gig. If your inserting 20 gigs it will have to create 20 files if you don't use this switch. If your file system can handle a file over a gig you can just set it to a large value most likely see some benefit, worst case a small benefit.
Take a look at CLUSTER http://www.postgresql.org/docs/9.1/static/sql-cluster.htmland FILLFACTOR http://www.postgresql.org/docs/9.1/static/sql-createtable.html, http://www.postgresql.org/docs/9.1/static/sql-createindex.html

Note that FILLFACTOR can be applied to both tables and indexes.

Best Answer

Related Solutions

Postgresql – Proper storage size estimation technique for PostgreSQL

Sql-server – SQL Server clustered index high fragmentation

Related Question