Preventing Disk Space Issues During PostgreSQL Queries

postgresql postgresql-9.3

Do SELECT queries (with table JOINs) on a database consume disk space while they're running?

Background: I have a Django app with a PostgreSQL backend (9.3.10). My DB resides in a VM that was running critically low on disk space (around 400MB left).

I queried a few tables (including joins across tables) to assess which data to deprecate in order to free up disk space. These analytical queries were bundled behind a single URL and ran together. When I hit the URL, the VM containing my DB ran out of space after around half a minute.

I'm an accidental DBA and still learning the ropes. Can anyone explain why I ran out of space in this scenario? Are temporary files of some kind created during such operations? I'll share my config details if they're needed.

Best Answer

Yes. There are many ways for a SELECT to take up disk space. Start with the docs for work_mem:

work_mem (integer) Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. The value defaults to four megabytes (4MB). Note that for a complex query, several sort or hash operations might be running in parallel; each operation will be allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of IN subqueries.
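To see whether a particular query is spilling sorts to disk, you can look at its EXPLAIN ANALYZE output: a sort that exceeded work_mem is reported with a "Disk:" figure instead of a "Memory:" figure. Below is a minimal Python sketch that pulls those figures out of plan text. The function name and the sample plan are made up for illustration (the numbers are not from the asker's database), and it only looks at sorts, not hash spills. Keep in mind that EXPLAIN ANALYZE actually executes the query, so on a nearly full disk it can spill just like the original did.

```python
import re

# Matches the annotation EXPLAIN ANALYZE prints on a Sort node that spilled,
# e.g. "Sort Method: external merge  Disk: 204800kB".
# In-memory sorts report "Memory: ...kB" instead and are not matched.
_DISK_SPILL = re.compile(r"Sort Method:\s+(.+?)\s+Disk:\s+(\d+)kB")


def sort_spills_kb(plan_text):
    """Return the on-disk size in kB of every sort that spilled in a plan."""
    return [int(kb) for _method, kb in _DISK_SPILL.findall(plan_text)]


# Illustrative plan fragment (hypothetical, heavily trimmed).
SAMPLE_PLAN = """\
Sort  (...)
  Sort Key: a.created_at
  Sort Method: external merge  Disk: 204800kB
  ->  Hash Join  (...)
Sort  (...)
  Sort Key: b.id
  Sort Method: quicksort  Memory: 25kB
"""

spills = sort_spills_kb(SAMPLE_PLAN)
total_spill_kb = sum(spills)
print(spills, total_spill_kb)
```

If the total is anywhere near your free disk space, that query is a likely culprit.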

Even with a high work_mem, a SELECT can chew away disk space. You can, for instance, call a function that writes to disk directly.

You may be interested in mitigating this with temp_file_limit:

temp_file_limit (integer) Specifies the maximum amount of disk space that a process can use for temporary files, such as sort and hash temporary files, or the storage file for a held cursor. A transaction attempting to exceed this limit will be canceled. The value is specified in kilobytes, and -1 (the default) means no limit. Only superusers can change this setting.
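As a rough sketch of how you might pick a value, the Python below turns an amount of free disk space into a temp_file_limit in kilobytes and builds the matching SET statement. The helper names and the idea of holding back half the free space are my own assumptions, not anything from the docs. The 400MB figure is the one from the question.

```python
def temp_file_limit_kb(free_bytes, reserve_fraction=0.5):
    """Pick a temp_file_limit in kB that leaves part of the free space untouched.

    reserve_fraction is a hypothetical safety margin: the share of free
    space that temporary files should never be allowed to consume.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_bytes = free_bytes * (1 - reserve_fraction)
    return int(usable_bytes // 1024)


def set_temp_file_limit_sql(limit_kb):
    """Build the SET statement; a bare integer is interpreted as kilobytes."""
    return "SET temp_file_limit = {:d};".format(limit_kb)


# 400MB free, as in the question, keeping half of it in reserve.
limit_kb = temp_file_limit_kb(400 * 1024 * 1024)
statement = set_temp_file_limit_sql(limit_kb)
print(statement)
```

Two caveats from the docs quoted above: the statement has to be run by a superuser (or the value set in postgresql.conf), and the limit applies per process, so several sessions spilling at once can together use several times that amount.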