Postgresql – What’s the problem with PostgreSQL temporary files

postgresql

I'm studying PostgreSQL architecture and in almost every book or blog post I read people say that we have to track any increasing of temp_files numbers and find the best work_mem size to avoid new temp_file creation.

But none explain why these temp_files are so bad. Could anyone clarify to me why we need to keep the number of temp_files low?

Best Answer

Memory is faster than disk.

Anything that can be done only using memory will work faster instead of using the hard disks to store temporary content.

PostgreSQL documentation says:

work_mem (integer)

Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files.

So if you find the proper value of it (which depends on total amount of RAM, number of workers, SQL queries done, etc.) then everything should be able to run from RAM only, hence fast.

If this value is exceeded for some operation, temporary files will be created on disk, and hence the operation will be slower.

So tracking amount of temporary files and investigate if it grows is a possible strategy to get better performances (maybe work_mem is just too low, its default value being 4MB; or it is good but it maxes out currently available RAM and then you need a box with more RAM).

Hence the following tools:

1) temp_files in pg_stat_database whose value is "Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting."

and

2) log_temp_files configurable item whose documentation is:

Controls logging of temporary file names and sizes. Temporary files can be created for sorts, hashes, and temporary query results. A log entry is made for each temporary file when it is deleted. A value of zero logs all temporary file information, while positive values log only files whose size is greater than or equal to the specified number of kilobytes. The default setting is -1, which disables such logging. Only superusers can change this setting.