Combing through the docs, I see the following (any accidentally incorrect details will hopefully, or surely, be pointed out by others):
> There is a separate server process called the background writer, whose function is to issue writes of "dirty" (new or modified) shared buffers.
The BW does its work according to its config parameters. It writes at most `bgwriter_lru_maxpages` dirty shared buffers, then has a `bgwriter_delay` millisecond rest. Now, when `checkpoint_segments` log segments have been filled (or `checkpoint_timeout` seconds have passed, whichever comes first), it flushes all (remaining) dirty buffers to the data files. Now the database data has a consistent state on disk, with a corresponding entry in the log as well. It is a good starting point for a subsequent REDO if necessary.
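If you want to see what these are set to on a particular server, you can query `pg_settings`; a small sketch (note that `checkpoint_segments` only exists up to PostgreSQL 9.4, where it was replaced by `max_wal_size`, so it will simply be missing from the result on newer versions):

```sql
-- Show the background writer / checkpoint settings mentioned above.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('bgwriter_lru_maxpages', 'bgwriter_delay',
               'checkpoint_segments', 'checkpoint_timeout',
               'max_wal_size');
```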
And who makes the checkpoints?
> The server's background writer process automatically performs a checkpoint [...]
As for your 2nd point, WAL is WAL (stored in log segment files) and data files are data files. The two will be consistent with each other but not merged.
During its operation, PostgreSQL records changes to transaction log files, but it doesn't immediately flush them to the actual database tables. It usually just keeps the changes in memory, and returns them from memory when they are requested, unless RAM starts getting full and it has to write them out.
This means that if it crashes, the on disk tables won't be up to date. It has to replay the transaction logs, applying the changes to the on-disk tables, before it can start back up. That can take a while for a big, busy database.
For that reason, and so that the transaction logs do not keep growing forever, PostgreSQL periodically does a checkpoint where it makes sure the DB is in a clean state. It flushes all pending changes to disk and recycles the transaction logs that were being used to keep a crash recovery record of the changes.
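If you want to watch this happen, you can trigger a checkpoint by hand and look at the statistics counters. A small sketch, assuming a superuser session and a version where the checkpoint counters still live in `pg_stat_bgwriter` (they were moved to `pg_stat_checkpointer` in PostgreSQL 17):

```sql
-- Force an immediate checkpoint rather than waiting for the next scheduled one.
CHECKPOINT;

-- Scheduled vs. requested checkpoints, and how many buffers checkpoints
-- have written since the statistics were last reset.
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;
```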
This flush happens in two phases:
- Buffered `write()`s of dirty `shared_buffers` to the tables; and
- `fsync()` of affected files to make sure the changes really hit disk.
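On 9.2 and later you can see the cost of those two phases separately, because the statistics collector tracks the time spent in each one; a small sketch (again using `pg_stat_bgwriter`, which holds these columns up to PostgreSQL 16):

```sql
-- checkpoint_write_time: total ms spent issuing the buffered write()s
-- checkpoint_sync_time:  total ms spent in the fsync() phase
SELECT checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;
```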
Both of those can increase disk I/O load. Contention caused by these writes can slow down reads, and can also slow down flushing of WAL segments that's required in order to commit transactions.
It's been a longstanding challenge, but it's getting worse as systems ship with more and more RAM, so they can buffer more data and take longer to write it out. There's ongoing discussion between the Linux and PostgreSQL communities about how to deal with this, as covered in this LWN.net article. (LWN.net won't be able to keep writing this sort of great work if people don't subscribe. I'm a subscriber and sharing this link because it's useful and informative. Please consider subscribing if you want to see more of this sort of thing.)
The main thing you can do to reduce the impact of checkpoints at the moment is to spread checkpoint activity out by increasing `checkpoint_completion_target`, so that more of the data has been written out by the time the final checkpoint arrives. This has a cost, though: if you update a page (say) ten times, it might be written to disk multiple times before the checkpoint with a high completion target, even though it only strictly had to be written out once for crash safety. A higher completion target makes for smoother I/O patterns but more overall I/O overhead.
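As a concrete illustration (the value here is an example, not a recommendation; the old default was 0.5, and newer releases already default to 0.9), raising the target only needs a configuration reload:

```sql
-- Spread checkpoint writes over ~90% of the interval between checkpoints.
-- ALTER SYSTEM needs PostgreSQL 9.4+; on older versions edit postgresql.conf.
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();  -- takes effect on reload, no restart needed
```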
The other thing you can do to help is tell your operating system to start writing data out immediately when it gets buffered writes. This is like the kernel side of setting `checkpoint_completion_target` and has a similar trade-off. See the Linux vm documentation, in particular `dirty_background_bytes`, `dirty_background_ratio`, and `dirty_expire_centisecs`.
Best Answer
When the HDD fills, the DB can no longer write and it will crash. But that is the easy thing to solve and monitor: whenever you reach 85% capacity, just add more storage or clear out old data.
The harder/more important part is to ensure performance is not impacted by the growth in data (more rows to scan). Usually this is handled by partitioning and cleanup processes, which somewhat mitigate the first issue as well.
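For example, here is a minimal sketch of what such a partition-plus-cleanup setup might look like (the `events` table and its columns are made up for illustration, and this assumes PostgreSQL 10+ declarative partitioning):

```sql
-- Hypothetical "events" table, partitioned by month so old data can be
-- removed cheaply instead of accumulating forever.
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- The "cleanup process": dropping an old partition is far cheaper than a
-- huge DELETE, and it returns the space to the filesystem immediately.
DROP TABLE events_2024_01;
```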
Regards Jony