Postgresql – xlog flush request is not satisfied in log files

postgresql

Today I noticed that Postgres is logging error messages that may indicate data corruption. Everything is working as expected, so I hadn't realized anything was wrong.

ERROR:  xlog flush request 6E/82FFED10 is not satisfied --- flushed only to 3D/CA02E920
CONTEXT:  writing block 1008 of relation base/118517/118823
CONTEXT:  writing block 1008 of relation base/118517/118823
WARNING:  could not write block 1008 of base/118517/118823
LOG:  request to flush past end of generated WAL; request 6E/82FFED10, currpos 3D/CA02E920
DETAIL:  Multiple failures --- write error might be permanent.

PostgreSQL 11.
Configuration:

max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 41943kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6775130872383000604
Database cluster state:               in production
pg_control last modified:             Thu Mar 12 04:46:20 2020
Latest checkpoint location:           85/802C9268
Latest checkpoint's REDO location:    85/802C9230
Latest checkpoint's REDO WAL file:    000000010000008500000080
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:69393025
Latest checkpoint's NextOID:          511846
Latest checkpoint's NextMultiXactId:  11365
Latest checkpoint's NextMultiOffset:  23069
Latest checkpoint's oldestXID:        562
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  69393025
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu Mar 12 04:45:50 2020
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            edec50c3ef6ee1a42351c2e593de539feb25343ad239e12b813b4c212ae2a1d6

I took a backup, restored it into another database, and removed the old one. It didn't help: I am still getting those errors.

P.S. PostgreSQL has not crashed recently. There is also plenty of free space on the SSD.

Best Answer

Yes, that looks like corruption, perhaps caused by storage hardware problems.
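
To see why the message suggests corruption, it can help to turn the two LSNs from the error into byte positions. The following is only a sketch: the helper names parse_lsn and wal_file_name are made up for illustration, and the 16 MB segment size is taken from the "Bytes per WAL segment" line of the pg_controldata output in the question. The flush request carries the LSN stamped on the page being written, and that LSN should never be beyond the end of the WAL the server has generated.

```python
def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '6E/82FFED10' into a byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, lsn_text, segment_size=16 * 1024 * 1024):
    """Name of the WAL segment that contains the LSN (16 MB segments assumed)."""
    segments_per_id = 0x100000000 // segment_size
    segno = parse_lsn(lsn_text) // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_id,
        segno % segments_per_id,
    )


requested = parse_lsn("6E/82FFED10")  # LSN found on the page being written
flushed = parse_lsn("3D/CA02E920")    # end of WAL the server has generated

print("page LSN is ahead of WAL end:", requested > flushed)
print("gap in GiB: %.1f" % ((requested - flushed) / 2**30))

# Cross-check against pg_controldata: timeline 1, REDO location 85/802C9230.
print(wal_file_name(1, "85/802C9230"))
```

The last line prints 000000010000008500000080, which matches the "REDO WAL file" entry in the pg_controldata output. A page whose LSN is far ahead of the WAL end is what one would expect from a damaged page header or from WAL that was reset or replaced underneath the data files.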

Perform a pg_dumpall of the cluster and restore it into a newly created cluster; that should get rid of the problem.
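
A minimal sketch of that dump-and-restore step, assuming the old cluster listens on port 5432 and a freshly initialised cluster on port 5433. Both ports, the dump file name, and the helper name are hypothetical. The function only builds the command lines and does not execute anything, so they can be reviewed before being run by hand.

```python
import shlex


def dump_and_restore_commands(old_port, new_port, dump_file="cluster.sql"):
    """Return the dump and restore commands as argument lists (nothing is run)."""
    # pg_dumpall writes all databases plus roles and tablespaces as one SQL script.
    dump = ["pg_dumpall", "-p", str(old_port), "-f", dump_file]
    # The script is replayed with psql, connected to the new cluster's postgres database.
    restore = ["psql", "-p", str(new_port), "-d", "postgres", "-f", dump_file]
    return dump, restore


if __name__ == "__main__":
    # Ports are assumptions; adjust them to the actual old and new clusters.
    for command in dump_and_restore_commands(5432, 5433):
        print(shlex.join(command))
```

If the dump itself fails while reading a table, that is a further hint about which relation is damaged.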

Check your hardware.

Related Solutions

Postgresql – How to request a flush of the postgresql transaction logs

Most likely what you're seeing is a huge checkpoint_segments value and long checkpoint_timeout; alternately, they might have set wal_keep_segments to a very large value if it's supposed to support streaming replication.

You can force a checkpoint with the CHECKPOINT command. This may stall the database for some time if it has accumulated a huge amount of WAL and hasn't been background-writing it. If checkpoint_completion_target is low (less than 0.8 or 0.9) then there's likely to be a big backlog of work to do at checkpoint time. Be prepared for the database to become slow and unresponsive during the checkpoint. You cannot abort a checkpoint once it begins by normal means; you can crash the database and restart it, but that just puts you back to where you were.

I'm not certain, but I have the feeling a checkpoint could also result in growth of the main database - and do so before any space is freed in the WAL, if it is at all. So a checkpoint could potentially trigger you running out of space, something that's very hard to recover from without adding more storage at least temporarily.

Now would be a very good time to get a proper backup of the database - use pg_dump -Fc dbname to dump each database, and pg_dumpall --globals-only to dump user definitions etc.

If you can afford the downtime, stop the database and take a file-system level copy of the entire data directory (the folder containing pg_xlog, pg_clog, global, base, etc). Do not do this while the server is running and do not omit any files or folders, they are all important (well, except pg_log, but it's a good idea to keep the text logs anyway).

If you'd like more specific comment on the likely cause (and so I can be more confident in my hypothesis), you can run the following queries, paste their output into your post (in a code-indented block), and then comment so I'm notified:

SELECT version();

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

It is possible that setting checkpoint_completion_target = 1 then stopping and restarting the DB might cause it to start aggressively writing out queued up WAL. It won't free any until it does a checkpoint, but you could force one once write activity slows down (as measured with sar, iostat, etc). I have not tested to see if checkpoint_completion_target affects already-written WAL when changed in a restart; consider testing this on a throwaway test PostgreSQL you initdb on another machine first.

Backups have nothing to do with WAL retention and growth; it isn't backup related.


Postgresql – Postgres log files not rotating

Try the following log parameters. After modifying them in $PGDATA/postgresql.conf, remember to reload the configuration.

# These are only used if logging_collector is on:
log_directory = '/var/applog/pg_log/1921'               # directory where log files are written,
                                        # can be absolute or relative to PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
                                        # can include strftime() escapes
log_file_mode = 0600                    # creation mode for log files,
                                        # begin with 0 to use octal notation
log_truncate_on_rotation = on           # If on, an existing log file with the
                                        # same name as the new log file will be
                                        # truncated rather than appended to.
                                        # But such truncation only occurs on
                                        # time-driven rotation, not on restarts
                                        # or size-driven rotation.  Default is
                                        # off, meaning append to existing files
                                        # in all cases.
log_rotation_age = 1d                  # Automatic rotation of logfiles will
                                        # happen after that time.  0 disables.
log_rotation_size = 10MB               # Automatic rotation of logfiles will
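
To get a feel for what the log_filename pattern above produces, here is a small sketch that expands it with Python's strftime, which handles these particular escapes (%Y, %m, %d, %H, %M, %S) the same way. The helper name log_file_name and the sample dates are made up, and the hourly pattern is only an illustrative alternative that does not come from the settings above.

```python
from datetime import datetime

# Pattern copied from the log_filename setting above.
TIMESTAMPED = "postgresql-%Y-%m-%d_%H%M%S.log"
# Hypothetical alternative whose names repeat every 24 hours.
HOURLY = "postgresql-%H.log"


def log_file_name(pattern, when):
    """Expand a log_filename-style pattern for the given rotation time."""
    return when.strftime(pattern)


day_one = datetime(2020, 3, 12, 0, 0, 0)
day_two = datetime(2020, 3, 13, 0, 0, 0)

# Timestamped names are different on every rotation, so no existing file
# is ever reopened.
print(log_file_name(TIMESTAMPED, day_one))
print(log_file_name(TIMESTAMPED, day_two))

# A pattern without the date yields the same name at the same hour each day.
print(log_file_name(HOURLY, day_one) == log_file_name(HOURLY, day_two))
```

With the timestamped pattern every rotation creates a new file name, so log_truncate_on_rotation has nothing to truncate and old files accumulate until something else removes them. Truncation only comes into play when the pattern cycles back to a name that already exists.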
