PostgreSQL – Resolving PSQLException: ERROR: canceling statement due to conflict with recovery

data synchronization, master-slave-replication, postgresql, postgresql-9.4

I currently have a PostgreSQL slave node that is presenting problems that do not arise on the master node. It seems to be related to the node sync (replication) process.

Complete stacktrace:

org.postgresql.util.PSQLException: ERROR: canceling statement due to conflict with recovery
  Detail: User query might have needed to see row versions that must be removed.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
    at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:542)
    at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:518)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes.updateIgnicion(StatusSemanalClientes.java:448)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes.access$2(StatusSemanalClientes.java:447)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:414)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

DB Version:

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

OS Version:

Distributor ID: Debian
Description:    Debian GNU/Linux 8.6 (jessie)
Release:    8.6
Codename:   jessie

The question is: how can I debug the actual cause of the problem and/or solve/avoid it? Don't hesitate to ask for more info if needed to pinpoint the problem; right now I just don't know what else to provide.

PS: The error does NOT appear if the offending query is repeated, so it is not strictly related to the query itself, but perhaps to a transient table update/sync condition.

Best Answer

What you see is a common experience for anyone running a standby and bigger queries on it. There are a couple of possible causes, described in the Hot Standby chapter of the PostgreSQL documentation:

There are also additional types of conflict that can occur with Hot Standby. These conflicts are hard conflicts in the sense that queries might need to be canceled and, in some cases, sessions disconnected to resolve them. The user is provided with several ways to handle these conflicts. Conflict cases include:

  • Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
  • Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
  • Dropping a database on the primary conflicts with sessions connected to that database on the standby.
  • Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.
  • Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
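
To see which of these conflict classes is actually hitting your standby, PostgreSQL keeps per-database counters in the pg_stat_database_conflicts view (available since 9.1, so on your 9.4 node as well). A minimal check, run on the standby, could look like this (the database name mydb is only a placeholder):

    -- Run on the standby: cumulative counts of queries canceled, per conflict type.
    SELECT datname,
           confl_tablespace,  -- tablespace dropped on the primary
           confl_lock,        -- ACCESS EXCLUSIVE lock taken on the primary
           confl_snapshot,    -- VACUUM removed rows a standby snapshot still needed
           confl_bufferpin,   -- cleanup blocked by a pinned buffer on the standby
           confl_deadlock     -- deadlock between recovery and a standby query
    FROM pg_stat_database_conflicts
    WHERE datname = 'mydb';   -- placeholder database name

Given the Detail line in your stack trace ("row versions that must be removed"), you would expect confl_snapshot to be the counter that keeps growing.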

Which of these reasons will end up killing a given query is hard to predict, but some causes show up much more often than others. The behaviour is also configurable; the same documentation page describes the relevant settings, the two most directly involved of which are shown below.
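
The settings that control how long WAL replay will wait before canceling conflicting standby queries can be inspected on the standby; raising them trades replication lag for fewer canceled queries. The 5min value below is purely illustrative, and if ALTER SYSTEM is not usable on your standby, the same change can be made in postgresql.conf followed by a reload:

    -- Run on the standby.
    SHOW max_standby_streaming_delay;   -- delay allowed when replaying streamed WAL (default 30s)
    SHOW max_standby_archive_delay;     -- delay allowed when replaying WAL from the archive (default 30s)

    -- Let replay wait longer before killing conflicting queries (illustrative value).
    ALTER SYSTEM SET max_standby_streaming_delay = '5min';
    SELECT pg_reload_conf();            -- the setting is reloadable, no restart required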

The most important option is hot_standby_feedback, which helps against the most common cause (VACUUM removing recently-dead rows that standby queries still need). It has its own costs (it can cause table bloat on the primary), but the outcome is usually still better than running the same queries on the master.
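
A minimal sketch for enabling it on the standby (again, edit postgresql.conf instead if ALTER SYSTEM is not an option in your environment):

    -- Run on the standby: report the oldest snapshot back to the primary so
    -- VACUUM does not remove rows that standby queries can still see.
    SHOW hot_standby_feedback;                      -- 'off' by default
    ALTER SYSTEM SET hot_standby_feedback = 'on';
    SELECT pg_reload_conf();                        -- reloadable, no restart required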