I currently have a PostgreSQL slave node that is presenting problems, that don't arise in the master node. Seems to be related with the node sync process.
Complete stacktrace:
org.postgresql.util.PSQLException: ERROR: canceling statement due to conflict with recovery
Detail: User query might have needed to see row versions that must be removed.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:542)
at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:518)
at cl.waypoint.mailer.reportes.StatusSemanalClientes.updateIgnicion(StatusSemanalClientes.java:448)
at cl.waypoint.mailer.reportes.StatusSemanalClientes.access$2(StatusSemanalClientes.java:447)
at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:414)
at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
DB Version:
PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
OS Version:
Distributor ID: Debian
Description: Debian GNU/Linux 8.6 (jessie)
Release: 8.6
Codename: jessie
The question is: how can I debug the actual cause of the problem and/or solve/avoid it? Don't hesitate to ask for more info if needed to pinpoint the problem, right now just don't know what else to provide.
PS: Error does NOT appear if the offending query is repeated, so it's not strictly related to the query itself, but perhaps with a table update/sync transient.
Best Answer
What you see is a common experience for anyone running standbys and bigger queries on the master. There are a couple of possible causes:
When a query will be killed for one of the above reasons is hard to predict, but usually there are some that appear more often. It is also configurable, for options read the documentation page further.
The most important option is
hot_standby_feedback
, which helps against the most common cause (VACUUM
removing recently-dead rows). It has its own costs (might cause table bloat), but the outcome is usually still better than running the same queries on the master.