Postgresql – Question regarding PSQLException: User was holding shared buffer pin for too long

amazon-rdspostgresql

I recently encountered a strange error that I don't quite understand what could cause it:

caused by org.postgresql.util.PSQLException: ERROR: canceling statement due to conflict with recovery Detail: User was holding shared buffer pin for too long.

I have a fixed timeout 10s set for each query, so any long run query should get terminated and the error message I expected to see User query get cancelled by user request. However, I see this sometimes, but the above as well which is really confusing. Furthermore, it only show up when I see high lag from our RDS replica, then resolve itself after 5, 10 minutes.

I tried to reproduce this issue locally but no luck. Any thought would be greatly appreciated!

Best Answer

If the replay process is already behind by more than max_standby_streaming_delay (or max_standby_archive_delay), then it will immediately terminate anything which is in its way. If it is behind by less, then it is willing to wait only until it reaches that delay. In other words, the max_standby_streaming_delay sets a grace period for the entire system, not for each individual action within the system.

So if max_standby_streaming_delay is set to 30 seconds, but the replay process is already 28 seconds behind, then "too long" to hold a buffer pin could be 2 seconds. If is already more than 30 seconds behind, then a few nanoseconds could be "too long"--just long enough to detect the blockage and then send, receive, and process a signal.