PostgreSQL pgpool-II replication mode doesn’t work

pgpoolpostgresqlreplication

I'm experimenting with PostgreSQL and pgpool-II and trying to setup native replication mode on pgpool. The pgpool documentation says In the native replication mode, Pgpool-II is responsible for synchronizing databases.. So I didn't change any configuration on PostgreSQL, just set up two nodes via Puppet, made proper firewall configuration, so I can directly connect to nodes from other hosts on the network. pgpool-II logs seem Ok, don't show any obvious errors and pgpool starts normally, it can even connect to one of the nodes Actually it always connects to backend_hostname0. I even don't see any messages about replication itself. And when I stop the backend_hostname0 it seems like it doesn't see my second node and says: DETAIL: all backend nodes are down, pgpool requires at least one valid node.

My pgpool configuration is like the following (I just copied that standard example config and added my posgresql backends):

replication_mode = on

backend_hostname0 = 'pg2.example.com'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/postgres/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'localhost'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/postgres/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

Maybe I missed something, can't get this thing to work.
Thank you in advance.

Best Answer

Postgres instances are normally replicated using Streaming replication which uses the WAL(Write Ahead Logs), which are transferred to secondary database to be replicated. So all incoming queries from application will go to the primary database and gets executed. These executions generate WALs which are then transferred to the secondary database and applied so that the secondary comes in sync with primary database.

As per pgpool documentation, the replication it uses is a statement based replication where the application connects to pgpool server instead of database(primary/secondary), and the incoming queries are then forwarded to database instances to be replicated.

There are a lot of disadvantages of using pgpool replication setup. Functions like random(), CURRENT_TIMESTAMP, and sequences can have different values on different databases, also if the pgpool crashes, no query will execute, since the application is connected to pgpool, not the database.