Postgresql – pgpool-II configuring multiple master databases

pgpoolpostgresql

I could not find a solution for this anywhere, not even in the pgpool manuals.

How do I configure multiple – master postgresql databases with pgpool-II? Additional requirement is that each master should have its own set of slaves using streaming replication. The reason for multiple master is that each master holds a partition of a large data set.

It is a large data set (say about 100 million users, their account information, plus additional information about their accounts) which is partitioned; so it is siloed. The data model is being designed so that the silos do not need to talk to each other.

Any help and guidance is much appreciated.

Best Answer

OK your first decision is WHERE to split the silos out, in the application layer, or in the db layer. If your application will keep track of the silos then you can run individual pgbouncer or pgpool servers on different ports for each silo.

If you want the application to not have to know or deal with how siloing is done, then something like plproxy can work wonders. Basically you have 1 or more plproxy servers that distribute all the queries / data based on some pre-determined modulo math or some other method for breaking it up, and your app just calls the plproxy server with it's queries and the plproxy server is where the logic for siloing exists.

Each method has it's pluses and minuses. The nice thing about it being in the app layer is that your database layer is fairly simple. just a bunch of master/slave pairs. If you add a new silo you just edit the config file for it in the app and it goes online. The disadvantage is that all apps hitting your data HAVE to be silo aware etc.

With plproxy the advantage is that the app never knows its siloed. If you write a fairly simple SOAP type interface to talk to the plproxy servers then every app your write can use that single interface and never have to be told which silo does what. Now though the complexity is moved into the db layer. Any changes to the farm structure require work on the db level. Also plproxy works by basically wrapping all your database work into functions ahead of time. If you want to select data from a table joined to another table you need to create a function to do that and call that function on the plproxy machine. It seems overly complex at first but actually it's not that hard, just quite different from slinging random SQL at your db servers.