This is pretty broad. Let's answer (2) first:
http://www.postgresql.org/support/professional_support/
Now, on to (1):
> Is there any good conceptual design that could offer reliable pgsql
> cluster - reliability, data corruption prevention, high availability,
> no SPOF, load balancing and access to DB through one (virtual?) IP
> address?
The question is not "if" but "which".
You are describing a heartbeat/failover cluster. There aren't really fully canned solutions for this; you'll need to build your own heartbeat and STONITH setup using appropriate non-PostgreSQL-specific tools.
You can use PostgreSQL 9.1 streaming replication and log shipping for the replication. The replication manager repmgr can help automate much of the management.
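For the replication side, a minimal sketch of bootstrapping a 9.1 streaming standby might look like the following; the hostname `pgmaster`, the `replicator` role, and the data directory path are assumptions for illustration:

```
# On the standby: clone the master's data directory (PostgreSQL 9.1+)
pg_basebackup -h pgmaster -U replicator -D /var/lib/postgresql/9.1/main -P

# Create recovery.conf in the data directory so the standby
# connects to the master and streams WAL continuously
cat > /var/lib/postgresql/9.1/main/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=pgmaster port=5432 user=replicator'
EOF

# Start the standby; it stays in recovery, applying streamed WAL
pg_ctl -D /var/lib/postgresql/9.1/main start
```

The master additionally needs `wal_level = hot_standby`, a nonzero `max_wal_senders`, and a `replication` entry in `pg_hba.conf`.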
Other replication options are described on the PostgreSQL wiki. All will require added heartbeat and STONITH for a fully transparent cluster with virtual IP, etc.
You might also want to look at pgbouncer. It's a useful tool not only for connection pooling, but also for failover and load balancing.
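As a rough illustration, a minimal pgbouncer.ini might look like this sketch; the database name, backend host, and auth file path are assumptions:

```
# Minimal pgbouncer.ini sketch (names and paths are illustrative assumptions)
# Clients connect to pgbouncer on 6432; it pools connections to the backend.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
mydb = host=pgmaster port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
EOF
```

Repointing the `host=` entry (or the virtual IP behind it) is what makes pgbouncer useful during failover: the application keeps connecting to the same place.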
See the PostgreSQL manual on high availability, load balancing, and replication for more information.
Given the fact that you mentioned you have Master-Master replication mode, I would not recommend any automatic failover unless you properly account for Replication Lag. After all, MySQL Replication is asynchronous.
It is theoretically possible to have the following:

- `DBServer1` as Master to `DBServer2`
- `DBServer2` as Master to `DBServer1`
- DBVIP pointing at `DBServer1`
- `DBServer2` is 180 seconds behind
- `DBServer1` goes down
- Automatic Failover moves DBVIP to `DBServer2`
With this scenario, `DBServer2` could be missing the latest auto-increment keys that already exist on `DBServer1`. Upon failover, the DBVIP will allow WebServers to connect to `DBServer2` and ask for data that does not exist there yet.
This would therefore require background processes running on each DBServer.
For the above scenario:
- DBVIP is on `DBServer1`
- `DBServer1` runs HeartBeat
- `DBServer2` runs HeartBeat
- Background Process on `DBServer1` to monitor:
  - a) Data Mount Availability
  - b) Data Mount Writeability
  - c) MySQL Connectivity
- Once a, b, or c fails, kill HeartBeat (a sketch of such a monitor follows)
- Background Process on `DBServer2` to make sure DBVIP is pingable
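Here is a minimal sketch of the `DBServer1` monitor, assuming the data mount is /var/lib/mysql, a `monitor` MySQL account exists, and HeartBeat runs as the `heartbeat` service (all illustrative assumptions):

```
#!/bin/bash
# Background monitor for DBServer1 (sketch; paths, credentials,
# and the heartbeat service name are illustrative assumptions).
DATA_MOUNT=/var/lib/mysql

while sleep 10; do
    # a) Data Mount Availability
    mountpoint -q "$DATA_MOUNT" || break
    # b) Data Mount Writeability
    touch "$DATA_MOUNT/.monitor_probe" 2>/dev/null || break
    rm -f "$DATA_MOUNT/.monitor_probe"
    # c) MySQL Connectivity
    mysqladmin --user=monitor --password=secret ping >/dev/null 2>&1 || break
done

# Any failed check falls through to here: kill HeartBeat
# so DBServer2's takeover logic can begin
service heartbeat stop
```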
What should killing HeartBeat do? Trigger the startup script defined for it.
What should the startup script on `DBServer2` look for? (a sketch follows the list)

- Loop until DBVIP is unreachable via ping
- Connect to MySQL and:
  - Run `SHOW SLAVE STATUS\G` in a loop until `Seconds_Behind_Master` is NULL
  - Run `SHOW SLAVE STATUS\G` in a loop until `Exec_Master_Log_Pos` stops changing
- Assign DBVIP to `DBServer2` via `ip addr add`
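Put into a script, the takeover logic might look like this minimal sketch; the DBVIP address, netmask, interface, and credentials are assumptions for illustration:

```
#!/bin/bash
# Takeover script on DBServer2 (sketch; DBVIP, interface,
# and credentials are illustrative assumptions)
DBVIP=10.0.0.100

# Loop until the DBVIP stops answering pings
while ping -c 1 -W 2 "$DBVIP" >/dev/null 2>&1; do sleep 2; done

# Wait for the IO thread to give up on the dead master
# (Seconds_Behind_Master goes NULL once the IO thread disconnects)
until mysql -u root -psecret -e 'SHOW SLAVE STATUS\G' \
      | grep -q 'Seconds_Behind_Master: NULL'; do sleep 2; done

# Wait for the SQL thread to finish applying the relay log
# (Exec_Master_Log_Pos stops changing)
pos=""
while true; do
    newpos=$(mysql -u root -psecret -e 'SHOW SLAVE STATUS\G' \
             | awk '/Exec_Master_Log_Pos/ {print $2}')
    [ -n "$pos" ] && [ "$pos" = "$newpos" ] && break
    pos=$newpos; sleep 2
done

# Take over the virtual IP
ip addr add "$DBVIP/24" dev eth0
```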
This is essentially the algorithm for failing over safely to a Passive Master in a Master/Master Replication Cluster.
ALTERNATIVE
If ALL your data is InnoDB, I recommend something with less rigor. Perhaps you should look into using DRBD and HeartBeat. Here is why:
DRBD provides network RAID-1 for a Block Device on two servers.
You would essentially do this:

- Have `DBServer1`'s DRBD Block Device as Primary
- Have `DBServer2`'s DRBD Block Device as Secondary
- Mount `DBServer1`'s DRBD Device on /var/lib/mysql
- Startup MySQL on `DBServer1`
- Have HeartBeat Monitor Ping Activity Between Servers
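A minimal DRBD resource definition for this layout (DRBD 8.3-style syntax) might look like the following sketch; the resource name, backing disks, hostnames, and addresses are assumptions:

```
# Sketch of a DRBD resource shared by both servers
# (resource name, devices, and addresses are illustrative assumptions)
cat > /etc/drbd.d/mysql.res <<'EOF'
resource mysql {
    # Protocol C = fully synchronous replication
    protocol  C;
    device    /dev/drbd0;
    disk      /dev/sdb1;
    meta-disk internal;
    on dbserver1 { address 10.0.0.1:7788; }
    on dbserver2 { address 10.0.0.2:7788; }
}
EOF
```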
What would the startup script look like in a DRBD scenario? (a sketch follows the list)
- Loop until DBVIP is unreachable via ping
- Kill HeartBeat
- Disconnect DRBD
- Promote DRBD to Primary
- Mount DRBD on /var/lib/mysql
- Start MySQL (InnoDB Crash Recovery fills in missing data)
- Assign DBVIP via `ip addr add`
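As a script, again only a sketch: the resource name `mysql`, the DBVIP, the interface, and the service names are illustrative assumptions:

```
#!/bin/bash
# DRBD takeover on DBServer2 (sketch; resource name, DBVIP,
# and interface are illustrative assumptions)
DBVIP=10.0.0.100

# Loop until the DBVIP stops answering pings
while ping -c 1 -W 2 "$DBVIP" >/dev/null 2>&1; do sleep 2; done

service heartbeat stop            # Kill HeartBeat
drbdadm disconnect mysql          # Disconnect DRBD
drbdadm primary mysql             # Promote DRBD to Primary
mount /dev/drbd0 /var/lib/mysql   # Mount DRBD
service mysql start               # InnoDB Crash Recovery runs here
ip addr add "$DBVIP/24" dev eth0  # Take over the virtual IP
```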
This is a lot more straightforward because only one side is Active. The Passive side (DRBD Secondary) is a Synchronous Disk Copy of the Active Side (DRBD Primary).
CAVEAT
If all or most of the working set data is MyISAM, do not touch DRBD. Crash scenarios quickly result in MyISAM tables being marked crashed and needing auto-repair (which can be painfully slow to wait for).
UPDATE 2012-12-29 08:00 EDT
Here are my past posts on using DRBD with MySQL
Best Answer
1a. Warm standby is a "live", incremental backup fed with complete blocks of changes (WAL segments), 16 MB each, that are sent to the standby node once they're filled. You cannot query a warm standby node. 16 MB of changes (by default) can mean a lot of transactions; should the master fail, they'll be lost.
1b. Hot standby (also a "live" incremental backup): small changes are sent to the slave (WAL records, which are tiny parts of a WAL segment). You can query (read-only) the hot standby node. The window for lost transactions, should the master fail, is very small. There are synchronous and asynchronous hot standby nodes; a synchronous node forces the master to wait for it to confirm the application of changes before the master commits the transaction. In asynchronous replication, the master sends the WAL records and does not wait for confirmation. The former requires a very reliable and fast link between the master and the slave and adds overhead to the master, but it guarantees no data loss.
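For reference, on PostgreSQL 9.1 synchronous replication is switched on from the master's postgresql.conf; the standby name below is an assumption that must match the `application_name` the standby sets in its `primary_conninfo`:

```
# Enable synchronous replication on the master (PostgreSQL 9.1+).
# 'standby1' is an illustrative assumption; it must match the
# application_name in the standby's primary_conninfo.
cat >> /var/lib/postgresql/9.1/main/postgresql.conf <<'EOF'
synchronous_standby_names = 'standby1'
synchronous_commit = on
EOF

# A reload is enough to pick up the change
pg_ctl -D /var/lib/postgresql/9.1/main reload
```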
Regarding incremental backups: (1) take a base copy of your whole database installation, (2) ship it to the slave, and (3) configure the slave to catch up with changes.
Streaming Replication (hot standby) is the winner here. I personally prefer asynchronous replication, as it does not impose a considerable burden on the master and the replication lag is very small (a couple of seconds in many cases).
One complement to this setup is pgpool-II. It acts as a proxy between the application and the servers participating in a replication configuration like the one described above, and it has load balancing and parallel query capabilities. It's also able to provide automatic failover. http://www.pgpool.net/pgpool-web/contrib_docs/simple_sr_setting/index.html
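A minimal slice of pgpool.conf for a streaming-replication pair might look like this sketch; the hostnames and ports are assumptions, and a real deployment would start from the full pgpool.conf.sample:

```
# A few key pgpool-II settings for a streaming-replication pair
# (hostnames and ports are illustrative assumptions; this is a
# fragment, not a complete pgpool.conf)
cat > /etc/pgpool-II/pgpool.conf <<'EOF'
listen_addresses = '*'
port = 9999
backend_hostname0 = 'pgmaster'
backend_port0 = 5432
backend_hostname1 = 'pgstandby'
backend_port1 = 5432
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
EOF
```

With this, the application connects to pgpool on port 9999; writes go to the master while reads can be balanced across both nodes.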