To be truly highly available, you will need some logic built into the application to handle this. You can do this sort of thing with pg_pool, but what if pg_pool itself breaks?
I think there are many ways you could approach this, but the way I would do it would be:
Master DB (let's call it A), and two slaves (let's call them B and C). B pulls changes from A, and C pulls changes from B, in a cascading replication setup. This means that if A fails and is a goner, you promote B; there is no need to change C, as it is already a slave of B. When A is back, you make it a slave of C, and so on.
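Sketched as configuration, the cascade would look roughly like this in each standby's recovery.conf (hostnames and the replication user are placeholders; cascading replication needs PostgreSQL 9.2 or later):

```
# On B (recovery.conf) -- streams from A:
standby_mode = 'on'
primary_conninfo = 'host=server-a port=5432 user=replicator'

# On C (recovery.conf) -- streams from B, not A:
standby_mode = 'on'
primary_conninfo = 'host=server-b port=5432 user=replicator'
```

Promoting B when A dies is then just running `pg_ctl promote` on B (or touching its configured trigger_file); C keeps streaming from B throughout.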
You will need to make changes to your application so that it has some intelligence about the state of the database. This is typically done with a layer that manages the connections and runs a test query (like "SELECT current_timestamp"); if the query passes, all is fine, and the layer hands the connection to your application to use. Each application will need a list of servers to use, in order, so the one that writes will use A first, then B. In the event the primary fails, it tries the secondary (and potentially the third, and so on) until it finds one where the test query works.
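That connection layer can be sketched in a few lines. This is only an illustrative skeleton (the `StubConnection` class stands in for a real driver connection, and the server names are hypothetical); real code would run the test query through something like psycopg2:

```python
# Hypothetical sketch of the connection layer described above.
# StubConnection stands in for a real driver connection so the
# example is runnable without a database.

class StubConnection:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def test_query(self):
        # Real code would execute "SELECT current_timestamp" here.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return True


def get_working_connection(servers):
    """Try servers in priority order; return the first whose test
    query succeeds, or None if every server is down."""
    for conn in servers:
        try:
            conn.test_query()
            return conn
        except ConnectionError:
            continue
    return None


# Writers try A first, then B, then C.
a = StubConnection("A", alive=False)   # simulate A having failed
b = StubConnection("B")
c = StubConnection("C")

conn = get_working_connection([a, b, c])
print(conn.name)  # the layer falls over to B
```

The same ordered-list idea covers readers too; they would just be given a different priority order (e.g. C first).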
With this in place, if C fails, you have only lost a read-only slave: the applications will simply fail over to B or A, and start using C again if you get it back. If B fails, it is the same deal, though you might want to repoint C at A. If A fails, the system will go read-only (but will still work for all read queries). There will then need to be some human intervention to say "yep, A is gone" and promote B (which will only take seconds). Once B is promoted, the writers will go through the connection layer: it will test A and fail, test B and pass, and the application will get a connection to B, which is now a master, so writes will be allowed.
I appreciate that this is still not a full HA solution (there is still a manual step, though it's a very simple and quick one), but I think we are going to have to wait for PostgreSQL 9.5 for that.
As I say, I don't think there is one dead-cert right answer to this, but this is how I would have done it.
Hope it helps.
Best Answer
Bill Karwin has already given the best answer that I can think of... but I think some additional explanation might be in order as to why the link you posted in the original question probably does not provide a sufficient solution.
Your DBA is, at least in one sense, correct. The problem is that the illustration is deceptive. The three distributed nodes in the middle look like they are database servers sending updates back to the master, which MySQL can't really offer. You'll notice that the 3 nodes are actually labeled as "web servers."
It is apparent from the configuration examples that all the MySQL slave instances are doing in this setup is providing the application with read-only access to local replicas of the master database, but the web (application) server is actually sending all of its writes back to the master database -- not through to the local MySQL instances.
That's what MySQL can do -- scale out your ability to read from asynchronous replicas. It is unlikely that you'd want the ability for applications to write to the database to be dependent on the single point of failure that the document includes. Yes, you could have multiple masters, but depending on how the application works, it's only safe to direct writes to one such master at a time because there's no provision in multi-master asynchronous (ordinary MySQL) replication to handle the case when conflicting data gets written simultaneously from two (or more) masters.
PXC (Percona XtraDB Cluster) and MariaDB in a cluster (Galera) configuration don't have that same problem, because the cluster nodes don't operate asynchronously. There is a different problem, though: if one node gets isolated from the other two, it can't be written to, because in a network split the server left in the minority won't accept writes (to avoid inconsistency). If each of them is isolated from all of the others, then none of them will work, because each is in the minority and has no idea what has happened to its peers. That's not to say it's not a great solution -- it's just not a perfect one.
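The majority rule behind that behavior can be illustrated with a tiny sketch. This is a deliberate simplification (real Galera quorum logic also involves node weights and the primary-component state, which are ignored here):

```python
# Simplified illustration of the majority (quorum) rule described
# above; real Galera quorum handling is more involved.

def accepts_writes(visible_nodes, cluster_size):
    """A partition keeps accepting writes only if it can still see
    a strict majority of the cluster's nodes (itself included)."""
    return visible_nodes > cluster_size / 2

# Three-node cluster split 2 / 1:
print(accepts_writes(2, 3))  # the two-node side keeps working: True
print(accepts_writes(1, 3))  # the isolated node refuses writes: False
# If all three nodes are isolated from each other, every node sees
# only itself (1 of 3), so none of them accepts writes.
```

This is why the fully partitioned case in the paragraph above leaves the whole cluster unwritable: every node is in a minority of one.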
But there is no perfect solution.
If your data is such that the following is practical, there is a third approach. This would be a scenario where each site writes to its own uniquely-named set of schemas and tables (that is, the combination of table_schema + table_name is unique to each site; table names could be the same if the database names are different), but needs to be able to read the data from the other sites.
MariaDB multi-source replication could be configured such that you have a centralized server that is a slave to all of the remote servers and serves as master to all of them. Each replication event received by this central server would be written to its binlog and sent out to all of the remotes, which would execute it locally. The remotes would log their local changes, sending replication events back to the central server for distribution to all of the other remotes -- a hub-and-spoke configuration. The central server would only consolidate the data; it wouldn't be used by an instance of the application. The remote servers wouldn't necessarily even have to be MariaDB -- only the central one. I have MySQL 5.5 and MariaDB 10.0 systems interconnected and replicating to each other (5.5 <-> 10.0) without issue in my operation. The current limit is 64 "spoke" servers.
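A rough sketch of the hub side, using MariaDB's named-connection multi-source syntax (hostnames, credentials, and log coordinates are placeholders). The hub also needs log_slave_updates enabled in its my.cnf so that events it applies get re-logged into its binlog for the spokes, and every server needs a unique server_id so a spoke skips its own events when they come back around:

```sql
-- On the hub: one named replication connection per spoke.
CHANGE MASTER 'site_b' TO
  MASTER_HOST='site-b.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
START SLAVE 'site_b';

CHANGE MASTER 'site_c' TO
  MASTER_HOST='site-c.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
START SLAVE 'site_c';

-- On each spoke: an ordinary single-source slave of the hub.
CHANGE MASTER TO
  MASTER_HOST='hub.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret';
START SLAVE;
```

Only the hub needs the multi-source (MariaDB 10.0+) syntax; as noted above, the spokes can be plain MySQL running ordinary single-source replication.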
But again, this is only practical if each site needs to see, but not change, the data from the other sites, and/or access a common set of read-only data while updating a uniquely-named local set of tables. In this scenario, a site's isolation would only delay its access to the latest data from outside the local environment, but would not impair its ability to write to the things it should be able to write to.