MySQL – How to determine the master in MySQL master-slave replication

MySQL, replication

I am setting up MySQL master-slave replication, and I am trying to figure out how to handle the failover situation where I promote the slave to master (in the event that the master goes down).

My application server needs to direct all writes to the current master, but I cannot use server-level HA between the master and slave (heartbeat, keepalived), since the two DB servers are on completely different subnets in different physical locations.

I think this is something that I need to handle at the application level. I can query the two servers and ask which one is a master, then perform all queries to that one.

Is there a query in MySQL to see whether the current server is the master in a master-slave setup?

Best Answer

@RolandoMySQLDBA has answered the question accurately ... but he also pointed out that his solution was "quick-and-dirty."

And that is a very true statement. :)

What concerns me here is not that answer, but rather that the original question appears to rest on an incorrect assumption:

I can query the two servers and ask which one is a master, then perform all queries to that one.

The problem is that in MySQL replication, the master is never truly aware that it is the master.

"Promotion to master" is not really a concept in MySQL asynchronous replication. Promoting a MySQL server to the master role is something that happens external to the MySQL servers, as opposed to something that happens internal to them.

"Promotion to master" not done by any kind of server provisioning, because, technically speaking, every MySQL server that has binary logging enabled is a master, even if it never has a slave. SHOW MASTER STATUS works exactly the same way and returns exactly the same result, slaves or not, and a master with 2 slaves is no more or less a master than a master with 1 slave or 0 slaves. Similarly, a master whose slaves are all offline is still just as much a master, because when the slaves come back online, they'll pick up replicating where they left off.

In a sense, the only "awareness" on the part of either server is not whether it is a master, but rather whether it is a slave (or not).

That's what Rolando's solution asks: "are you a slave?" If the answer is no, then the assumption is that this must be the master... which he also pointed out as a flawed assumption if STOP SLAVE; is issued. But a stopped slave is still a slave, so "not a slave" (at any moment in time) doesn't equate to "being a master."
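That quick test is presumably something along these lines (a sketch, not necessarily Rolando's exact statement):

SHOW SLAVE STATUS;
-- Empty set: no slave configuration exists on this server.
-- But a slave where STOP SLAVE was issued still returns a full row here.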

A similar test could be done on the presumed master:

-- count connections made by the replication user (the slave's I/O thread)
SELECT COUNT(1) FROM information_schema.processlist
 WHERE user = 'the_username_used_by_the_slave';

or

-- count Binlog Dump threads (the master runs one per connected slave)
SELECT COUNT(1) FROM information_schema.processlist
 WHERE command = 'Binlog Dump';

If the value is zero, then the slave's I/O thread is not connected. This test has a similar defect: if the slave is disconnected administratively, isolated, or failed, it won't be connected. So this doesn't really solve anything either.

Worse still (for either of these scenarios), the information_schema.processlist "table" is a virtual table that gets materialized every time it is selected from, and this takes time and costs resources. The busier your server, the more it costs, because each thread's activity has to be peered into.

A more lightweight solution would be:

SELECT @@global.read_only;

On a slave, you could/should set the global variable read_only so that users without the SUPER privilege cannot unintentionally write to it (and your application should not have SUPER). If you manually "promote" the slave to the master role, you SET GLOBAL read_only = OFF to enable writes. (Replication can always write to the slave, no matter how this is set).
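A minimal sketch of that arrangement (SET GLOBAL requires the SUPER privilege, so run these as an administrative user, and mirror the setting in my.cnf so it survives a restart):

-- on the slave, at provisioning time:
SET GLOBAL read_only = ON;

-- at manual promotion time:
SET GLOBAL read_only = OFF;

-- the application's check, from any unprivileged connection:
SELECT @@global.read_only;  -- 1 = treat this server as the slave, 0 = the master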

But this still, I think, misses an important point:

I would propose that the application should not be making this decision heuristically in a master/slave setup, and certainly not on a connection-by-connection basis. The application should either use a hard configuration setting, or it should remain unaware entirely, with the database connection destination handled by something external to it.

Or, at minimum, the application should never switch over until the master fails, and then it should never switch back on its own.

Here's why I say that: once the "decision" is made -- by whoever or whatever -- to make another server the master, the application cannot be allowed for any reason to switch back to the original master, even after it comes back online, without intervention.

Let's say you hit a bug and there's a software-forced crash; mysqld_safe dutifully restarts mysqld, and InnoDB crash recovery performs flawlessly. But that takes a few minutes.

Meanwhile, the master is down so your application has switched to the slave. Transactions have been created, orders placed, funds transferred, comments posted, blogs edited, whatever your system does.

Now, the original master comes back online.

If your application switches back to the original master, you are in an absolute world of hurt: the very next thing likely to happen is that replication stops due to an inconsistency, because your application has changed data on the slave in the meantime. You now have two database servers with inconsistent data that you will have to reconcile manually. If there are dollars or points or credits involved, you now have mismatched balances.

So it's critical that the application not be allowed to switch back to the original master without your intervention.

Wait, did you spot the problem with this scenario as I described it? The master has failed, but your application won't be using the slave, because the slave still thinks it is a slave and not a master... the information_schema.processlist test on the slave will still return non-zero even if the master server is powered down.

So there isn't much point in the application discovering anything, since you'll have to manually STOP SLAVE for that test to be useful.
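You can watch this on the slave itself: with the master unreachable, the slave's I/O thread keeps retrying rather than resigning its role (the values shown are illustrative):

SHOW SLAVE STATUS\G
-- Slave_IO_Running: Connecting   (still trying to reach the dead master)
-- Slave_SQL_Running: Yes         (still very much a slave)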

Perhaps a better approach, if you want the application to be able to switch, would be to configure the two servers with circular replication.

Circular replication has inherent problems of its own, but as long as your application only ever writes to one server at a time, most of those issues become non-issues. In other words, both machines are always and simultaneously master and slave in a replication sense, but your application, via some mechanism, only ever points to one machine at a time as the "master" to which it can and should write.
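If you do build a two-node circle, one precaution worth taking (a sketch assuming exactly two servers; put these in my.cnf as well so they persist) is interleaving AUTO_INCREMENT values so the two machines can never hand out colliding keys:

-- on server A: hand out odd AUTO_INCREMENT values
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 1;

-- on server B: hand out even AUTO_INCREMENT values
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset    = 2;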

You can't deploy HA tools on the MySQL servers due to their separation, but you could implement it with HAProxy running on the application server(s). The application connects to "MySQL" on localhost, which isn't MySQL at all, but is actually HAProxy ... and it forwards the TCP connection to the appropriate MySQL machine.

HAProxy can test the connections to the MySQL servers and only offer traffic to a MySQL machine that is accepting connections and allowing authentication.
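A sketch of what that might look like (untested; the addresses are hypothetical, and the haproxy_check user is an assumption that would have to be created on both MySQL servers for mysql-check to authenticate):

# /etc/haproxy/haproxy.cfg on the app server (sketch)
listen mysql
    bind 127.0.0.1:3306
    mode tcp
    option mysql-check user haproxy_check
    server db1 192.0.2.10:3306 check
    server db2 198.51.100.10:3306 check backup

The backup keyword keeps db2 out of rotation unless db1 fails its health checks, which lines up with the "only ever write to one server at a time" requirement above.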

The combination of HAProxy running on the application server (its demand for resources won't be substantial compared to everything else the application server has to do; it's pretty much just tying sockets together and ignoring their payload) and MySQL circular replication is the approach I would probably take in this instance, based on what's known from the question.

Or, for a strictly manual setup, go with something much simpler than "discovery," like an entry in the app server's /etc/hosts file with a hostname that the application uses to connect to MySQL, which you could manually update -- assuming promotion of slave to master is intended to be a manual process.
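For example (the hostname and addresses are hypothetical):

# /etc/hosts on the app server; the application connects to "db-master"
192.0.2.10      db-master
# after a manual promotion, repoint the entry at the other server:
# 198.51.100.10   db-master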

Or, something more complex: Percona XtraDB Cluster. For that, though, you'd want to add a third server, because with 3 nodes in PXC, if 2 servers can see each other but are isolated from the third (while all three are still running), the 2 keep running happily, while the 1 curls up into a little ball and refuses to do anything, since it realizes it must be the odd one out. This works because the 2 know they still constitute a majority of the nodes that were online before the network split, and the 1 knows it doesn't. With PXC, it doesn't really matter which server your application connects to.

I say all of this to say: don't have the application poll the servers to see which one is the master, because it will bite you sooner or later, and it will nibble away at your performance up until the day it bites.