MySQL Replication Lag Check

MySQLreplication

we are about to replicate our database servers into several slaves in different physical locations.

The company which is doing replication for us, has asked to give them a sql select statement on one of our tables to check lag and latency like

SELECT id FROM table1 ORDER BY id DESC LIMIT 1

Is it a common practice to check the lag between several instances?

Thank you in advance.

Best Answer

It's unclear what they want to measure. To see the network lag, all they need is ping in the OS. That should be much less than a second, even to the other side of the world. (Unless they have a sloppy network.)

Seconds_behind_master is problematical. It is a handy tool for the DBA, but it can be "wrong" for many reasons.

  • When running a long query, it shows how long ago the query started on the Master. So, Seconds_behind_master will rise as long as that query is running on the Slave.
  • If you flood the Master with lots of writes from lots of connections, the Slave can get "behind". This is congestion because of single-threaded Slave, not because of network lag.

One thing it can tell (I think) -- If the timezones are inconsistent, or they don't have a good clock sync (NTP).

A SELECT will not propagate through replication, only writes. So, I see no use for your suggested SELECT.

If you are running at least 5.6.4: CREATE a DATABASE and a login limited to that database (GRANT ALL ON db.* ...). Give them the user name an password. Let them INSERT into the Master and see what pops up in the Slave. I think this is what is needed (but I have not tried it):

-- Create on Master and propagate to Slave:
CREATE TABLE Timer (
    id INT NOT NULL,
    raw_time TIMESTAMP(6) NOT NULL,
    sys_time TIMESTAMP(6) NOT NULL,
    PRIMARY KEY(id)
) ENGINE=InnoDB;
-- Run on Master:
REPLACE INTO Timer (id, raw_time, sys_time) VALUES (1, NOW(6), SYSDATE(6));
-- Run on Slave:
SELECT * FROM Timer;

The difference of the two columns will say how much delay where was -- Including

  • A tiny delay between NOW and SYSDATE on the Master
  • Network delay
  • Slave backlog (if something was running on the Slave when the REPLACE was being replicated; this can be minimized by using multi-threaded replication)

Note: The (6) everywhere gives you microsecond precision. This is more than sufficient to measure the milliseconds for replicating cross-country.

Note: That technique won't work if you run with --sysdate-is-now.

Point the Company at this Answer; challenge them to top it.