Mariadb replication issues

mariadbreplication

First, I am not a DBA. Second, the person who set up mariadb left company recently.

On a corporate system, we have mariadb 5.5.2 running on two Dell servers, CentOS 7.3. Servers are about 16 months old. Never rebooted until July 2017. Noticed after that that data from server 1 (master) was not updating on server 2, which is slave. Missing some rows.

Following notes on mariadb replication here: https://mariadb.com/kb/en/the-mariadb-library/setting-up-replication/

Can connect slave server 2 to master server 1 with this:

CHANGE MASTER TO
MASTER_HOST='SOMEIP',       
MASTER_USER='esg',       <<< -- replication user, i think
MASTER_PASSWORD='*********',                
MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000012', 
MASTER_LOG_POS=5067,
MASTER_CONNECT_RETRY=10;

On server 2, SHOW SLAVE STATUS shows a connection to the master SOMEIP.

When I run this command, from the link above:

GRANT REPLICATION SLAVE ON *.* TO esg;

I get:

ERROR 1045 (28000): Access denied for user 'esg'@'localhost' (using password: YES)

Running same command on master server 1 as --skip-grant-tables, user root, says operation not permitted under skip-grant-tables.

Wondering which way to go here. esg is only user I know about.


I set up replication with these directions:

http://www.techrepublic.com/article/how-to-set-up-database-replication-with-mariadb/

Currently see this on slave:

Slave_IO_State: Waiting for master to send event.

All other variables like Seconds_Behind_Master=0 look good. Slave_SQL_Running, Slave_IO_Running are YES. Never see replication tables/databases on slave.

Why do tutorials say to copy databases from master to slave? Replication should handle this.

Background

Added configurations for bind-address=0.0.0.0 to master/slave on both master and slave and create unique user ids for each of two affected servers. Get master_log_pos and master_log_file from master. Run these commands:

GRANT REPLICATION SLAVE ON *.* TO root;
​GRANT ALL PRIVILEGES ON *.* TO 'root'@'someIP' IDENTIFIED BY 'somePassword' WITH GRANT OPTION;

On slave, connect to master with command like this:

CHANGE MASTER TO MASTER_HOST='SOMEIP', MASTER_USER='esg',    
MASTER_PASSWORD='*********', MASTER_LOG_FILE='master1-bin.000012',MASTER_LOG_POS=5067;
START SLAVE;

Best Answer

If the esg user was already used for replication, you shouldn't need to grant privileges again. Login to the master server with he esg user, and execute show grants. This will confirm whether or not the esg user has REPLICATION SLAVE privilege.

If you don't already have it, get the root password set to something you know. That will be helpful for troubleshooting and setup/resync.

If you are unsure of the replication state, execute SHOW SLAVE STATUS and check the fields SLAVE_SQL_Running and SLAVE_IO_Running. If either of those are NO, then check the status for Error messages and #'s. If Replication is down, then that may be a cause of missing data.

If your SLAVE threads are running, check Seconds_Behind_Master. If that is not 0 (or something relatively close to 0), then your slave is behind, and that could be the missing data.

If the slave is up to date, and your missing some data, another possible cause is that another user with write privileges accidentally inserted/updated/deleted on the Slave instead of the master. Yes, this does happen.

Another possibility is data inconsistency from using Statement Based Replication, as opposed to Row Based Replication. Check show global variable like 'bingo_format' to determine if you're using statement or row based replication.

Beyond that, you might need to break your question down a bit more to understand what you're doing (rebuilding rep, vs restarting) and focusing in on one piece at a time.