Percona XtraDB Cluster 5.5: node can’t join cluster – SST fails with auth problem

percona xtradb-cluster

I have a 3-node PXC 5.5 cluster. Settings on node03, in the [mysqld] section of my.cnf:

wsrep_cluster_address=gcomm://<ip_node1>,<ip_node2>
wsrep_cluster_name=cluster
wsrep_provider_options="gcache.size=128M"
wsrep_node_address=<ip_node3>
wsrep_node_name=node03
wsrep_slave_threads=16
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sst:XXXXXXXXXXXXXX

Settings on the other two nodes are the same. node03 cannot join the cluster; its error log shows:

160807  5:30:18 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '<ip_node3>' --auth 'XXXXXXXXXXXXXX' --datadir '/var/lib/mysql/data/' --defaults-file '/etc/mysql/my.cnf' --parent '9532': 2 (No such file or directory)

Errors in the donor node's error.log:

160807  5:13:30 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '<ip_node3>:4444/xtrabackup_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/data/' --defaults-file '/etc/mysql/my.cnf' --gtid 'f964a515-38a3-11e6-8d63-47f60c48dba5:38453358'

Errors in the donor node's /var/lib/mysql/data/innobackup.backup.log:

160807 05:24:35 innobackupex: Starting the backup operation

IMPORTANT: Please check that the backup run completes successfully.
           At the end of a successful backup run innobackupex
           prints "completed OK!".

160807 05:24:35 Connecting to MySQL server host: localhost, user: mysql, password: not set, port: 3306, socket: /var/run/mysqld/mysqld.sock
    Failed to connect to MySQL server: Access denied for user 'mysql'@'localhost' (using password: NO).

The --auth '(null)' argument and the "Access denied for user 'mysql'@'localhost'" error indicate that wsrep_sst_auth is not set.
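To spot this symptom quickly across several nodes' logs, something like the following might help. It is only a rough sketch: the function names are made up for illustration, and the pattern assumes the --auth argument is single-quoted exactly as in the WSREP log lines quoted above.

```python
import re

# Matches the --auth argument as it appears in the WSREP log lines quoted above.
AUTH_RE = re.compile(r"--auth '([^']*)'")


def sst_auth_from_log(line):
    """Return the value passed to --auth, or None if the line has no --auth."""
    match = AUTH_RE.search(line)
    return match.group(1) if match else None


def auth_missing(line):
    """True when the line has an --auth argument that is '(null)' or empty."""
    return sst_auth_from_log(line) in ("(null)", "")
```

With the logs from this question, auth_missing() would flag the donor's "Command did not run" line, while the joiner line, which carries a real value in --auth, would not be flagged.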

But it is present in my.cnf, as you can see. I tested the sst:XXXXXXXXXXXXXX credentials on the donor node against localhost, and they work.
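One way to double-check that the option really sits under [mysqld] on every node, and not under some other section, is a small script along these lines. This is a hypothetical helper, not part of PXC. It assumes a plain my.cnf, does not follow !include or !includedir directives, and ignores inline comments.

```python
def option_in_section(cnf_text, section, option):
    """Return the last value of `option` inside [section], or None if absent."""
    current = None
    found = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!":
            continue  # blank line, comment, or !include directive (not followed)
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip().lower()
            continue
        if current != section.lower():
            continue
        key, _, value = line.partition("=")
        # MySQL treats dashes and underscores in option names as equivalent.
        if key.strip().replace("-", "_") == option:
            found = value.strip().strip("\"'")  # a later occurrence overrides an earlier one
    return found
```

For example, option_in_section(open("/etc/mysql/my.cnf").read(), "mysqld", "wsrep_sst_auth") should return the user:password string if the setting is where the server expects it, and None otherwise. It only inspects the file, so it says nothing about the value a running mysqld actually holds.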

Why does the joiner node not use wsrep_sst_auth from the config?

Best Answer

I managed to complete the SST by creating a default user mysql@localhost with no password on the donor node, with the required grants: RELOAD, LOCK TABLES, REPLICATION CLIENT. It is an ugly workaround, but it worked.

The next day I stopped MySQL on node03 and started it again after several minutes. The SST ran normally, using the user and password from the wsrep_sst_auth variable. The bug disappeared; I don't know what caused it.