MariaDB – How to Fix WSREP Failed to Recover Position Error on Ubuntu

Tags: galera, mariadb, mariadb-10.1, mysql, ubuntu

I'm using 10.1.16-MariaDB-1~xenial from the official MariaDB apt repository for 10.1 [stable], via the University of Texas mirror.

I had a perfectly functioning MariaDB Galera cluster setup on 3 Ubuntu 16.04 servers.

Then I upgraded them. Now I have nothing.

The upgrade to 10.1.16 failed, and quickly brought down the whole cluster. I don't have the output, but dpkg failed on setting up mariadb-server and mariadb-server-10.1.

I have backups, so I purged all traces of MariaDB/MySQL/Galera from my servers (including removing /var/lib/mysql/, /etc/mysql/, and /var/log/mysql/) and started over. However, now, with a clean install on each server, none of the standard system startup scripts work. I suspect this is why the upgrade process through apt failed, too.

I've tried each of the following on my first node:

galera_new_cluster
service mysql bootstrap
service mysql bootstrap --wsrep-new-cluster
service mysql bootstrap --wsrep-cluster-address="gcomm://"
service mysql start
service mysql start --wsrep-new-cluster
service mysql start --wsrep-cluster-address="gcomm://"
systemctl start mariadb
systemctl start mariadb --wsrep-new-cluster
systemctl start mariadb --wsrep-cluster-address="gcomm://"

Every single one gives me the same output:

Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.

systemctl status mariadb.service:

● mariadb.service - MariaDB database server
   Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/mariadb.service.d
           └─migrated-from-my.cnf-settings.conf
   Active: failed (Result: exit-code) since Fri 2016-07-22 13:29:45 CDT; 42s ago
  Process: 10799 ExecStartPre=/bin/sh -c VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] &&   systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=1/FAILURE)
  Process: 10794 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Main PID: 16865 (code=exited, status=0/SUCCESS)

Jul 22 13:29:41 sql2 systemd[1]: Starting MariaDB database server...
Jul 22 13:29:45 sql2 mysqld[10799]: WSREP: Failed to recover position: '2016-07-22 13:29:41 140110745778432 [Note] /usr/sbin/mysqld (mysqld 10.1.16-MariaDB-1~xenial) starting as process 11080 ...'
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Control process exited, code=exited status=1
Jul 22 13:29:45 sql2 systemd[1]: Failed to start MariaDB database server.
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Unit entered failed state.
Jul 22 13:29:45 sql2 systemd[1]: mariadb.service: Failed with result 'exit-code'.

The only way I can start my servers now is by manually executing:

sudo -u mysql mysqld --wsrep-cluster-address='gcomm://'

On the first node, and then:

sudo -u mysql mysqld --wsrep-cluster-address='gcomm://ip1,ip2,ip3'

On the other two nodes. That works, and I have a working cluster again. But now systemd/systemctl has no idea the service is running. It seems the systemd startup scripts can't use the wsrep-cluster-address setting in my configuration files at all. Passing it on the service or systemctl command line does not work either.

How am I supposed to start mariadb?

Best Answer

There was a bug in the galera_recovery script, the helper that the ExecStartPre step of mariadb.service runs before starting the server. See https://jira.mariadb.org/browse/MDEV-10396
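To see what the recovery step reports outside systemd, you can run the same helper that the ExecStartPre line in the status output above calls. This is only a debugging sketch: the function name run_recovery is hypothetical, and it assumes the script prints the start position on standard output and signals failure through its exit code, which is how the unit file uses it.

```shell
# run_recovery [SCRIPT]: run the recovery helper by hand and report the result.
# SCRIPT defaults to the path shown in the unit's ExecStartPre line.
# (Hypothetical helper for debugging; not part of MariaDB.)
run_recovery() {
    script="${1:-/usr/bin/galera_recovery}"
    if out=$("$script"); then
        # Exit code 0: the unit would export this text as _WSREP_START_POSITION.
        echo "recovered start position: $out"
    else
        rc=$?
        # Non-zero exit: this is the case where the unit does "exit 1".
        echo "recovery failed (exit $rc): $out" >&2
        return "$rc"
    fi
}

# Example:
#   run_recovery
```

If the function reports a failure here too, the problem is in the recovery step itself rather than in how the service was started.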

Related Solutions

MySQL Won’t Start After DB Import and Password Change – Fix Guide

I keep seeing permission denied errors in the log.

You didn't run the chown command recursively.

Please run it with -R:

chown -R mysql:mysql ./path/to/rh-mysql56
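One way to confirm that the recursive chown reached everything is to list whatever is still owned by someone else. A minimal sketch, assuming a find that supports the standard -user test; the function name list_not_owned is made up for illustration.

```shell
# list_not_owned DIR OWNER: print every path under DIR not owned by OWNER.
# No output means nothing under DIR is owned by anyone else.
list_not_owned() {
    find "$1" ! -user "$2" -print
}

# Example (path taken from the chown command above):
#   list_not_owned ./path/to/rh-mysql56 mysql
```

This only checks the owner, not the group or the permission bits, so an empty result does not rule out every permission problem.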

I also see errors like

2016-06-30 14:14:48 9138 [ERROR] InnoDB: Tablespace open failed for '"mysql"."innodb_index_stats"', ignored.

When you installed MySQL before, it created 5 InnoDB system tables (see my answers "Cannot open table mysql/innodb_index_stats" and 'InnoDB: Error: Table "mysql"."innodb_table_stats" not found after upgrade to mysql 5.6').

You need to recreate them.

First, delete the orphaned .frm and .ibd files:

cd ./path/to/rh-mysql56/mysql
rm -f innodb_index_stats.frm
rm -f innodb_table_stats.frm
rm -f slave_master_info.frm
rm -f slave_relay_log_info.frm
rm -f slave_worker_info.frm
rm -f innodb_index_stats.ibd
rm -f innodb_table_stats.ibd
rm -f slave_master_info.ibd
rm -f slave_relay_log_info.ibd
rm -f slave_worker_info.ibd
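The ten rm commands above can also be written as a loop over the five table names and the two extensions. A sketch with a hypothetical function name; it takes the schema directory as an argument instead of relying on cd.

```shell
# remove_orphaned_table_files DIR: delete the .frm and .ibd files of the five
# system tables named above from DIR (the mysql schema directory).
remove_orphaned_table_files() {
    dir="$1"
    for table in innodb_index_stats innodb_table_stats \
                 slave_master_info slave_relay_log_info slave_worker_info; do
        for ext in frm ibd; do
            # -f keeps rm quiet when a file is already gone
            rm -f -- "$dir/$table.$ext"
        done
    done
}

# Example (same directory as the cd above):
#   remove_orphaned_table_files ./path/to/rh-mysql56/mysql
```

Only those ten files are touched; everything else in the directory is left alone.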

Then, you can recreate them with:

USE mysql;
CREATE TABLE `innodb_index_stats` (
  `database_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `index_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `stat_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `stat_value` bigint(20) unsigned NOT NULL,
  `sample_size` bigint(20) unsigned DEFAULT NULL,
  `stat_description` varchar(1024) COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`database_name`,`table_name`,`index_name`,`stat_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0;
CREATE TABLE `innodb_table_stats` (
  `database_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `table_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `n_rows` bigint(20) unsigned NOT NULL,
  `clustered_index_size` bigint(20) unsigned NOT NULL,
  `sum_of_other_index_sizes` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`database_name`,`table_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin STATS_PERSISTENT=0;
CREATE TABLE `slave_master_info` (
  `Number_of_lines` int(10) unsigned NOT NULL COMMENT 'Number of lines in the file.',
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the master binary log currently being read from the master.',
  `Master_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The master log position of the last read event.',
  `Host` char(64) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'The host name of the master.',
  `User_name` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The user name used to connect to the master.',
  `User_password` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The password used to connect to the master.',
  `Port` int(10) unsigned NOT NULL COMMENT 'The network port used to connect to the master.',
  `Connect_retry` int(10) unsigned NOT NULL COMMENT 'The period (in seconds) that the slave will wait before trying to reconnect to the master.',
  `Enabled_ssl` tinyint(1) NOT NULL COMMENT 'Indicates whether the server supports SSL connections.',
  `Ssl_ca` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The file used for the Certificate Authority (CA) certificate.',
  `Ssl_capath` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The path to the Certificate Authority (CA) certificates.',
  `Ssl_cert` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the SSL certificate file.',
  `Ssl_cipher` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the cipher in use for the SSL connection.',
  `Ssl_key` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The name of the SSL key file.',
  `Ssl_verify_server_cert` tinyint(1) NOT NULL COMMENT 'Whether to verify the server certificate.',
  `Heartbeat` float NOT NULL,
  `Bind` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'Displays which interface is employed when connecting to the MySQL server',
  `Ignored_server_ids` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The number of server IDs to be ignored, followed by the actual server IDs',
  `Uuid` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The master server uuid.',
  `Retry_count` bigint(20) unsigned NOT NULL COMMENT 'Number of reconnect attempts, to the master, before giving up.',
  `Ssl_crl` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The file used for the Certificate Revocation List (CRL)',
  `Ssl_crlpath` text CHARACTER SET utf8 COLLATE utf8_bin COMMENT 'The path used for Certificate Revocation List (CRL) files',
  `Enabled_auto_position` tinyint(1) NOT NULL COMMENT 'Indicates whether GTIDs will be used to retrieve events from the master.',
  PRIMARY KEY (`Host`,`Port`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Master Information';
CREATE TABLE `slave_relay_log_info` (
  `Number_of_lines` int(10) unsigned NOT NULL COMMENT 'Number of lines in the file or rows in the table. Used to version table definitions.',
  `Relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the current relay log file.',
  `Relay_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The relay log position of the last executed event.',
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL COMMENT 'The name of the master binary log file from which the events in the relay log file were read.',
  `Master_log_pos` bigint(20) unsigned NOT NULL COMMENT 'The master log position of the last executed event.',
  `Sql_delay` int(11) NOT NULL COMMENT 'The number of seconds that the slave must lag behind the master.',
  `Number_of_workers` int(10) unsigned NOT NULL,
  `Id` int(10) unsigned NOT NULL COMMENT 'Internal Id that uniquely identifies this record.',
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Relay Log Information';
CREATE TABLE `slave_worker_info` (
  `Id` int(10) unsigned NOT NULL,
  `Relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Relay_log_pos` bigint(20) unsigned NOT NULL,
  `Master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Master_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_relay_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Checkpoint_relay_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_master_log_name` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `Checkpoint_master_log_pos` bigint(20) unsigned NOT NULL,
  `Checkpoint_seqno` int(10) unsigned NOT NULL,
  `Checkpoint_group_size` int(10) unsigned NOT NULL,
  `Checkpoint_group_bitmap` blob NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 STATS_PERSISTENT=0 COMMENT='Worker Information';

SQL Server – Unable to Start on Ubuntu 16.04

SQL Server did not start because the machine had less than the 3250 MB of memory it requires. The cause was found with the following command:

journalctl -u mssql-server.service -b

Nov 30 00:43:21 OraServer sqlservr[4075]: sqlservr: This program requires a machine with at least 3250 megabytes of memory.
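The installed memory can be compared against that figure before starting the service. The sketch below reads MemTotal from /proc/meminfo on Linux; the function names are hypothetical, and exactly how sqlservr measures memory is an assumption here, so treat a borderline result with caution.

```shell
# mem_total_mb [FILE]: print MemTotal from a meminfo-style file in whole MB.
# MemTotal is reported in kB, so divide by 1024 and truncate.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# has_enough_memory REQUIRED_MB [FILE]: succeed if MemTotal >= REQUIRED_MB.
has_enough_memory() {
    [ "$(mem_total_mb "${2:-/proc/meminfo}")" -ge "$1" ]
}

# Example, using the requirement from the log line above:
#   has_enough_memory 3250 && echo "enough memory" || echo "add memory first"
```

MemTotal is usually a little lower than the physical RAM installed, because the kernel reserves some of it.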

After memory was added, SQL Server started. Its status can be checked with:

systemctl status mssql-server
