MariaDB – Galera – single database backup & restore

Tags: backup, galera, mariadb, restore

I have a MariaDB Galera cluster. One of its nodes is dedicated to backups (pc.weight=0 and the MaxScale SQL proxy doesn't know it exists).

I'm using mariabackup --backup --galera-info --target-dir=/path/to/my/dir to do the daily backups. So this backs up everything.
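In script form, the daily job is roughly this (the dated subdirectory is just a convention and the paths are placeholders):

# Daily backup on the dedicated backup node; paths are placeholders
TARGET=/path/to/my/dir/$(date +%F)
mkdir -p "$TARGET"
mariabackup --backup --galera-info --target-dir="$TARGET"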

I want to be able to do 2 kinds of restore:

  1. full restore in case of disaster (e.g. data center explosion)
  2. single database restore in case of data loss (e.g. human error/hacking)

Case 1 is easy: I just take the backup I have and use mariabackup --prepare and mariabackup --copy-back on the freshly installed Galera nodes.
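For completeness, the full restore sequence looks roughly like this (the datadir is assumed to be the default /var/lib/mysql, adjust to your install):

# Disaster recovery on a freshly installed node
mariabackup --prepare --target-dir=/path/to/my/dir
systemctl stop mariadb                        # the server must not be running
rm -rf /var/lib/mysql/*                       # --copy-back requires an empty datadir
mariabackup --copy-back --target-dir=/path/to/my/dir
chown -R mysql:mysql /var/lib/mysql
galera_new_cluster                            # bootstrap this node, then let the others join via SST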

But how can I achieve case 2?

I know I could do a mariabackup --prepare and then for each table of the database I want to restore:

  1. discard the table space of table X
  2. copy back the file(s) of table X from my backup
  3. import the table space

The problem is that this process is too low-level for Galera/wsrep to notice it and the changes are of course not replicated to the other nodes. So I would need to repeat this process on all the nodes.
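Spelled out for a single table, that process is something like the following (names are illustrative, and it assumes the backup was prepared with --prepare --export so the .cfg files exist):

# Per-table restore on ONE node only -- none of this is replicated by Galera
DB=mydb; TABLE=mytable
mysql -e "ALTER TABLE $DB.$TABLE DISCARD TABLESPACE;"
cp /path/to/my/dir/$DB/$TABLE.ibd /path/to/my/dir/$DB/$TABLE.cfg /var/lib/mysql/$DB/
chown mysql:mysql /var/lib/mysql/$DB/$TABLE.*
mysql -e "ALTER TABLE $DB.$TABLE IMPORT TABLESPACE;"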

I could also:

  1. take a logical backup (mysqldump)
  2. drop the database on one of the Galera nodes
  3. re-create the database
  4. import the dump I just did

This would be replicated to all the nodes. But it's highly inefficient, and I would need to set up a temporary MariaDB server to run the mysqldump on the backed-up data.
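That would be something like the following sketch (socket and paths are placeholders, the backup copy is assumed to be already prepared, and on older MariaDB versions the server binary is mysqld instead of mariadbd):

# 1. Start a throwaway server on a copy of the prepared backup (run it as the mysql user)
mariadbd --no-defaults --datadir=/restore/backup-copy --socket=/tmp/restore.sock --skip-networking --skip-grant-tables &

# 2. Dump only the database to restore
mysqldump --socket=/tmp/restore.sock --single-transaction mydb > mydb.sql

# 3. Replay it through a regular Galera node so the statements replicate everywhere
mysql -e "DROP DATABASE mydb; CREATE DATABASE mydb;"
mysql mydb < mydb.sql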

There must be a better way, right? How are you doing it?

Your answers are much appreciated!

Best Answer

There are multiple ways to back up and restore, but as always it depends on your needs and varies between setups. Here is just an idea that could help minimize the point-in-time recovery gap in case you need to restore.

First, it is important to understand that in a MariaDB Galera cluster all the nodes are "masters", and by using pc.weight=0 you are only modifying the quorum calculation (see "Weighted Quorum for a Master and Multiple Slaves Scenario" in the docs):

node1: pc.weight = 1
node2: pc.weight = 0

Under this pattern, if node1 dies, all remaining nodes end up as non-primary components. If any other node dies, the Primary Component is preserved. In the case of network partitioning, node1 always remains as the Primary Component.

But this can't be used as a backup strategy, since once your data has been modified the change is replicated to all the nodes.

Regarding backup and restore in the case of human error, a hacked DB, hardware malfunction, etc., it depends on how frequently you back up: your most recent point-in-time restore point will be the date of your latest backup. You could create backups daily or even every 12 hours, although that can be a heavy, time-consuming process depending on your DB size.

As an easier but more "expensive" alternative to this method, you could attach one or more slaves to your cluster: one kept in sync (not delayed) and a second one using delayed replication, where you define a delay of, for example, 1 hour (you could set the delay to match your "reaction" time). A sketch follows below.
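A minimal sketch of the delayed slave, assuming MariaDB 10.2.3 or later (which supports MASTER_DELAY), binary logging enabled on the cluster node it replicates from, and placeholder host/credentials:

# Run on the delayed slave after seeding it with a backup of the cluster
mysql <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='galera-node1',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos,
  MASTER_DELAY=3600;  -- apply replicated events one hour behind the cluster
START SLAVE;
SQL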

However, it is important to monitor the state of the cluster (the masters) so that if something unexpected happens you can stop the replication (at least on the delayed slave), since you will want to use it as your latest clean point-in-time restore point.
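When you detect a problem, freezing the delayed slave before it applies the bad events is a one-liner:

# On the delayed slave: stop applying relay log events, keep the data as-is
mysql -e "STOP SLAVE SQL_THREAD;"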