The Basics
For *nix OSes, this document uses a shell expansion to give folders a simple name based on today's date and time, e.g.
shell:~ # d="$(date '+%d%b_%H%M')"
shell:~ # echo $d
13Nov_1338
It is, of course, possible to adjust this as any existing naming conventions dictate.
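For instance, a sketch of an ISO-style variant (the format string here is purely illustrative and not part of the original convention):

shell:~ # d="$(date '+%Y%m%d_%H%M')"
shell:~ # echo $d
20181113_1338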
Simple high level backup / restore commands:
Creates an image backup (optionally using --skip-binlog is recommended for a smaller backup image):
mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_$(date '+%d%b_%H%M') --backup-dir=/backups/backup_$(date '+%d%b_%H%M') --user=root -pD0lph1n5! --host=127.0.0.1 backup-to-image
The above command line expands to the following (using the date value shown in $d above):
mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_13Nov_1338 --backup-dir=/backups/backup_13Nov_1338 --user=root -pD0lph1n5! --host=127.0.0.1 backup-to-image
Restores the backup:
mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_13Nov_1338 --backup-dir=/tmp/restore_$(date +%d%m_%H%M) copy-back-and-apply-log
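Although not part of the original command list, the integrity of an image can also be checked with the validate operation before attempting a restore (shown here against the example image name from above):

mysqlbackup --backup-image=/backups/my.mbi_13Nov_1338 validate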
Example – Failed node
Consider the existing REPLICATION GROUP:
root@one238 [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST     | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 0f77e8be-dfc6-11e8-b5c0-0800278e6619 | one236.fitz.lan |        3306 | ONLINE       | SECONDARY   | 8.0.13         |
| group_replication_applier | 4ebecd8b-dfd3-11e8-bed5-0800278e6619 | one223.fitz.lan |        3306 | ONLINE       | SECONDARY   | 8.0.13         |
| group_replication_applier | 9f742055-dfd2-11e8-b021-0800271600ae | one238.fitz.lan |        3306 | ONLINE       | PRIMARY     | 8.0.13         |
+---------------------------+--------------------------------------+-----------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
The PRIMARY node is currently one238.
It will be stopped and the full data directory deleted, to simulate an unrecoverable node failure in which a component needs to be replaced and the OS and all software have to be reinstalled.
This is simulated as follows:
one236:# ~ # systemctl stop mysqld
one236:# ~ # rm -rf /var/lib/mysql/*
one236:# ~ #
1. Take backup
A recent backup has been taken of a secondary node:
one223:# ~ # mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_$(date '+%d%b_%H%M') --backup-dir=/backups/backup_$(date '+%d%b_%H%M') --user=root -p --host=127.0.0.1 backup-to-image
MySQL Enterprise Backup version 8.0.13 Linux-4.1.12-112.14.13.el7uek.x86_64-x86_64 [2018-10-07 09:31:51]
Copyright (c) 2003, 2018, Oracle and/or its affiliates. All Rights Reserved.
[Build ID : 13776648.3672fb2ad9f6c91cd2527866d46fa583b70bfefd]

181104 15:21:23 MAIN INFO: A thread created with Id '140714919446656'
181104 15:21:23 MAIN INFO: Starting with following command line ...
 mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_04Nov_1521 --backup-dir=/backups/backup_04Nov_1521 --user=root -pxxxxxxxxx --host=127.0.0.1 backup-to-image
....
181104 15:21:35 MAIN INFO: Group replication setup detected.
181104 15:21:35 MAIN INFO: Backup operation is running on a secondary node member.
181104 15:21:35 MAIN INFO: Trying to connect to one236.fitz.lan:3306. Backup will update the History table in this host. This entry will be replicated.
181104 15:21:35 MAIN INFO: Full Image Backup operation completed successfully.
181104 15:21:35 MAIN INFO: Backup image created successfully.
181104 15:21:35 MAIN INFO: Image Path = /backups/my.mbi_04Nov_1521
181104 15:21:35 MAIN INFO: MySQL binlog position: filename one223_3306.000002, position 19183
181104 15:21:35 MAIN INFO: GTID_EXECUTED is 46566e90-dfe4-11e8-8662-08002708448b:1-12,6031ccad-dfe4-11e8-a7ae-08002708448b:1-14

-------------------------------------------------------------
   Parameters Summary
-------------------------------------------------------------
   Start LSN : 19410432
   End LSN   : 19410602
-------------------------------------------------------------

mysqlbackup completed OK! with 1 warnings
one223:# ~ #
one223:# /backups # ls -lrt
total 59588
drwxr-x--- 4 root root       97 Nov  4 15:21 backup_04Nov_1521
-rw-r--r-- 1 root root 61016659 Nov  4 15:21 my.mbi_04Nov_1521
one223:# /backups #
A new feature in 8.0 is that MEB writes the backup history record on the PRIMARY node, even when the backup is taken from a SECONDARY. Consulting the mysql.backup_history table (from a SECONDARY node) for the most recent backup shows:
mysql> select * from mysql.backup_history order by backup_id desc limit 1\G
*************************** 1. row ***************************
                backup_id: 15413052924422160
                tool_name: mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/backups/my.mbi_04Nov_1521 --backup-dir=/backups/backup_04Nov_1521 --user=root -pxxxxxxxx --host=127.0.0.1 backup-to-image
               start_time: 2018-11-04 15:21:32
                 end_time: 2018-11-04 15:21:34
               binlog_pos: 19183
              binlog_file: one223_3306.000002
        compression_level: 0
                  engines: CSV:InnoDB:PERFORMANCE_SCHEMA
    innodb_data_file_path: ibdata1:12M:autoextend
                start_lsn: 19410432
                  end_lsn: 19410602
              backup_type: FULL
            backup_format: IMAGE
           mysql_data_dir: /var/lib/mysql/
     innodb_data_home_dir: 
innodb_log_group_home_dir: /var/lib/mysql/
innodb_log_files_in_group: 2
     innodb_log_file_size: 268435456
       backup_destination: /backups/backup_04Nov_1521
                lock_time: 1.139
               exit_state: SUCCESS
               last_error: NO_ERROR
          last_error_code: 0
           start_time_utc: 1541305292197515
             end_time_utc: 1541305294658281
     consistency_time_utc: 1541305294650919
              meb_version: 8.0.13
              server_uuid: 2e9b514c-dfe4-11e8-8bb0-0800278e6619
1 row in set (0.00 sec)
As mentioned, the primary node crashes and encounters irreconcilable corruption. After a period of time the node is rebuilt, but all the data on the server has been lost.
2. Copy backup image to remote host
one223:# /backups # scp my.mbi_04Nov_1521 one236:~/
root@one236's password:
my.mbi_04Nov_1521                        100%   58MB  37.2MB/s   00:01
one223:# /backups #
3. Restore
The backup should then be restored as per the documentation:
one236:# ~ # mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/root/my.mbi_04Nov_1521 --backup-dir=/tmp/restore_$(date '+%d%b_%H%M') copy-back-and-apply-log
MySQL Enterprise Backup version 8.0.13 Linux-4.1.12-112.14.13.el7uek.x86_64-x86_64 [2018-10-07 09:31:51]
Copyright (c) 2003, 2018, Oracle and/or its affiliates. All Rights Reserved.
[Build ID : 13776648.3672fb2ad9f6c91cd2527866d46fa583b70bfefd]

181104 15:27:24 MAIN INFO: A thread created with Id '139810938710144'
181104 15:27:24 MAIN INFO: Starting with following command line ...
 mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/root/my.mbi_04Nov_1521 --backup-dir=/tmp/restore_04Nov_1527 copy-back-and-apply-log
...
181104 15:27:29 PCR1 INFO: We were able to parse ibbackup_logfile up to lsn 19410602.
181104 15:27:29 PCR1 INFO: Last MySQL binlog file position 0 19183, file name one223_3306.000002
181104 15:27:29 PCR1 INFO: The first data file is '/var/lib/mysql/ibdata1' and the new created log files are at '/var/lib/mysql/'
181104 15:27:29 MAIN INFO: No Keyring file to process.
181104 15:27:29 MAIN INFO: Apply-log operation completed successfully.
181104 15:27:29 MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK! with 3 warnings
one236:# ~ #
MySQL Shell / InnoDB Cluster
If running Group Replication via MySQL Shell (InnoDB Cluster), it is advisable to conduct the node recovery using MySQL Shell as well. This is partly because MySQL Shell creates its own user to control replication within the group, and partly to avoid any manual interaction / permissions issues.
MySQL one236:33060+ ssl JS > cluster.status();
{
    "clusterName": "devCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "one236:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "one223:3306": {
                "address": "one223:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one236:3306": {
                "address": "one236:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one238:3306": {
                "address": "one238:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@one236:3306"
}
MySQL one236:33060+ ssl JS >
The recovery process still requires the backup to be restored to the node manually, via mysqlbackup copy-back-and-apply-log, and the GTID purge still needs to be executed.
https://dev.mysql.com/doc/dev/mysqlsh-api-javascript/8.0/classmysqlsh_1_1dba_1_1_cluster.html
cluster.status()
cluster.status() output showing a failed node.
var i1='root:D0lph1n5@one223';
var i2='root:D0lph1n5@one236';
var i3='root:D0lph1n5@one238';
shell.connect(i1);
var c=dba.getCluster();
MySQL one223:33060+ ssl JS > var c=dba.getCluster();
MySQL one223:33060+ ssl JS > c.status();
{
    "clusterName": "devCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "one223:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "one223:3306": {
                "address": "one223:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one236:3306": {
                "address": "one236:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "(MISSING)"
            },
            "one238:3306": {
                "address": "one238:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@one223:3306"
}
cluster.rescan()
Issuing a rescan identifies the node which has failed.
MySQL one223:33060+ ssl JS > var c=dba.getCluster();
MySQL one223:33060+ ssl JS > c.rescan()
Rescanning the cluster...

Result of the rescanning operation:
{
    "defaultReplicaSet": {
        "name": "default",
        "newlyDiscoveredInstances": [],
        "unavailableInstances": [
            {
                "host": "one236:3306",
                "label": "one236:3306",
                "member_id": "46566e90-dfe4-11e8-8662-08002708448b"
            }
        ]
    }
}

The instance 'one236:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('one236:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]:
Removing instance from the cluster metadata...

The instance 'one236:3306' was successfully removed from the cluster metadata.

MySQL one223:33060+ ssl JS > c.status();
{
    "clusterName": "devCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "one223:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures.",
        "topology": {
            "one223:3306": {
                "address": "one223:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one238:3306": {
                "address": "one238:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@one223:3306"
}
It is important to note that the same steps are performed here to restore the backup as for a manually configured GR setup:
- confirm mysqld is not running
- remove the existing data directory contents if not empty
- perform the restore operation
- modify the node to skip replication starting on boot, via the config file (see the sketch after this list)
- purge the GTIDs
- stop the node and remove the temporary config entries
- start the node
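A minimal sketch of the temporary config entries referred to above (these are standard MySQL / Group Replication options, but the exact set needed depends on how the node was originally configured):

[mysqld]
# keep Group Replication from starting automatically while the node is rebuilt
group_replication_start_on_boot = OFF
# block client DML until the node has rejoined the group (see the multi-primary notes below)
super_read_only = ON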
cluster.addInstance()
It is now possible to add the instance back into the cluster (once the actual mysqld instance has been restarted):
MySQL one223:33060+ ssl JS > c.addInstance(i2);
A new instance will be added to the InnoDB cluster. Depending on the amount of data on the cluster this might take from a few seconds to several hours.

Adding instance to the cluster ...

Validating instance at one236:3306...
This instance reports its own address as one236.fitz.lan
Instance configuration is suitable.
The instance 'root@one236' was successfully added to the cluster.

MySQL one223:33060+ ssl JS > c.status();
{
    "clusterName": "devCluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "one223:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "one223:3306": {
                "address": "one223:3306",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one236:3306": {
                "address": "one236:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "one238:3306": {
                "address": "one238:3306",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@one223:3306"
}
MySQL one223:33060+ ssl JS >
Standalone Group Replication
Multi-Primary: Set Read Only
In multi-primary mode, extra precautions need to be taken to prevent writes to the database during the RESTORE (MEB) and RECOVERY (GR) phases.
Typically (depending on how the group is accessed by clients) it is possible for clients to execute DML on the instance the moment it is accessible on the network, even before the binlogs have been applied to rejoin the group.
Super_read_only
The current recommendation is to set super_read_only in the global config file for the instance, prior to starting it from the restored datadir.
As an additional precaution, the event_scheduler can also be turned off until replication has caught up.
[mysqld]
super_read_only=1
event_scheduler=off
Adequate error handling should exist in the clients to recognize that they are (albeit temporarily) prevented from performing DML during this period.
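Once the instance has started from the restored datadir, these settings can be sanity-checked before proceeding, e.g. (a suggested check, not part of the original walkthrough):

mysql> select @@global.super_read_only, @@global.event_scheduler;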
4. Check file system permissions
Verify that the files / folders which have just been restored are owned by the user that mysqld runs as, e.g. mysql:
chown -R mysql:mysql /var/lib/mysql
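The ownership can then be confirmed with a quick listing, for example:

ls -ld /var/lib/mysql
ls -l /var/lib/mysql | head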
5. Reset Master
Start the newly restored node and issue a RESET MASTER. This is precautionary and avoids errors relating to the local node being ahead of the group (which would prevent the node from joining).
one231/var/lib/mysql # mysql
mysql> reset master;
Query OK, 0 rows affected (0.01 sec)
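After the RESET MASTER, the local GTID history should be empty; a quick confirmation (not shown in the original output) is:

mysql> select @@global.gtid_executed;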
6. Reset the Slave
Also reset the relay log info, as the restored backup will reference the relay log files associated with the node the backup was taken from. The node can then be rejoined back into the group:
mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)
7. Execute GTID Purge
A typical MEB restore directory looks like:
one236:# /tmp/restore_04Nov_1527/meta # ll
total 148
-rw-r--r-- 1 root root 96644 Nov  4 15:27 backup_content.xml
-rw-r--r-- 1 root root  6398 Nov  4 15:27 backup_create.xml
-rw-r--r-- 1 root root   303 Nov  4 15:27 backup_gtid_executed.sql
-rw-r--r-- 1 root root   932 Nov  4 15:27 backup_variables.txt
-rw-r--r-- 1 root root 19217 Nov  4 15:27 image_files.xml
-rw-r--r-- 1 root root 12912 Nov  4 15:27 MEB_2018-11-04.15-27-24_copy_back_img_to_datadir.log
one236:# /tmp/restore_04Nov_1527/meta #
The restored server needs its GTID state set so that it can apply transactions from the correct position, as described in the manual:
https://dev.mysql.com/doc/mysql-enterprise-backup/4.1/en/advanced.slave.html
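The backup_gtid_executed.sql file used by the source command below typically contains a SET @@GLOBAL.GTID_PURGED statement; its contents would look roughly like the following (reconstructed here for illustration using the GTID set reported during the backup, not the actual file from this run):

# On a new slave, issue the following command if GTIDs are enabled:
SET @@GLOBAL.GTID_PURGED='46566e90-dfe4-11e8-8662-08002708448b:1-12,6031ccad-dfe4-11e8-a7ae-08002708448b:1-14';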
mysql> set SQL_LOG_BIN=0;
mysql> source /tmp/restore_04Nov_1527/meta/backup_gtid_executed.sql;
mysql> set SQL_LOG_BIN=1;
8. Start Group Replication (or use MySQL Shell addInstance())
mysql> CHANGE MASTER TO MASTER_USER='root', MASTER_PASSWORD='D0lph1n5' FOR CHANNEL 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> START GROUP_REPLICATION;
Query OK, 0 rows affected (2.11 sec)
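Once START GROUP_REPLICATION completes, membership can be confirmed in the same way as at the start of the example, e.g.:

mysql> select member_host, member_state, member_role from performance_schema.replication_group_members;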
9. Multi-Primary: Enable Read/Write
Once the node is back online within the group, it can be re-enabled as an active DML node:
mysql> set SQL_LOG_BIN=0;
mysql> set global super_read_only=0;
mysql> set global event_scheduler=1;
mysql> set SQL_LOG_BIN=1;
Conclusion
In summary, irrespective of whether MySQL Shell or a manually configured GR setup is used, the basic steps are the same; it is just the method of implementation which differs:
- Restore a recent backup into an empty datadir (ensure mysqld is not running beforehand)
- Modify the config file to skip replication / set super_read_only, etc. on instance start
- RESET MASTER; RESET SLAVE; purge the executed GTIDs
- Restart GR, or use MySQL Shell to repopulate the group
- Revert any temporary config changes made