The Problem
A replication slave cannot continue replicating from its master. A common example: after the slave has been stopped for a long (or longer than anticipated) period, the binary logs it requires have been purged on the master.
[ERROR] [MY-013114] [Repl] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. To find the missing transactions, see the master's error log or the manual for GTID_SUBTRACT.', Error_code: MY-013114
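The error message points at GTID_SUBTRACT for identifying the missing transactions. A minimal sketch of building that query, assuming GTIDs are in use: in practice each set comes from SELECT @@GLOBAL.GTID_EXECUTED on the respective server, and the GTID values below are hypothetical placeholders.

```shell
#!/bin/sh
# Build the GTID_SUBTRACT query that reveals which transactions the slave is
# missing. Feed the result to the mysql client on the master. The helper name
# and the example GTID sets are illustrative, not from the original procedure.
missing_gtid_query() {
    master_set="$1"   # @@GLOBAL.GTID_EXECUTED captured on the master
    slave_set="$2"    # @@GLOBAL.GTID_EXECUTED captured on the slave
    printf "SELECT GTID_SUBTRACT('%s','%s') AS missing;\n" "$master_set" "$slave_set"
}

missing_gtid_query \
    'f65db8e2-0e1a-11e5-a980-080027755380:1-100' \
    'f65db8e2-0e1a-11e5-a980-080027755380:1-90'
```

If the resulting missing range is no longer held in any binary log on the master, provisioning the slave from backup, as described below, is the way forward.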
The Solution
There are many possible reasons a slave may be unable to replicate. For example:
- The slave cannot access the required binary log files on the master, typically because they were purged under expire_logs_days (binlog_expire_logs_seconds in MySQL 8.0) or similar.
- The slave has suffered data corruption.
- DML has been executed directly on the slave, leaving it inconsistent with the master.
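For the first cause, you can quickly confirm on the master whether the file the slave asks for is still present. A minimal sketch, where binlog_still_present is a hypothetical helper: the file list would normally come from running SHOW BINARY LOGS on the master, but here it is read from stdin so the check stays self-contained.

```shell
#!/bin/sh
# Check whether the binlog file the slave needs still exists on the master.
# In practice the list on stdin would come from something like:
#   mysql -h master.example.org -Nse "SHOW BINARY LOGS" | awk '{print $1}'
binlog_still_present() {
    needed="$1"   # e.g. Relay_Master_Log_File from SHOW SLAVE STATUS
    grep -q "^${needed}" && echo "present" || echo "purged"
}

printf 'binlog.000007\nbinlog.000008\n' | binlog_still_present binlog.000001
# prints "purged"
```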
A regular backup schedule is a must in an enterprise environment. The example below assumes a full backup taken on the schedule shown. It could instead be a weekly full plus incrementals, though restore times will increase accordingly.
Sample Backup schedule
server:# /6/BACKUPS/MySQLBACKUPS # ls -lR 2019-03-25_04-00-01
2019-03-25_04-00-01:
total 1485296
-rw-r--r-- 1 backups backups 1520905049 Mar 25 04:01 backup.img
-rw-r--r-- 1 backups backups        356 Mar 25 04:00 backup-my.cnf
drwxr-x--- 2 backups backups         10 Mar 25 04:01 datadir
drwxr-x--- 2 backups backups       4096 Mar 25 04:01 meta
-rw-r--r-- 1 backups backups      16611 Mar 25 04:01 server-all.cnf
-rw-r--r-- 1 backups backups       5260 Mar 25 04:01 server-my.cnf

2019-03-25_04-00-01/datadir:
total 0

2019-03-25_04-00-01/meta:
total 316
-rw-r--r-- 1 backups backups 219455 Mar 25 04:01 backup_content.xml
-rw-r--r-- 1 backups backups   6366 Mar 25 04:00 backup_create.xml
-rw-r--r-- 1 backups backups    811 Mar 25 04:01 backup_variables.txt
-rw-r--r-- 1 backups backups  59996 Mar 25 04:01 image_files.xml
-rw-r--r-- 1 backups backups  26034 Mar 25 04:01 MEB_2019-03-25.04-00-01_compress_img_backup.log
server:# /6/BACKUPS/MySQLBACKUPS # id backups
uid=10007(backups) gid=1005(backups) groups=1005(backups),27(mysql)
server:# /6/BACKUPS/MySQLBACKUPS # crontab -l -u backups
0 4 * * 1 /home/backups/backup_mysql.sh
server:# /6/BACKUPS/MySQLBACKUPS # cat /home/backups/backup_mysql.sh
#!/bin/bash
mysqlbackup --defaults-file=~/.my.cnf --backup-dir=/6/BACKUPS/MySQLBACKUPS/ backup-to-image --with-timestamp --compress-level=9 --backup-image=backup.img
server:# /6/BACKUPS/MySQLBACKUPS #
This shows a dedicated backup user on the host that is a member of the mysql group. The datadir and everything beneath it (plus any other database directories) carry group permissions allowing read and write, e.g. 770.
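With timestamped backup directories like the above, old backups also need pruning so the filesystem does not fill. A hedged sketch of a retention step that could accompany the cron job: the prune_backups helper, the retention count, and the wiring are my own, not part of the original script.

```shell
#!/bin/bash
# Keep only the newest N timestamped backup directories (YYYY-MM-DD_HH-MM-SS).
# Lexicographic sort of this timestamp format is also chronological, so the
# oldest directories sort first and can be dropped.
prune_backups() {
    base="$1"; keep="$2"
    ls -d "$base"/????-??-??_??-??-?? 2>/dev/null | sort | head -n -"$keep" |
    while read -r dir; do
        rm -rf "$dir"
    done
}

prune_backups /6/BACKUPS/MySQLBACKUPS 4   # keep the four most recent backups
```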
Copy to Slave
server:# /6/BACKUPS/MySQLBACKUPS # scp 2019-03-25_04-00-01/backup.img slave.example.org:/home/restore/
root@slave.example.org's password:
backup.img                                    100% 1450MB  85.3MB/s   00:17
server:# /6/BACKUPS/MySQLBACKUPS #
Copy the backup to the slave however your environment/OS allows.
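A transfer of this size is worth verifying before the old datadir is touched. A small sketch of the checksum step, demonstrated on a stand-in file; with the real image the commands are identical, just run against backup.img on each host.

```shell
#!/bin/sh
# Verify the copied image matches the source: checksum on the backup host,
# ship the .sha256 file alongside the image, then check on the slave.
# A temp file stands in for the 1.4 GB image so the demo is self-contained.
tmp=$(mktemp -d)
printf 'demo payload\n' > "$tmp/backup.img"

( cd "$tmp" && sha256sum backup.img > backup.img.sha256 )   # on the backup host
( cd "$tmp" && sha256sum -c backup.img.sha256 )             # on the slave
# prints "backup.img: OK"
```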
Prepare Restore
Either move the existing slave datadir to a safe location, or simply delete its contents. Ensure the database server is stopped first.
slave:# /home # ps -ef | grep mysqld
root 26336 26077 0 11:34 pts/0 00:00:00 grep --color=auto mysqld
slave:# /home # systemctl stop mysqld
slave:# /home # systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2019-03-29 11:31:38 AEDT; 2min 42s ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 26199 ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 26176 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 26199 (code=exited, status=0/SUCCESS)
   Status: "SERVER_SHUTTING_DOWN"

Mar 29 11:31:32 slave.example.org systemd[1]: Starting MySQL Server...
Mar 29 11:31:33 slave.example.org systemd[1]: Started MySQL Server.
Mar 29 11:31:35 slave.example.org systemd[1]: Stopping MySQL Server...
Mar 29 11:31:38 slave.example.org systemd[1]: Stopped MySQL Server.
slave:# /home #
slave:# /home # rm -rf /var/lib/mysql/*
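If disk space allows, a safer variant of the wipe above is to move the datadir aside and delete it only once the restored slave is verified. A minimal sketch; the park_datadir helper name and the .old suffix are my own, not part of the original procedure.

```shell
#!/bin/sh
# Move a datadir aside instead of deleting it outright; the parked copy can
# be removed once the restored slave is confirmed healthy. Run only while
# mysqld is stopped.
park_datadir() {
    dir="$1"
    stamp=$(date +%Y%m%d-%H%M%S)
    mv "$dir" "${dir}.old-${stamp}" && mkdir -p "$dir"
}

# With the path from the example this would be: park_datadir /var/lib/mysql
```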
Restore Backup
slave:# /home/restore # /usr/bin/mysqlbackup --backup-dir=/home/restore/ --with-timestamp --uncompress --backup-image=/home/restore/backup.img copy-back-and-apply-log
...
190329 11:39:43 PCR1 INFO: We were able to parse ibbackup_logfile up to lsn 30670310615.
190329 11:39:43 PCR1 INFO: Last MySQL binlog file position 0 381892696, file name binlog.000001
190329 11:39:43 PCR1 INFO: The first data file is '/var/lib/mysql/ibdata1' and the new created log files are at '/var/lib/mysql'
190329 11:39:43 MAIN INFO: No Keyring file to process.
190329 11:39:43 MAIN INFO: Apply-log operation completed successfully.
190329 11:39:43 MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK! with 4 warnings

slave:# /home/restore # chown mysql -R /var/lib/mysql
Start the restored slave server (systemctl start mysqld, mirroring the stop performed earlier).
Restart Replication
Verify that the slave prerequisites are still met:
https://dev.mysql.com/doc/refman/8.0/en/replication-setup-slaves.html
Legacy / NO GTID
Without GTIDs, the slave relies on the binary log file name and offset to resume replication from the correct position.
slave:# /home/restore # grep "Last MySQL binlog" 2019-03-29_11-39-04/meta/MEB_2019-03-29.11-39-04_copy_back_cmprs_img_to_datadir.log
190329 11:39:43 PCR1 INFO: Last MySQL binlog file position 0 381892696, file name binlog.000001
slave:# /home/restore # mysql
mysql> change master to master_host='server', master_log_file='binlog.000001', master_log_pos=381892696;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Use the values found in the MEB restore log to configure replication.
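Transcribing those two values by hand is error-prone; the log line format shown above can be parsed instead. A sketch, assuming the line format printed by this mysqlbackup version; MASTER_HOST='server' matches the example, and the generated statement still needs your real host and credentials.

```shell
#!/bin/sh
# Turn the "Last MySQL binlog file position" line from the MEB restore log
# into a CHANGE MASTER TO statement. The meb_change_master helper name is
# illustrative. Usage:
#   meb_change_master < 2019-03-29_11-39-04/meta/MEB_2019-03-29.11-39-04_copy_back_cmprs_img_to_datadir.log
meb_change_master() {
    awk -v q="'" '/Last MySQL binlog file position/ {
        pos = $(NF-3); sub(/,$/, "", pos)   # position, with trailing comma stripped
        file = $NF                          # binlog file name is the last field
        printf "CHANGE MASTER TO MASTER_HOST=%sserver%s, MASTER_LOG_FILE=%s%s%s, MASTER_LOG_POS=%s;\n", q, q, q, file, q, pos
    }'
}
```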
GTID
Execute the following statement to reset the binary log:
mysql> RESET MASTER;
Then execute the following statement to disable binary logging for the current session, so the statements that follow are not written to the slave's own binary log:
mysql> SET sql_log_bin=0;
When a server using the GTID feature is backed up, mysqlbackup produces a file named backup_gtid_executed.sql, which can be found in the restored data directory of the new slave server. The file contains a SQL statement that sets the GTID_PURGED configuration option on the slave:
# On a new slave, issue the following command if GTIDs are enabled:
SET @@GLOBAL.GTID_PURGED='f65db8e2-0e1a-11e5-a980-080027755380:1-3';
It also contains a commented-out CHANGE MASTER TO statement for initializing the slave:
# Use the following command if you want to use the GTID handshake protocol:
# CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
Uncomment the command and add any needed connection and authentication parameters to it (for example, MASTER_HOST, MASTER_USER, MASTER_PASSWORD, and MASTER_PORT):
# Use the following command if you want to use the GTID handshake protocol:
CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_USER='muser', MASTER_PASSWORD='mpass', MASTER_PORT=18675, MASTER_AUTO_POSITION = 1;
Execute the file with the mysql client:
mysql> source /path-to-backup_gtid_executed.sql/backup_gtid_executed.sql
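As a sanity check, the GTID set recorded by the backup can be extracted from the file and compared against SELECT @@GLOBAL.GTID_PURGED on the new slave after sourcing it. A sketch, assuming the file layout shown in the excerpt above; the extract_gtid_purged helper name is my own.

```shell
#!/bin/sh
# Pull the GTID set out of backup_gtid_executed.sql for comparison with
# SELECT @@GLOBAL.GTID_PURGED on the new slave.
extract_gtid_purged() {
    sed -n "s/^SET @@GLOBAL.GTID_PURGED='\(.*\)';.*/\1/p" "$1"
}
```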
Start Slave
mysql> start slave user='repl' password='repl';
Query OK, 0 rows affected, 1 warning (0.00 sec)
Verify Slave
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: server
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: binlog.000001
          Read_Master_Log_Pos: 510350800
               Relay_Log_File: slave-relay-bin.000003
                Relay_Log_Pos: 16526902
        Relay_Master_Log_File: binlog.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
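The two Yes values are what matter, and the same check can be scripted for ongoing monitoring. A sketch that parses the \G output: in production the input would come from mysql -e 'SHOW SLAVE STATUS\G', but here it is read from stdin, and the slave_healthy helper name is my own.

```shell
#!/bin/sh
# Report replication health from `SHOW SLAVE STATUS\G` output on stdin.
# Both Slave_IO_Running and Slave_SQL_Running must be Yes.
slave_healthy() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END {
            if (io == "Yes" && sql == "Yes") print "healthy"
            else                             print "BROKEN"
        }
    '
}

printf 'Slave_IO_Running: Yes\nSlave_SQL_Running: Yes\n' | slave_healthy
# prints "healthy"
```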