环境说明:
DB:MySQL 双主
OS:Redhat 7.9
172.16.6.137 cjc-db-01
172.16.6.138 cjc-db-02
相关参数
[mysql@cjc-db-02 scripts]$ cat /etc/my.cnf|grep sync
sync_binlog=10
[mysql@cjc-db-02 scripts]$ cat /etc/my.cnf|grep shutdown
innodb_fast_shutdown=1
参数说明
1.sync_binlog
sync_binlog=0,当事务提交之后,MySQL不做fsync之类的磁盘同步指令刷新binlog_cache中的信息到磁盘,而让Filesystem自行决定什么时候来做同步,或者cache满了之后才同步到磁盘。
sync_binlog=n,当每进行n次事务提交之后,MySQL将进行一次fsync之类的磁盘同步指令来将binlog_cache中的数据强制写入磁盘。
在MySQL中系统默认的设置是sync_binlog=0,也就是不做任何强制性的磁盘刷新指令,这时候的性能是最好的,但是风险也是最大的。因为一旦系统Crash,在binlog_cache中的所有binlog信息都会被丢失。而当设置为“1”的时候,是最安全但是性能损耗最大的设置。因为当设置为1的时候,即使系统Crash,也最多丢失binlog_cache中未完成的一个事务,对实际数据没有任何实质性影响。
1236相关的结果错误信息
---1 Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size' ---2 Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file' ---3 Got fatal error 1236 from master when reading data from binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been'
查看版本
MySQL [(none)]> select version(); +------------+ | version() | +------------+ | 5.7.36-log | +------------+ 1 row in set (0.13 sec)
查看主从状态
172.16.6.137 cjc-db-01
MySQL [(none)]> MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.138 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 234 Relay_Log_File: mysql-relay-bin.000005 Relay_Log_Pos: 407 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 234 Relay_Log_Space: 820 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6138 Master_UUID: 17733276-6720-11ed-862d-0800275cac40 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-3 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-3, 4a8a6572-6650-11ed-b79c-0800272944a2:1-9 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
查看gtid相关信息
MySQL [(none)]> show global variables like '%gtid%'; +----------------------------------+------------------------------------------------------------------------------------+ | Variable_name | Value | +----------------------------------+------------------------------------------------------------------------------------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | 17733276-6720-11ed-862d-0800275cac40:1-3, 4a8a6572-6650-11ed-b79c-0800272944a2:1-9 | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | | session_track_gtids | OFF | +----------------------------------+------------------------------------------------------------------------------------+ 8 rows in set (0.02 sec)
查看主从状态
172.16.6.138 cjc-db-02
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.137 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 234 Relay_Log_File: mysql-relay-bin.000005 Relay_Log_Pos: 407 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 234 Relay_Log_Space: 820 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6137 Master_UUID: 4a8a6572-6650-11ed-b79c-0800272944a2 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 4a8a6572-6650-11ed-b79c-0800272944a2:8-9 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-3, 4a8a6572-6650-11ed-b79c-0800272944a2:8-9 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
查看GTID相关信息
MySQL [(none)]> show global variables like '%gtid%'; +----------------------------------+------------------------------------------------------------------------------------+ | Variable_name | Value | +----------------------------------+------------------------------------------------------------------------------------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | 17733276-6720-11ed-862d-0800275cac40:1-3, 4a8a6572-6650-11ed-b79c-0800272944a2:8-9 | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | | session_track_gtids | OFF | +----------------------------------+------------------------------------------------------------------------------------+ 8 rows in set (0.01 sec)
场景一:
1.模拟主库执行大量小事务时,意外中断主库后。
2.新主库继续执行大量小事务。
3.启动原主库,两边主从关系正常,没有模拟出1236错误。
实验如下:
137 执行插入脚本
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 11953 | +----------+ 1 row in set (0.02 sec)
插入脚本
[mysql@cjc-db-02 scripts]$ more 01_insert.sql use cjc; insert into t1 values(5,'a',now()); insert into t1 values(5,'a',now()); ...... [mysql@cjc-db-02 scripts]$ cat 01_insert.sql |wc -l 1266483
插入
[mysql@cjc-db-01 scripts]$ mysql -uroot -p -vvv < 01_insert.sql > 01_insert.log
意外中断数据库,模拟故障
[mysql@cjc-db-01 ~]$ ps -ef|grep mysql|grep -v grep mysql 3750 3679 0 10:16 pts/0 00:00:04 mysqld --defaults-file=/etc/my.cnf --user=mysql [mysql@cjc-db-01 ~]$ kill -9 3750
137先不启动
查看138数据量
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 29953 | +----------+ 1 row in set (0.04 sec)
继续向138插入新数据,模拟VIP切换
[mysql@cjc-db-02 scripts]$ mysql -uroot -p -vvv < 01_insert.sql > 01_insert.log
再次查看数据量
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 65635 | +----------+ 1 row in set (0.07 sec)
启动137数据库
[mysql@cjc-db-01 scripts]$ mysqld --defaults-file=/etc/my.cnf --user=mysql&
检查137数据量
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 65635 | +----------+ 1 row in set (0.13 sec)
检查137从库状态
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.138 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 17650028 Relay_Log_File: mysql-relay-bin.000007 Relay_Log_Pos: 11923471 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 17650028 Relay_Log_Space: 11923718 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6138 Master_UUID: 17733276-6720-11ed-862d-0800275cac40 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-35685 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-35685, 4a8a6572-6650-11ed-b79c-0800272944a2:1-18009 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
检查138主从状态
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.137 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000004 Read_Master_Log_Pos: 11352433 Relay_Log_File: mysql-relay-bin.000007 Relay_Log_Pos: 407 Relay_Master_Log_File: mysql-bin.000004 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 11352433 Relay_Log_Space: 820 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6137 Master_UUID: 4a8a6572-6650-11ed-b79c-0800272944a2 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 4a8a6572-6650-11ed-b79c-0800272944a2:8-18009 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-35685, 4a8a6572-6650-11ed-b79c-0800272944a2:8-18009 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
场景二:大事务
1.模拟主库执行一个大事务时,还没执行完成,意外中断主库后。
2.新主库继续执行事务。
3.启动原主库,两边主从关系正常,没有模拟出1236错误。
实验如下:
137:
MySQL [(none)]> use cjc MySQL [cjc]> create table t2 like t1; MySQL [cjc]> set global autocommit=0;
执行2秒后中断数据库
MySQL [cjc]> insert into t2 select * from t1; ERROR 2013 (HY000): Lost connection to MySQL server during query [mysql@cjc-db-01 scripts]$ ps -ef|grep mysql|grep -v grep mysql 5074 3679 6 10:36 pts/0 00:00:21 mysqld --defaults-file=/etc/my.cnf --user=mysql
模拟主库故障
[mysql@cjc-db-01 scripts]$ kill -9 5074
138库
MySQL [(none)]> select count(*) from cjc.t2; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)
138插入新数据
MySQL [(none)]> insert into cjc.t2 values(1,'a',now()); MySQL [(none)]> insert into cjc.t2 values(1,'a',now()); MySQL [(none)]> select count(*) from cjc.t2; +----------+ | count(*) | +----------+ | 2 | +----------+ 1 row in set (0.01 sec)
启动137数据库
[mysql@cjc-db-01 scripts]$ mysqld --defaults-file=/etc/my.cnf --user=mysql&
检查数据,双向同步没问题
MySQL [(none)]> select count(*) from cjc.t2; +----------+ | count(*) | +----------+ | 2 | +----------+ 1 row in set (0.01 sec)
场景三:破坏当前binlog
1.模拟主库执行一个大事务或多个小事务,还没执行完成,意外中断主库后。
2.破坏原主库当前binlog文件。
3.新主库继续执行事务。
3.启动原主库,138到137主从中断,报错1236。
实验如下:
137 执行插入脚本
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 71325 | +----------+ 1 row in set (0.13 sec)
插入脚本
[mysql@cjc-db-01 scripts]$ mysql -uroot -p -vvv < 01_insert.sql > 01_insert.log
中断数据库,模拟故障
[mysql@cjc-db-01 ~]$ ps -ef|grep mysql|grep -v grep mysql 3750 3679 0 10:16 pts/0 00:00:04 mysqld --defaults-file=/etc/my.cnf --user=mysql [mysql@cjc-db-01 ~]$ kill -9 3750
137先不启动
查看138数据量
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 80933 | +----------+ 1 row in set (0.08 sec)
继续向138插入新数据,模拟VIP切换
[mysql@cjc-db-02 scripts]$ mysql -uroot -p -vvv < 01_insert.sql > 01_insert.log
再次查看数据量
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93054 | +----------+ 1 row in set (0.09 sec)
破坏137最新binlog
[mysql@cjc-db-01 binlog]$ mv mysql-bin.000005 mysql-bin.000005.bak [mysql@cjc-db-01 binlog]$ cp mysql-bin.000001 mysql-bin.000005
启动137数据库
[mysql@cjc-db-01 scripts]$ mysqld --defaults-file=/etc/my.cnf --user=mysql&
检查137数据量,比138少了1条数据
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93053 | +----------+ 1 row in set (0.21 sec)
检查138同步状态,报错1236
Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size'
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Master_Host: 172.16.6.137 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000005 Read_Master_Log_Pos: 3211353 Relay_Log_File: mysql-relay-bin.000009 Relay_Log_Pos: 3210882 Relay_Master_Log_File: mysql-bin.000005 Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 3211353 Relay_Log_Space: 3211295 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 1236 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size' Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6137 Master_UUID: 4a8a6572-6650-11ed-b79c-0800272944a2 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: 221121 10:54:24 Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 4a8a6572-6650-11ed-b79c-0800272944a2:8-27620 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-53496, 4a8a6572-6650-11ed-b79c-0800272944a2:8-27620 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
其中Read_Master_Log_Pos=3211353,而137当前Position: 3856433
MySQL [(none)]> show master status\G; *************************** 1. row *************************** File: mysql-bin.000006 Position: 3856433 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-41373:41376-53496, 4a8a6572-6650-11ed-b79c-0800272944a2:1-18012 1 row in set (0.00 sec) ERROR: No query specified
检查137同步状态正常
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.138 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000003 Read_Master_Log_Pos: 26659074 Relay_Log_File: mysql-relay-bin.000011 Relay_Log_Pos: 4050575 Relay_Master_Log_File: mysql-bin.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 26659074 Relay_Log_Space: 4050822 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6138 Master_UUID: 17733276-6720-11ed-862d-0800275cac40 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-53496 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-41373:41376-53496, 4a8a6572-6650-11ed-b79c-0800272944a2:1-18012 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
查看138 GTID
MySQL [(none)]> show global variables like '%gtid%'; +----------------------------------+--------------------------------------------------------------------------------------------+ | Variable_name | Value | +----------------------------------+--------------------------------------------------------------------------------------------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | 17733276-6720-11ed-862d-0800275cac40:1-53496, 4a8a6572-6650-11ed-b79c-0800272944a2:8-27620 | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | | | session_track_gtids | OFF | +----------------------------------+--------------------------------------------------------------------------------------------+ 8 rows in set (0.00 sec)
查看137 GTID
MySQL [(none)]> show global variables like '%gtid%'; +----------------------------------+--------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +----------------------------------+--------------------------------------------------------------------------------------------------------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | 17733276-6720-11ed-862d-0800275cac40:1-41373:41376-53496, 4a8a6572-6650-11ed-b79c-0800272944a2:1-18012 | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | 17733276-6720-11ed-862d-0800275cac40:1-41373, 4a8a6572-6650-11ed-b79c-0800272944a2:1-18012 | | session_track_gtids | OFF | +----------------------------------+--------------------------------------------------------------------------------------------------------+ 8 rows in set (0.02 sec)
解决1238错误:
一.尝试跳过事务
由于是双主架构,无法在从库清空GTID_EXECUTED
138
stop slave; set global gtid_purged='17733276-6720-11ed-862d-0800275cac40:1-53496,4a8a6572-6650-11ed-b79c-0800272944a2:8-27620,17733276-6720-11ed-862d-0800275cac40:1-41373:41376-53496,4a8a6572-6650-11ed-b79c-0800272944a2:1-18012,17733276-6720-11ed-862d-0800275cac40:1-41373,4a8a6572-6650-11ed-b79c-0800272944a2:1-18012'; ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty. reset slave; set global gtid_purged='17733276-6720-11ed-862d-0800275cac40:1-53496,4a8a6572-6650-11ed-b79c-0800272944a2:8-27620,17733276-6720-11ed-862d-0800275cac40:1-41373:41376-53496,4a8a6572-6650-11ed-b79c-0800272944a2:1-18012,17733276-6720-11ed-862d-0800275cac40:1-41373,4a8a6572-6650-11ed-b79c-0800272944a2:1-18012'; ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
双主架构下还需要执行reset master,执行后137到138的同步也会失败。
二:重建主从库关系
137和138 停止主从进程,清理复制配置
stop slave; reset slave all; reset master;
删除137业务库cjc,以138数据为准,重新同步数据
MySQL [(none)]> drop database cjc; Query OK, 2 rows affected (0.10 sec) MySQL [(none)]> create database cjc; Query OK, 1 row affected (0.00 sec)
备份138 cjc数据库
mysqldump -uroot -p cjc --hex-blob --master-data=2 --single-transaction --routines --triggers --events --flush-logs > /home/mysql/bak/mysql_cjc_202221.sql [mysql@cjc-db-02 bak]$ ls -lrth mysql_cjc_202221.sql -rw-rw-r-- 1 mysql mysql 2.7M Nov 21 11:14 mysql_cjc_202221.sql
传到137数据库上
137还原数据
MySQL [(none)]> use cjc MySQL [(none)]> source mysql_cjc_202221.sql MySQL [cjc]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93054 | +----------+ 1 row in set (0.05 sec)
138新增数据,模拟新业务
MySQL [(none)]> use cjc Database changed MySQL [cjc]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93054 | +----------+ 1 row in set (0.07 sec) MySQL [cjc]> insert into t1 values(1,'a',now()); Query OK, 1 row affected (0.00 sec) MySQL [cjc]> insert into t1 values(1,'a',now()); Query OK, 1 row affected (0.00 sec) MySQL [cjc]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93056 | +----------+ 1 row in set (0.08 sec)
137创建同步关系
[mysql@cjc-db-01 bak]$ grep "CHANGE MASTER TO MASTER_LOG_FILE" mysql_cjc_202221.sql -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=154;
执行
CHANGE MASTER TO MASTER_HOST='172.16.6.138', MASTER_USER='repl', MASTER_PASSWORD='1', MASTER_PORT=13309, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=154, MASTER_CONNECT_RETRY=10;
启动
MySQL [(none)]> start slave; Query OK, 0 rows affected (0.02 sec)
检查同步状态恢复正常
MySQL [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.138 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000002 Read_Master_Log_Pos: 822 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 988 Relay_Master_Log_File: mysql-bin.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 822 Relay_Log_Space: 1195 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6138 Master_UUID: 17733276-6720-11ed-862d-0800275cac40 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-2 Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-2, 4a8a6572-6650-11ed-b79c-0800272944a2:1-14 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
检查数据,已经同步最新数据
MySQL [(none)]> select count(*) from cjc.t1; +----------+ | count(*) | +----------+ | 93056 | +----------+ 1 row in set (0.07 sec)
最后恢复138上的同步关系
检查137位置信息
MySQL [(none)]> show master status\G; *************************** 1. row *************************** File: mysql-bin.000001 Position: 3930806 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-2, 4a8a6572-6650-11ed-b79c-0800272944a2:1-14 1 row in set (0.00 sec) ERROR: No query specified
138新增数据
MySQL [cjc]> insert into t1 values(1,'a',now());
137位置信息也会发生变化
MySQL [(none)]> show master status\G; *************************** 1. row *************************** File: mysql-bin.000001 Position: 3931442 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-4, 4a8a6572-6650-11ed-b79c-0800272944a2:1-14 1 row in set (0.00 sec) ERROR: No query specified
先对137上创建全局锁
FLUSH TABLES WITH READ LOCK;
138继续新增数据
137位置信息没有变
*************************** 1. row *************************** File: mysql-bin.000001 Position: 3931442 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-4, 4a8a6572-6650-11ed-b79c-0800272944a2:1-14 1 row in set (0.00 sec) ERROR: No query specified
137执行
CHANGE MASTER TO MASTER_HOST='172.16.6.137', MASTER_USER='repl', MASTER_PASSWORD='1', MASTER_PORT=13309, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=3931442, MASTER_CONNECT_RETRY=10;
启动
MySQL [cjc]> start slave; Query OK, 0 rows affected (0.01 sec)
主从恢复正常
MySQL [cjc]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.6.137 Master_User: repl Master_Port: 13309 Connect_Retry: 10 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 3931442 Relay_Log_File: mysql-relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 3931442 Relay_Log_Space: 527 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 6137 Master_UUID: 4a8a6572-6650-11ed-b79c-0800272944a2 Master_Info_File: /mysqldata/13309/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: Executed_Gtid_Set: 17733276-6720-11ed-862d-0800275cac40:1-6 Auto_Position: 0 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified
137上取消全局锁
UNLOCK TABLES;
检查两边主从状态恢复正常
MySQL [cjc]> show slave status\G;
结论:
1.MySQL双主架构,正在执行事务的主库异常中断后,如果当前binlog也损坏了,可能会出现1236错误。
2.如果只是主从架构,不是双主架构,通过跳过事务理论上可以恢复主从关系。
其他:
如果复制用户密码忘记怎么办?
1.可以重建复制用户,当然也有文件明文记录复制用户密码:
具体在数据目录下master.info文件里。
[mysql@cjc-db-01 data]$ cat master.info
2.如果业务库太多,也可以通过直接初始化数据库恢复主
(1)主从执行reset操作
stop slave; reset slave all; reset master;
(2)主库全库备份,并将备份文件上传到从库
mysqldump -uroot -p --all-database --hex-blob --master-data=2 --single-transaction --routines --triggers --events --flush-logs > /home/mysql/bak/mysql_cjc_20221118.sql
(3)停从库,删除数据文件
[mysql@cjc-db-02 bak]$ mysqladmin -uroot -p shutdown
删除数据
rm -rf binlog/* rm -rf data/* rm -rf log/* rm -rf pid/* rm -rf redo/* rm -rf relaylog/* rm -rf socket/* rm -rf undo/*
(4)从库数据库初始化
###mysqld --defaults-file=/etc/my.cnf --initialize-insecure
启动数据库
[mysql@cjc-db-02 13309]$ mysqld --defaults-file=/etc/my.cnf --user=mysql &
恢复数据库
grep "CHANGE MASTER TO MASTER_LOG_FILE"
(5)重建主从关系
例如:
CHANGE MASTER TO MASTER_HOST='172.16.6.138', MASTER_USER='repl', MASTER_PASSWORD='1', MASTER_PORT=13309, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=154, MASTER_CONNECT_RETRY=10;
启动
start slave;
(6)待单向主从正常后,创建另一个主从关系
也需要先申请全局锁
CHANGE MASTER TO MASTER_HOST='172.16.6.137', MASTER_USER='repl', MASTER_PASSWORD='1', MASTER_PORT=13309, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=505716, MASTER_CONNECT_RETRY=10;
启动
start slave;
