MySQL PXC Cluster Split-Brain and grastate.dat Modification Experiments

Source: 这里教程网  Date: 2026-03-01 11:39:59

A MySQL PXC (Percona XtraDB Cluster) cluster runs on three servers:

172.31.217.182  bd-dev-mingshuo-182
172.31.217.183  bd-dev-mingshuo-183
172.31.217.89   bd-dev-vertica-89

First, shut down one node, 183, cleanly:

mysqladmin -uroot -poracle -S /u01/mysql/3307/data/mysql.sock -P3307 shutdown

Log of the node being shut down:

2018-09-27T07:33:13.222079Z 0 [Note] WSREP: Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
2018-09-27T07:33:23.230509Z 0 [Note] WSREP: Stop replication
2018-09-27T07:33:23.230619Z 0 [Note] WSREP: Closing send monitor...
2018-09-27T07:33:23.230640Z 0 [Note] WSREP: Closed send monitor.
2018-09-27T07:33:23.230660Z 0 [Note] WSREP: gcomm: terminating thread
2018-09-27T07:33:23.230680Z 0 [Note] WSREP: gcomm: joining thread
2018-09-27T07:33:23.230827Z 0 [Note] WSREP: gcomm: closing backend
2018-09-27T07:33:23.231780Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,12f1e199,11)
memb {
        12f1e199,0
        }
joined {
        }
left {
        }
partitioned {
        2331d3d7,0
        c05737fd,0
        }
)
2018-09-27T07:33:23.231867Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2018-09-27T07:33:23.232111Z 0 [Note] WSREP: gcomm: closed
2018-09-27T07:33:23.232165Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-09-27T07:33:23.232253Z 0 [Note] WSREP: Flow-control interval: [100, 100]
2018-09-27T07:33:23.232260Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-27T07:33:23.232264Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-27T07:33:23.232268Z 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 27)
2018-09-27T07:33:23.232279Z 0 [Note] WSREP: Received self-leave message.
2018-09-27T07:33:23.232285Z 0 [Note] WSREP: Flow-control interval: [0, 0]
2018-09-27T07:33:23.232288Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-27T07:33:23.232291Z 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-09-27T07:33:23.232295Z 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 27)
2018-09-27T07:33:23.232302Z 0 [Note] WSREP: RECV thread exiting 0: Success
2018-09-27T07:33:23.232383Z 2 [Note] WSREP: New cluster view: global state: c057dbc5-c16e-11e8-a1a6-825ed9079934:27, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-09-27T07:33:23.232394Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-27T07:33:23.232400Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-27T07:33:23.232439Z 2 [Note] WSREP: New cluster view: global state: c057dbc5-c16e-11e8-a1a6-825ed9079934:27, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3
2018-09-27T07:33:23.232443Z 2 [Note] WSREP: Setting wsrep_ready to false
2018-09-27T07:33:23.232446Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-27T07:33:23.232472Z 2 [Note] WSREP: applier thread exiting (code:0)
2018-09-27T07:33:23.232479Z 0 [Note] WSREP: recv_thread() joined.
2018-09-27T07:33:23.232502Z 0 [Note] WSREP: Closing replication queue.
2018-09-27T07:33:23.232509Z 0 [Note] WSREP: Closing slave action queue.
2018-09-27T07:33:23.232517Z 0 [Note] Giving 2 client threads a chance to die gracefully
2018-09-27T07:33:25.232639Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2018-09-27T07:33:25.232758Z 1 [Note] WSREP: rollbacker thread exiting
2018-09-27T07:33:25.232994Z 0 [Note] Giving 0 client threads a chance to die gracefully
2018-09-27T07:33:25.233010Z 0 [Note] Shutting down slave threads
2018-09-27T07:33:25.233025Z 0 [Note] Forcefully disconnecting 0 remaining clients
2018-09-27T07:33:25.233044Z 0 [Note] Event Scheduler: Purging the queue. 0 events
2018-09-27T07:33:25.242788Z 0 [Note] WSREP: Service thread queue flushed.
2018-09-27T07:33:25.250399Z 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0, misses: 0, in use: 0, in pool: 0
2018-09-27T07:33:25.250479Z 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 27)
2018-09-27T07:33:25.259428Z 0 [Note] Binlog end
2018-09-27T07:33:25.261702Z 0 [Note] Shutting down plugin 'ngram'
2018-09-27T07:33:25.261721Z 0 [Note] Shutting down plugin 'partition'
2018-09-27T07:33:25.261726Z 0 [Note] Shutting down plugin 'ARCHIVE'
2018-09-27T07:33:25.261729Z 0 [Note] Shutting down plugin 'BLACKHOLE'
2018-09-27T07:33:25.261733Z 0 [Note] Shutting down plugin 'INNODB_SYS_VIRTUAL'
2018-09-27T07:33:25.261736Z 0 [Note] Shutting down plugin 'INNODB_CHANGED_PAGES'
2018-09-27T07:33:25.261739Z 0 [Note] Shutting down plugin 'INNODB_SYS_DATAFILES'
2018-09-27T07:33:25.261741Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESPACES'
2018-09-27T07:33:25.261744Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN_COLS'
2018-09-27T07:33:25.261746Z 0 [Note] Shutting down plugin 'INNODB_SYS_FOREIGN'
2018-09-27T07:33:25.261749Z 0 [Note] Shutting down plugin 'INNODB_SYS_FIELDS'
2018-09-27T07:33:25.261751Z 0 [Note] Shutting down plugin 'INNODB_SYS_COLUMNS'
2018-09-27T07:33:25.261754Z 0 [Note] Shutting down plugin 'INNODB_SYS_INDEXES'
2018-09-27T07:33:25.261756Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLESTATS'
2018-09-27T07:33:25.261759Z 0 [Note] Shutting down plugin 'INNODB_SYS_TABLES'
2018-09-27T07:33:25.261761Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_TABLE'
2018-09-27T07:33:25.261764Z 0 [Note] Shutting down plugin 'INNODB_FT_INDEX_CACHE'
2018-09-27T07:33:25.261766Z 0 [Note] Shutting down plugin 'INNODB_FT_CONFIG'
2018-09-27T07:33:25.261769Z 0 [Note] Shutting down plugin 'INNODB_FT_BEING_DELETED'
2018-09-27T07:33:25.261771Z 0 [Note] Shutting down plugin 'INNODB_FT_DELETED'
2018-09-27T07:33:25.261774Z 0 [Note] Shutting down plugin 'INNODB_FT_DEFAULT_STOPWORD'
2018-09-27T07:33:25.261776Z 0 [Note] Shutting down plugin 'INNODB_METRICS'
2018-09-27T07:33:25.261778Z 0 [Note] Shutting down plugin 'INNODB_TEMP_TABLE_INFO'
2018-09-27T07:33:25.261781Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_POOL_STATS'
2018-09-27T07:33:25.261783Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE_LRU'
2018-09-27T07:33:25.261785Z 0 [Note] Shutting down plugin 'INNODB_BUFFER_PAGE'
2018-09-27T07:33:25.261788Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX_RESET'
2018-09-27T07:33:25.261790Z 0 [Note] Shutting down plugin 'INNODB_CMP_PER_INDEX'
2018-09-27T07:33:25.261793Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM_RESET'
2018-09-27T07:33:25.261795Z 0 [Note] Shutting down plugin 'INNODB_CMPMEM'
2018-09-27T07:33:25.261797Z 0 [Note] Shutting down plugin 'INNODB_CMP_RESET'
2018-09-27T07:33:25.261800Z 0 [Note] Shutting down plugin 'INNODB_CMP'
2018-09-27T07:33:25.261802Z 0 [Note] Shutting down plugin 'INNODB_LOCK_WAITS'
2018-09-27T07:33:25.261805Z 0 [Note] Shutting down plugin 'INNODB_LOCKS'
2018-09-27T07:33:25.261807Z 0 [Note] Shutting down plugin 'INNODB_TRX'
2018-09-27T07:33:25.261809Z 0 [Note] Shutting down plugin 'XTRADB_ZIP_DICT_COLS'
2018-09-27T07:33:25.261812Z 0 [Note] Shutting down plugin 'XTRADB_ZIP_DICT'
2018-09-27T07:33:25.261814Z 0 [Note] Shutting down plugin 'XTRADB_RSEG'
2018-09-27T07:33:25.261817Z 0 [Note] Shutting down plugin 'XTRADB_INTERNAL_HASH_TABLES'
2018-09-27T07:33:25.261819Z 0 [Note] Shutting down plugin 'XTRADB_READ_VIEW'
2018-09-27T07:33:25.261822Z 0 [Note] Shutting down plugin 'InnoDB'
2018-09-27T07:33:25.261857Z 0 [Note] InnoDB: FTS optimize thread exiting.
2018-09-27T07:33:25.262097Z 0 [Note] InnoDB: Starting shutdown...
2018-09-27T07:33:25.362428Z 0 [Note] InnoDB: Dumping buffer pool(s) to /u01/mysql/3307/data/ib_buffer_pool
2018-09-27T07:33:25.363022Z 0 [Note] InnoDB: Buffer pool(s) dump completed at 180927 15:33:25
2018-09-27T07:33:25.562786Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2018-09-27T07:33:26.571050Z 0 [Note] InnoDB: Shutdown completed; log sequence number 2569669
2018-09-27T07:33:26.574169Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-09-27T07:33:26.574193Z 0 [Note] Shutting down plugin 'MyISAM'
2018-09-27T07:33:26.574210Z 0 [Note] Shutting down plugin 'MRG_MYISAM'
2018-09-27T07:33:26.574222Z 0 [Note] Shutting down plugin 'CSV'
2018-09-27T07:33:26.574233Z 0 [Note] Shutting down plugin 'MEMORY'
2018-09-27T07:33:26.574254Z 0 [Note] Shutting down plugin 'PERFORMANCE_SCHEMA'
2018-09-27T07:33:26.574287Z 0 [Note] Shutting down plugin 'sha256_password'
2018-09-27T07:33:26.574296Z 0 [Note] Shutting down plugin 'mysql_native_password'
2018-09-27T07:33:26.574304Z 0 [Note] Shutting down plugin 'wsrep'
2018-09-27T07:33:26.574480Z 0 [Note] Shutting down plugin 'binlog'

Log of a healthy node:

2018-09-27T07:33:22.505216Z 0 [Note] WSREP: declaring c05737fd at tcp://172.31.217.89:4567 stable
2018-09-27T07:33:22.505345Z 0 [Note] WSREP: forgetting 12f1e199 (tcp://172.31.217.183:4567)
2018-09-27T07:33:22.511586Z 0 [Note] WSREP: Node 2331d3d7 state primary
2018-09-27T07:33:22.512245Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,2331d3d7,12)
memb {
        2331d3d7,0
        c05737fd,0
        }
joined {
        }
left {
        }
partitioned {
        12f1e199,0
        }
)
2018-09-27T07:33:22.512303Z 0 [Note] WSREP: Save the discovered primary-component to disk
2018-09-27T07:33:22.512547Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2018-09-27T07:33:22.513157Z 0 [Note] WSREP: forgetting 12f1e199 (tcp://172.31.217.183:4567)
2018-09-27T07:33:22.513241Z 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 9ccf0351-c227-11e8-ae6a-d3cac5b411a7
2018-09-27T07:33:22.514096Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 9ccf0351-c227-11e8-ae6a-d3cac5b411a7
2018-09-27T07:33:22.514647Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 9ccf0351-c227-11e8-ae6a-d3cac5b411a7 from 0 (bd-dev-mingshuo-182)
2018-09-27T07:33:22.514661Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 9ccf0351-c227-11e8-ae6a-d3cac5b411a7 from 1 (bd-dev-vertica-89)
2018-09-27T07:33:22.514669Z 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 11,
        members    = 2/2 (primary/total),
        act_id     = 27,
        last_appl. = 0,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = c057dbc5-c16e-11e8-a1a6-825ed9079934
2018-09-27T07:33:22.514675Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-09-27T07:33:22.514679Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-27T07:33:22.514707Z 2 [Note] WSREP: New cluster view: global state: c057dbc5-c16e-11e8-a1a6-825ed9079934:27, view# 12: Primary, number of nodes: 2, my index: 0, protocol version 3
2018-09-27T07:33:22.514713Z 2 [Note] WSREP: Setting wsrep_ready to true
2018-09-27T07:33:22.514719Z 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-27T07:33:22.514727Z 2 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-09-27T07:33:22.514747Z 2 [Note] WSREP: Assign initial position for certification: 27, protocol version: 3
2018-09-27T07:33:22.514830Z 0 [Note] WSREP: Service thread queue flushed.
2018-09-27T07:33:27.691129Z 0 [Note] WSREP:  cleaning up 12f1e199 (tcp://172.31.217.183:4567)

Insert a row on node 182:

mysql> insert into t1 values (4,4);
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    2 |
| 3 |    3 |
| 4 |    4 |
+---+------+
4 rows in set (0.00 sec)

Start node 183 again, then connect to it and query the table:

mysql -S /u01/mysql/3307/data/mysql.sock -uroot -poracle -P3307

mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    2 |
| 3 |    3 |
| 4 |    4 |
+---+------+
4 rows in set (0.00 sec)

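The row inserted while 183 was down has been synchronized. Below is the part of the log where the increment is applied; it shows that one transaction (writeset) was received through IST.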

2018-09-27T08:05:50.785769Z 0 [Note] WSREP: Signalling provider to continue on SST completion.
2018-09-27T08:05:50.785808Z 0 [Note] WSREP: Initialized wsrep sidno 2
2018-09-27T08:05:50.785833Z 0 [Note] WSREP: SST received: c057dbc5-c16e-11e8-a1a6-825ed9079934:27
2018-09-27T08:05:50.785872Z 2 [Note] WSREP: Receiving IST: 1 writesets, seqnos 27-28
2018-09-27T08:05:50.785985Z 0 [Note] 2018-09-27T08:05:50.785985Z 0 [Note] WSREP: Receiving IST...  0.0% (0/1 events) complete.
2018-09-27T08:05:50.877679Z 0 [Note] WSREP: Receiving IST...100.0% (1/1 events) complete.
2018-09-27T08:05:50.877904Z 2 [Note] WSREP: IST received: c057dbc5-c16e-11e8-a1a6-825ed9079934:28
2018-09-27T08:05:50.878589Z 0 [Note] WSREP: 1.0 (bd-dev-mingshuo-183): State transfer from 0.0 (bd-dev-mingshuo-182) complete.
2018-09-27T08:05:50.878603Z 0 [Note] WSREP: SST leaving flow control
2018-09-27T08:05:50.878608Z 0 [Note] WSREP: Shifting JOINER -> JOINED (TO: 28)
2018-09-27T08:05:50.879059Z 0 [Note] WSREP: Member 1.0 (bd-dev-mingshuo-183) synced with group.
2018-09-27T08:05:50.879072Z 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 28)
2018-09-27T08:05:50.879101Z 2 [Note] WSREP: Synchronized with group, ready for connections
2018-09-27T08:05:50.879115Z 2 [Note] WSREP: Setting wsrep_ready to true
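To follow a rejoin like the one above without reading the whole error log, the state transitions can be pulled out with grep. This is only a minimal sketch: the function name state_shifts is made up for illustration, and it assumes the log uses the "Shifting A -> B (TO: n)" wording seen in the excerpts in this article.

```shell
# Print every Galera state transition found in an error log read from stdin.
# The slash in the character class allows for compound state names such as DONOR/DESYNCED.
state_shifts() {
    grep -o 'Shifting [A-Z/]* -> [A-Z/]* (TO: [0-9]*)'
}
```

It would be called as state_shifts < error.log, where error.log stands for the node's error log (the real path depends on the configuration). Fed the excerpt above, it prints the JOINER -> JOINED and JOINED -> SYNCED transitions, both at position 28.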

Now test an unclean shutdown of two nodes: kill the mysqld processes on 182 and 183 with kill -9. On the surviving node, 89:

mysql> select * from ming.t1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

mysql> insert into ming.t1 values(10,10);
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

The surviving node can no longer serve reads or writes.

mysql> show status where Variable_name IN ('wsrep_local_state_uuid','wsrep_cluster_conf_id','wsrep_cluster_size','wsrep_cluster_status','wsrep_ready','wsrep_connected');
+------------------------+--------------------------------------+
| Variable_name          | Value                                |
+------------------------+--------------------------------------+
| wsrep_local_state_uuid | c057dbc5-c16e-11e8-a1a6-825ed9079934 |
| wsrep_cluster_conf_id  | 18446744073709551615                 |
| wsrep_cluster_size     | 1                                    |
| wsrep_cluster_status   | non-Primary                          |
| wsrep_connected        | ON                                   |
| wsrep_ready            | OFF                                  |
+------------------------+--------------------------------------+
6 rows in set (0.00 sec)

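wsrep_cluster_size=1 means the node sees only itself in the cluster.
wsrep_cluster_status=non-Primary means the node is not part of a Primary Component, i.e. it has lost quorum.
wsrep_connected=ON means the node's replication layer is still connected to a cluster component (clients can also still log in, as the queries above show).
wsrep_ready=OFF means the node can no longer accept queries. The failed select above confirms this.

Whether the surviving node can serve reads depends on the wsrep_dirty_reads parameter: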

mysql> show variables like 'wsrep_dirty_reads';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_dirty_reads | OFF   |
+-------------------+-------+
1 row in set (0.00 sec)

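wsrep_dirty_reads can be changed dynamically. When it is set to ON, a node in non-Primary state can still serve reads. Serving writes requires promoting the node to Primary, which is done through a different option, described later.

Meanwhile, the surviving node keeps trying to reconnect to the other two nodes: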

2018-09-28T02:57:37.209095Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') reconnecting to 2331d3d7 (tcp://172.31.217.182:4567), attempt 960
2018-09-28T02:58:12.714612Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') reconnecting to 252da778 (tcp://172.31.217.183:4567), attempt 900
2018-09-28T02:58:22.216078Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') reconnecting to 2331d3d7 (tcp://172.31.217.182:4567), attempt 990
2018-09-28T02:58:57.721850Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') reconnecting to 252da778 (tcp://172.31.217.183:4567), attempt 930
2018-09-28T02:59:07.223430Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') reconnecting to 2331d3d7 (tcp://172.31.217.182:4567), attempt 1020
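The status variables and the wsrep_dirty_reads setting discussed above can be combined into a rough rule for what a node will serve. The sketch below is a simplification for illustration, not PXC's internal logic: the function name node_service and the three labels it prints are hypothetical, and it ignores special cases such as a node that is in the middle of a state transfer.

```shell
# Classify what a node can serve.
# Arguments: wsrep_cluster_status, wsrep_ready, wsrep_dirty_reads (values as shown by SHOW STATUS / SHOW VARIABLES).
node_service() {
    local cluster_status=$1 ready=$2 dirty_reads=$3
    if [ "$cluster_status" = "Primary" ] && [ "$ready" = "ON" ]; then
        echo "read-write"      # normal member of the Primary Component
    elif [ "$dirty_reads" = "ON" ]; then
        echo "read-only"       # possibly stale reads are allowed, writes are not
    else
        echo "none"            # queries fail with ERROR 1047
    fi
}
```

With the values observed on node 89 (non-Primary, OFF, OFF) the function prints none, which matches the ERROR 1047 responses above.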

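The node refuses reads and writes because of how PXC guards against split-brain. I know that I killed the mysqld processes on the other two nodes, but the surviving node cannot know their state: they may be dead, or they may still be talking to each other and serving clients. In the second case the cluster would have split into two islands that cannot reach each other, so a node that cannot see a majority of the cluster stops serving reads and writes.

Log of the surviving node after the two killed nodes are brought back up: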

2018-09-28T03:04:57.003914Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') connection established to 252da778 tcp://172.31.217.183:4567
2018-09-28T03:05:03.215507Z 0 [Note] WSREP: declaring 252da778 at tcp://172.31.217.183:4567 stable
2018-09-28T03:05:03.216346Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,252da778,30)
memb {
        252da778,0
        725136c0,0
        }
joined {
        }
left {
        }
partitioned {
        2331d3d7,0
        }
)
2018-09-28T03:05:03.216630Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2018-09-28T03:05:03.216710Z 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-09-28T03:05:03.216718Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-28T03:05:03.216723Z 0 [Note] WSREP: Received NON-PRIMARY.
2018-09-28T03:05:03.216794Z 1 [Note] WSREP: New cluster view: global state: c057dbc5-c16e-11e8-a1a6-825ed9079934:33, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
2018-09-28T03:05:03.216822Z 1 [Note] WSREP: Setting wsrep_ready to false
2018-09-28T03:05:03.216833Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-28T03:05:04.277523Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') connection established to 2331d3d7 tcp://172.31.217.182:4567
2018-09-28T03:05:04.279018Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') connection established to 2331d3d7 tcp://172.31.217.182:4567
2018-09-28T03:05:04.776965Z 0 [Note] WSREP: declaring 2331d3d7 at tcp://172.31.217.182:4567 stable
2018-09-28T03:05:04.777019Z 0 [Note] WSREP: declaring 252da778 at tcp://172.31.217.183:4567 stable
2018-09-28T03:05:04.777487Z 0 [Note] WSREP: re-bootstrapping prim from partitioned components
2018-09-28T03:05:04.778262Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,2331d3d7,31)
memb {
        2331d3d7,0
        252da778,0
        725136c0,0
        }
joined {
        }
left {
        }
partitioned {
        }
)
2018-09-28T03:05:04.778307Z 0 [Note] WSREP: Save the discovered primary-component to disk
2018-09-28T03:05:04.778588Z 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-09-28T03:05:04.778629Z 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-09-28T03:05:05.277931Z 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 838d4806-c2cb-11e8-8bb1-eeeae1741165
2018-09-28T03:05:05.278435Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 838d4806-c2cb-11e8-8bb1-eeeae1741165 from 0 (bd-dev-mingshuo-182)
2018-09-28T03:05:05.278463Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 838d4806-c2cb-11e8-8bb1-eeeae1741165 from 1 (bd-dev-mingshuo-183)
2018-09-28T03:05:05.278470Z 0 [Note] WSREP: STATE EXCHANGE: got state msg: 838d4806-c2cb-11e8-8bb1-eeeae1741165 from 2 (bd-dev-vertica-89)
2018-09-28T03:05:05.278490Z 0 [Warning] WSREP: Quorum: No node with complete state:

        Version      : 4
        Flags        : 0x1
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : 33
        Prim JOINED  : 0
        State UUID   : 838d4806-c2cb-11e8-8bb1-eeeae1741165
        Group UUID   : c057dbc5-c16e-11e8-a1a6-825ed9079934
        Name         : 'bd-dev-mingshuo-182'
        Incoming addr: '172.31.217.182:3307'

        Version      : 4
        Flags        : 00
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : 33
        Prim JOINED  : 0
        State UUID   : 838d4806-c2cb-11e8-8bb1-eeeae1741165
        Group UUID   : c057dbc5-c16e-11e8-a1a6-825ed9079934
        Name         : 'bd-dev-mingshuo-183'
        Incoming addr: '172.31.217.183:3307'

        Version      : 4
        Flags        : 0x2
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 19faf204-c2c7-11e8-b642-52dd65ccae43
        Prim  seqno  : 26
        First seqno  : 33
        Last  seqno  : 33
        Prim JOINED  : 2
        State UUID   : 838d4806-c2cb-11e8-8bb1-eeeae1741165
        Group UUID   : c057dbc5-c16e-11e8-a1a6-825ed9079934
        Name         : 'bd-dev-vertica-89'
        Incoming addr: '172.31.217.89:3307'

2018-09-28T03:05:05.278511Z 0 [Note] WSREP: Partial re-merge of primary 19faf204-c2c7-11e8-b642-52dd65ccae43 found: 1 of 2.
2018-09-28T03:05:05.278520Z 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = 26,
        members    = 3/3 (primary/total),
        act_id     = 33,
        last_appl. = 0,
        protocols  = 0/8/3 (gcs/repl/appl),
        group UUID = c057dbc5-c16e-11e8-a1a6-825ed9079934
2018-09-28T03:05:05.278540Z 0 [Note] WSREP: Flow-control interval: [173, 173]
2018-09-28T03:05:05.278544Z 0 [Note] WSREP: Trying to continue unpaused monitor
2018-09-28T03:05:05.278548Z 0 [Note] WSREP: Restored state OPEN -> SYNCED (33)
2018-09-28T03:05:05.278593Z 1 [Note] WSREP: New cluster view: global state: c057dbc5-c16e-11e8-a1a6-825ed9079934:33, view# 27: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-09-28T03:05:05.278612Z 1 [Note] WSREP: Setting wsrep_ready to true
2018-09-28T03:05:05.278621Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-28T03:05:05.278661Z 1 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-09-28T03:05:05.278679Z 1 [Note] WSREP: Assign initial position for certification: 33, protocol version: 3
2018-09-28T03:05:05.278752Z 0 [Note] WSREP: Service thread queue flushed.
2018-09-28T03:05:05.278828Z 1 [Note] WSREP: Synchronized with group, ready for connections
2018-09-28T03:05:05.278849Z 1 [Note] WSREP: Setting wsrep_ready to true
2018-09-28T03:05:05.278863Z 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-09-28T03:05:05.279134Z 0 [Note] WSREP: Member 0.0 (bd-dev-mingshuo-182) synced with group.
2018-09-28T03:05:05.279179Z 0 [Note] WSREP: Member 1.0 (bd-dev-mingshuo-183) synced with group.
2018-09-28T03:05:07.290328Z 0 [Note] WSREP: (725136c0, 'tcp://0.0.0.0:4567') turning message relay requesting off
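The non-Primary state of the lone survivor, and the "Partial re-merge of primary ... found: 1 of 2" line in the log above, both come down to a majority rule: a group of nodes keeps Primary status only if it holds more than half of the last known Primary Component. The sketch below shows just that arithmetic. The function name has_quorum is hypothetical, and it assumes every node carries the default weight of 1; the real calculation in Galera also takes node weights and cleanly departed nodes into account.

```shell
# Succeed (exit status 0) if the reachable nodes are a strict majority
# of the last known Primary Component.
has_quorum() {
    local reachable=$1 last_primary_size=$2
    [ $((reachable * 2)) -gt "$last_primary_size" ]
}
```

For example, has_quorum 2 3 succeeds, while has_quorum 1 3 and has_quorum 1 2 both fail. An exact half is not enough, which is why a lone node cannot stay Primary once its peers disappear without leaving cleanly.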

How to recover when a node is stuck in this state (run on one node only, the one to be promoted to Primary):

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

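Shutting down all three nodes cleanly

Shut down the three nodes in the order 183, 182, 89. Before starting them again, be sure to check the contents of grastate.dat on each node.

Node 183: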

root@bd-dev-mingshuo-183:/u01/mysql/3307/data# more grastate.dat
# GALERA saved state
version: 2.1
uuid:    c057dbc5-c16e-11e8-a1a6-825ed9079934
seqno:   51
safe_to_bootstrap: 0

Node 182:

root@bd-dev-mingshuo-182:/opt/mysql/3307/data# more grastate.dat
# GALERA saved state
version: 2.1
uuid:    c057dbc5-c16e-11e8-a1a6-825ed9079934
seqno:   51
safe_to_bootstrap: 0

Node 89:

root@bd-dev-vertica-89:/opt/mysql/3307/data# more grastate.dat
# GALERA saved state
version: 2.1
uuid:    c057dbc5-c16e-11e8-a1a6-825ed9079934
seqno:   51
safe_to_bootstrap: 1
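The two fields that matter when restarting a fully stopped cluster are seqno and safe_to_bootstrap. Rather than reading each file by eye, a field can be extracted with awk. This is a minimal sketch; the function name grastate_field is made up for illustration, and it assumes the "key: value" layout shown in the three files above.

```shell
# Print the value of one field (for example uuid, seqno or safe_to_bootstrap)
# from a grastate.dat file.
grastate_field() {
    local field=$1 file=$2
    awk -v key="$field:" '$1 == key { print $2 }' "$file"
}
```

For instance, grastate_field safe_to_bootstrap /opt/mysql/3307/data/grastate.dat on node 89 would print 1 for the file shown above, and grastate_field seqno on the same file would print 51.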

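Note: safe_to_bootstrap: 1 marks the node that can safely be used to bootstrap the cluster, so node 89 must be started first. The first command below is run on 89, the other two on the remaining nodes:

mysqld_safe --defaults-file=/etc/my.cnf --wsrep-new-cluster &
mysqld_safe --defaults-file=/etc/my3307.cnf &
mysqld_safe --defaults-file=/etc/my3307.cnf &

Question: when starting, does PXC rely only on safe_to_bootstrap in grastate.dat to validate the cluster? This is easy to check: shut the cluster down as above, then change safe_to_bootstrap to 1 in grastate.dat on node 183. The details are omitted here, but the cluster can indeed be started this way.

What if data changes after some of the nodes have been shut down? After shutting down nodes 183 and 182, insert a row on node 89: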

mysql> insert into ming.t1 values (16,16);
Query OK, 1 row affected (0.01 sec)

Then shut down node 89, so that the whole cluster is down. Edit grastate.dat on node 183:

root@bd-dev-mingshuo-183:/u01/mysql/3307/data# more grastate.dat
# GALERA saved state
version: 2.1
uuid:    c057dbc5-c16e-11e8-a1a6-825ed9079934
seqno:   51
safe_to_bootstrap: 1

Start the cluster, beginning with node 183:

mysqld_safe --defaults-file=/etc/my3307.cnf --wsrep-new-cluster &

mysql> select * from ming.t1;
+----+------+
| a  | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
| 10 |   10 |
| 11 |   11 |
| 12 |   12 |
| 13 |   13 |
| 14 |   14 |
| 15 |   15 |
+----+------+
15 rows in set (0.00 sec)

Row 16 is missing. Now start node 89 and see whether the lost row can be recovered:

2018-09-28T08:13:33.156722Z 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:group_post_state_exchange():322: Reversing history: 52 -> 51, this member has applied 1 more events than the primary component.Data loss is possible. Aborting.
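An abort like the one above can be anticipated by comparing seqno across all nodes before choosing which one to bootstrap. The sketch below assumes copies of each node's grastate.dat have been collected into one place; the function name most_advanced and the file names in the usage note are hypothetical.

```shell
# Given several grastate.dat files, print the one with the highest seqno
# followed by that seqno. Files without a seqno line are skipped.
# A seqno of -1 (left behind by an unclean shutdown) never wins against a
# real position; in that case the position has to be recovered first,
# for example with mysqld --wsrep-recover.
most_advanced() {
    local best_file="" best_seqno="" file seqno
    for file in "$@"; do
        seqno=$(awk '$1 == "seqno:" { print $2 }' "$file")
        [ -n "$seqno" ] || continue
        if [ -z "$best_seqno" ] || [ "$seqno" -gt "$best_seqno" ]; then
            best_seqno=$seqno
            best_file=$file
        fi
    done
    echo "$best_file $best_seqno"
}
```

Called as most_advanced grastate.183 grastate.182 grastate.89, it names the copy that should be bootstrapped from. In this experiment that would be expected to be the copy from node 89, since the error message reports that it is at 52 while the running component is at 51.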

Node 89's seqno has reached 52, ahead of the 51 on the other nodes. Change seqno in grastate.dat on node 89 to 51, then try starting node 89 again:

mysqld_safe --defaults-file=/etc/my.cnf &

mysql> select * from ming.t1;
+----+------+
| a  | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
|  5 |    5 |
|  6 |    6 |
|  7 |    7 |
|  8 |    8 |
|  9 |    9 |
| 10 |   10 |
| 11 |   11 |
| 12 |   12 |
| 13 |   13 |
| 14 |   14 |
| 15 |   15 |
| 16 |   16 |
+----+------+
16 rows in set (0.01 sec)

Node 89 now has 16 rows, but node 183 still has only 15. Delete a row on node 183:

mysql> delete from ming.t1 where a=11;
Query OK, 1 row affected (0.00 sec)

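Row 11 is deleted on both running nodes. Node 182 is then started; it starts normally, and a check afterwards shows its data matches 183, which suggests that 183 was chosen as the donor: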

2018-09-28T08:21:19.486736Z 2 [Note] WSREP: Check if state gap can be serviced using IST
2018-09-28T08:21:19.486832Z 2 [Note] WSREP: IST receiver addr using tcp://172.31.217.182:4568
2018-09-28T08:21:19.487026Z 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://172.31.217.182:4568
2018-09-28T08:21:19.487050Z 2 [Note] WSREP: State gap can be likely serviced using IST. SST request though present would be void.
2018-09-28T08:21:19.487984Z 0 [Note] WSREP: may fallback to sst. ist_seqno [51] < safe_ist_seqno [52]
2018-09-28T08:21:19.488006Z 0 [Note] WSREP: Member 2.0 (bd-dev-mingshuo-182) requested state transfer from '*any*'. Selected 0.0 (bd-dev-mingshuo-183)(SYNCED) as donor.

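The log shows node 182 acting as the IST receiver, listening on port 4568, with node 183 selected as the donor, so it is no surprise that its data matches 183.

The data is now inconsistent across nodes. How can this be fixed? Delete the files in the inconsistent node's data directory and start the node normally; it will then recover a full copy of the data through SST.

PXC also lets you choose the donor explicitly at startup, with the wsrep_sst_donor parameter. Its value is the donor's node name (wsrep_node_name), which is bd-dev-vertica-89 here. Shut down the two nodes and restart each of them with wsrep_sst_donor set:

mysqld_safe --defaults-file=/etc/my3307.cnf --wsrep_sst_donor=bd-dev-vertica-89 &
mysqld_safe --defaults-file=/etc/my3307.cnf --wsrep_sst_donor=bd-dev-vertica-89 &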

 
