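The documentation says that slave_preserve_commit_order preserves commit order so that the replica stays consistent with the master. It is easy to misread this as being about out-of-order application of transaction groups that cannot run in parallel. It is not: the parameter is about reordering among transactions that can run in parallel. So even with this parameter turned off, a replica doing parallel replication still shows the "Waiting for dependent transaction to commit" state, which is easy to reproduce in a load-test environment: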
slave1 [localhost] {msandbox} ((none)) > show processlist;
+----+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time  | State                                       | Info             |
+----+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
|  1 | system user |           | NULL | Connect | 15602 | Waiting for master to send event            | NULL             |
|  2 | system user |           | NULL | Connect |     0 | Waiting for dependent transaction to commit | NULL             |
|  3 | system user |           | NULL | Connect |     6 | System lock                                 | NULL             |
|  4 | system user |           | NULL | Connect |     6 | System lock                                 | NULL             |
|  5 | system user |           | NULL | Connect |     6 | System lock                                 | NULL             |
|  6 | system user |           | NULL | Connect |     6 | Waiting for an event from Coordinator       | NULL             |
| 10 | msandbox    | localhost | NULL | Query   |     0 | starting                                    | show processlist |
+----+-------------+-----------+------+---------+-------+---------------------------------------------+------------------+
7 rows in set (0.00 sec)
slave1 [localhost] {msandbox} ((none)) > show variables like '%order%';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| binlog_order_commits        | ON    |
| slave_preserve_commit_order | OFF   |
+-----------------------------+-------+
2 rows in set (0.01 sec)
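Now for the code. The call chain is schedule_next_event -> wait_for_last_committed_trx -> stage_worker_waiting_for_commit_parent. When a transaction begins, the coordinator first schedules the next event. schedule_next_event decides whether the transaction can run in parallel with those already executing; if it cannot, the coordinator waits. Only then is a worker assigned. While obtaining the worker, get_least_occupied_worker checks, based on slave_preserve_commit_order, whether a commit order manager (Commit_order_manager in the source) exists: if it does, the worker is added to the commit-order queue, and if not, that step is skipped. The parallelism check therefore comes first, and a worker is only looked for once the transaction may be scheduled in parallel. In other words, slave_preserve_commit_order only decides whether transactions that are already parallelizable are applied in order.

Obtaining a worker goes as follows. If a worker was already assigned earlier, it is simply reused, which means it has already been registered in the queue. Otherwise an idle worker is picked. If no worker is idle, the coordinator enters the stage_slave_waiting_for_workers_to_process_queue state. If an idle worker is found and slave_preserve_commit_order is set, the worker is registered, that is, placed in the commit-order queue. So the ordering in question is the order in which workers are registered: the earlier a worker is registered and handed its events, the earlier its turn comes.

This parameter does not guarantee the order of non-transactional updates, so gaps can appear. relay_log_recovery can resolve the gaps and the resulting inconsistency.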
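
The scheduling flow walked through above can be sketched as a small toy model. This is not MySQL code: the names Trx, Worker, Coordinator, low_water_mark and commit_queue are hypothetical, the dependency test is a simplified version of the logical-clock check, and the reuse of an already assigned worker is left out. The only point it tries to show is that the dependency wait happens before a worker is chosen, regardless of whether commit order is preserved.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trx:
    sequence_number: int   # position of the transaction in the binlog (toy value)
    last_committed: int    # newest transaction it depends on


@dataclass
class Worker:
    wid: int
    current: Optional[Trx] = None   # transaction being applied, if any


class Coordinator:
    """Hypothetical, simplified model of the coordinator's scheduling steps."""

    def __init__(self, n_workers: int, preserve_commit_order: bool):
        self.workers = [Worker(i) for i in range(n_workers)]
        # Stand-in for the commit order manager; None models the parameter being OFF.
        self.commit_queue = deque() if preserve_commit_order else None
        self.states = []   # wait states the coordinator went through

    def low_water_mark(self):
        # Toy rule: everything below the oldest running transaction has committed.
        running = [w.current.sequence_number for w in self.workers if w.current]
        return min(running) - 1 if running else float("inf")

    def schedule(self, trx: Trx) -> Optional[Worker]:
        """Return the assigned worker, or None if the coordinator has to wait."""
        # Step 1: dependency check. It does not look at commit_queue at all.
        if trx.last_committed > self.low_water_mark():
            self.states.append("stage_worker_waiting_for_commit_parent")
            return None
        # Step 2: pick an idle worker.
        worker = next((w for w in self.workers if w.current is None), None)
        if worker is None:
            self.states.append("stage_slave_waiting_for_workers_to_process_queue")
            return None
        worker.current = trx
        # Step 3: register the worker only when commit order must be preserved.
        if self.commit_queue is not None:
            self.commit_queue.append(worker.wid)
        return worker

    def commit(self, worker: Worker) -> bool:
        """Try to commit; False means the worker must wait for its turn."""
        if self.commit_queue is not None:
            if self.commit_queue[0] != worker.wid:
                return False
            self.commit_queue.popleft()
        worker.current = None
        return True


if __name__ == "__main__":
    for preserve in (False, True):
        c = Coordinator(n_workers=2, preserve_commit_order=preserve)
        first = c.schedule(Trx(1, 0))
        second = c.schedule(Trx(2, 0))
        blocked = c.schedule(Trx(3, 2))   # depends on transaction 2
        print(preserve, blocked, c.states, c.commit(second))
```

In this model the third transaction is held back with the setting both off and on, because it depends on a transaction that has not committed yet. The setting only changes the result of the last call: with it off, the second worker may commit before the first; with it on, it has to wait until the first worker, registered earlier, has committed.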
