主从半同步降级异步分析

来源:这里教程网 时间:2026-03-01 18:30:25 作者:

一、问题

这个问题大概是这样。当前主从状态如下有2个半同步从库,如果停掉1个半同步从库是是否会堵塞主库,堵塞时间大概是多久?

mysql> show global status like '%semi%';+--------------------------------------------+-----------+| Variable_name                              | Value     |+--------------------------------------------+-----------+| Rpl_semi_sync_master_clients               | 2         || Rpl_semi_sync_master_net_avg_wait_time     | 0         || Rpl_semi_sync_master_net_wait_time         | 0         || Rpl_semi_sync_master_net_waits             | 8454      || Rpl_semi_sync_master_no_times              | 9         || Rpl_semi_sync_master_no_tx                 | 1597      || Rpl_semi_sync_master_status                | ON        || Rpl_semi_sync_master_timefunc_failures     | 0         || Rpl_semi_sync_master_tx_avg_wait_time      | 77630     || Rpl_semi_sync_master_tx_wait_time          | 169312495 || Rpl_semi_sync_master_tx_waits              | 2181      || Rpl_semi_sync_master_wait_pos_backtraverse | 0         || Rpl_semi_sync_master_wait_sessions         | 0         || Rpl_semi_sync_master_yes_tx                | 2181      || Rpl_semi_sync_slave_status                 | OFF       |+--------------------------------------------+-----------+15 rows in set (0.01 sec)mysql> show variables like '%semi%';+-------------------------------------------+---------------------+| Variable_name                             | Value               |+-------------------------------------------+---------------------+| rpl_semi_sync_master_enabled              | ON                  || rpl_semi_sync_master_timeout              | 1000000000000000000 || rpl_semi_sync_master_trace_level          | 32                  || rpl_semi_sync_master_wait_for_slave_count | 2                   || rpl_semi_sync_master_wait_no_slave        | OFF                 || rpl_semi_sync_master_wait_point           | AFTER_SYNC          || rpl_semi_sync_slave_enabled               | ON                  || rpl_semi_sync_slave_trace_level           | 32                  |+-------------------------------------------+---------------------+

二、分析

首先从设置中这里,

  • rpl_semi_sync_master_timeout:1000000000000000000 设置为几乎无限大
  • rpl_semi_sync_master_wait_for_slave_count:2 设置为2代表需要2个半同步落盘才返回ACK。
  • rpl_semi_sync_master_wait_no_slave:OFF 设置为了OFF(默认为ON)

    而从当前的状态来看,

  • Rpl_semi_sync_master_clients 存在2个半同步从库
  • Rpl_semi_sync_master_status 为ON,处于半同步状态。

    这种情况下虽然rpl_semi_sync_master_timeout设置了很大值,但是如果2个半同步中一个关闭后,是否需要等待rpl_semi_sync_master_timeout设置的值呢?其实这就和 rpl_semi_sync_master_wait_no_slave 参数有关,因为设置为off,那么当半同步的dump线程退出的时候那么会判断当前半同步的数量是否满足rpl_semi_sync_master_wait_for_slave_count 参数设置,如果不满足则马上降级为异步复制(回调函数repl_semi_binlog_dump_end) 。

    但是dump线程的退出并不是及时的,需要等到一次心跳超时再次发送心跳的时候发现无法发送才会停止dump线程,而线上繁忙的系统,通常DUMP线程不会处于等待心跳发送的情况下的,但是一但回话线程在等待ACK上堵塞,那么binlog就不会再更改,dump线程也就没有事可做,只能触发心跳发送,但是发送心跳前会等待一个心跳周期为slave_net_timeout/2,如下,

    Binlog_sender::wait_with_heartbeatdo
      {
        set_timespec_nsec(&ts, m_heartbeat_period); //心跳周期
        ret= mysql_bin_log.wait_for_update_bin_log(m_thd, &ts);//如果没有binlog生成,也会不会被提前唤醒,等待到超时
        if (ret != ETIMEDOUT && ret != ETIME) //如果是正常收到 发送信号则返回 否则如果是超时则发送心跳event,进行心跳检测
          break; //正常返回0 是超时返回ETIMEDOUT 继续循环...
          if (send_heartbeat_event(log_pos)) //发送心跳
            return 1;
      } while (!m_thd->killed);

    slave_net_timeout默认为60秒,因此心跳周期默认为是30秒的时间,也就是参数slave_net_timeout/2,也就是这种情况下,半同步降级可能来自2个方面,

  • 通过repl_semi_report_binlog_sync 回调函数,用户线程ACK超时直接降级,这依赖rpl_semi_sync_master_timeout参数
  • 心跳发送超时未接收到,导致dump线程退出,退出触发回调函数repl_semi_binlog_dump_end,判断当前剩余半桶数量是否小于rpl_semi_sync_master_wait_for_slave_count 参数设置降级。

    因此此处 降级的时间大概应该是min(slave_net_timeout/2,rpl_semi_sync_master_timeout(微秒)),取一个小值。

    当然如果rpl_semi_sync_master_wait_no_slave保持默认值ON,则必须等待rpl_semi_sync_master_timeout参数设置的大小才降级。

    三、结论和测试截图

    因此这里我们看到这个配置下(rpl_semi_sync_master_wait_no_slave=OFF,rpl_semi_sync_master_wait_for_slave_count =2),停掉一个从库实际上会影响到主库但是在slave_net_timeout/2超时后会降级,默认是30秒,也就是30秒降级,当然如果rpl_semi_sync_master_timeout是默认的10秒,那么优先使用rpl_semi_sync_master_timeout的时间作为降级标准。这种情况下,5.7 测试截图如下,

    image.png

    8.0测试如下,

    image.png

    四、超时栈

    (gdb) bt
    #0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0#1  0x00000000018a9cb7 in native_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mutex=0x2dc1048 <mysql_bin_log+1640>, abstime=0x7fffec165e10)
        at /opt/percona-server-locks-detail-5.7.22/include/thr_cond.h:129#2  0x00000000018aa01c in safe_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mp=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
        file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", line=8362) at /opt/percona-server-locks-detail-5.7.22/mysys/thr_cond.c:88#3  0x0000000001802a65 in my_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mp=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
        file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", line=8362) at /opt/percona-server-locks-detail-5.7.22/include/thr_cond.h:180#4  0x0000000001802f61 in inline_mysql_cond_timedwait (that=0x2dc1100 <mysql_bin_log+1824>, mutex=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
        src_file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", src_line=8362) at /opt/percona-server-locks-detail-5.7.22/include/mysql/psi/mysql_thread.h:1229#5  0x0000000001814919 in MYSQL_BIN_LOG::wait_for_update_bin_log (this=0x2dc09e0 <mysql_bin_log>, thd=0x7ffee40008c0, timeout=0x7fffec165e10)
        at /opt/percona-server-locks-detail-5.7.22/sql/binlog.cc:8362#6  0x0000000001839283 in Binlog_sender::wait_with_heartbeat (this=0x7fffec1665b0, log_pos=194) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:652#7  0x00000000018391d2 in Binlog_sender::wait_new_events (this=0x7fffec1665b0, log_pos=194) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:631#8  0x0000000001838e87 in Binlog_sender::get_binlog_end_pos (this=0x7fffec1665b0, log_cache=0x7fffec165fc0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:397#9  0x00000000018369c5 in Binlog_sender::send_binlog (this=0x7fffec1665b0, log_cache=0x7fffec165fc0, start_pos=123) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:345#10 0x0000000001836383 in Binlog_sender::run (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:226#11 0x0000000001834074 in mysql_binlog_send (thd=0x7ffee40008c0, log_ident=0x7fffec1671e0 "", pos=4, slave_gtid_executed=0x7fffec166fb0, flags=0)
        at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:414#12 0x0000000001833f40 in com_binlog_dump_gtid (thd=0x7ffee40008c0, packet=0x7ffee40063f1 "", packet_length=124) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:398#13 0x00000000015671d6 in dispatch_command (thd=0x7ffee40008c0, com_data=0x7fffec167d90, command=COM_BINLOG_DUMP_GTID) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1705#14 0x00000000015655c5 in do_command (thd=0x7ffee40008c0) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1021#15 0x00000000016a635c in handle_connection (arg=0x64b1d50) at /opt/percona-server-locks-detail-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:312#16 0x00000000018ce0f6 in pfs_spawn_thread (arg=0x5d3bc20) at /opt/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190#17 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0#18 0x00007ffff66008dd in clone () from /lib64/libc.so.6(gdb) c
    Continuing.2024-10-21T08:24:34.739778Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 8700ms. The settings might not be optimal. (flushed=0, during the time.)Breakpoint 3, repl_semi_binlog_dump_end (param=0x7fffec165ee0) at /opt/percona-server-locks-detail-5.7.22/plugin/semisync/semisync_master_plugin.cc:164164       bool semi_sync_slave= is_semi_sync_dump();(gdb) bt
    #0  repl_semi_binlog_dump_end (param=0x7fffec165ee0) at /opt/percona-server-locks-detail-5.7.22/plugin/semisync/semisync_master_plugin.cc:164#1  0x000000000148c6f6 in Binlog_transmit_delegate::transmit_stop (this=0x2d89fc0 <delegates_init()::transmit_mem>, thd=0x7ffee40008c0, flags=0)
        at /opt/percona-server-locks-detail-5.7.22/sql/rpl_handler.cc:729#2  0x0000000001836169 in Binlog_sender::cleanup (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:176#3  0x00000000018368b9 in Binlog_sender::run (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:312#4  0x0000000001834074 in mysql_binlog_send (thd=0x7ffee40008c0, log_ident=0x7fffec1671e0 "", pos=4, slave_gtid_executed=0x7fffec166fb0, flags=0)
        at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:414#5  0x0000000001833f40 in com_binlog_dump_gtid (thd=0x7ffee40008c0, packet=0x7ffee40063f1 "", packet_length=124) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:398#6  0x00000000015671d6 in dispatch_command (thd=0x7ffee40008c0, com_data=0x7fffec167d90, command=COM_BINLOG_DUMP_GTID) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1705#7  0x00000000015655c5 in do_command (thd=0x7ffee40008c0) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1021#8  0x00000000016a635c in handle_connection (arg=0x64b1d50) at /opt/percona-server-locks-detail-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:312#9  0x00000000018ce0f6 in pfs_spawn_thread (arg=0x5d3bc20) at /opt/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190#10 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0#11 0x00007ffff66008dd in clone () from /lib64/libc.so.6
    (gdb) thread 73[Switching to thread 73 (Thread 0x7fffd8161700 (LWP 1887))]#0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0(gdb) bt
    #0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0#1  0x0000000004802623 in native_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mutex=0xa2d21d8, abstime=0x7fffd815e280) at /newdata/mysql-8.0.23/include/thr_cond.h:100#2  0x000000000480298e in safe_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mp=0xa2d21b0, abstime=0x7fffd815e280, file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", line=7557)
        at /newdata/mysql-8.0.23/mysys/thr_cond.cc:113#3  0x00000000043b8259 in my_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mp=0x7e0ef48 <mysql_bin_log+1000>, abstime=0x7fffd815e280, file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", 
        line=7557) at /newdata/mysql-8.0.23/include/thr_cond.h:147#4  0x00000000043b8496 in inline_mysql_cond_timedwait (that=0x7e0efa8 <mysql_bin_log+1096>, mutex=0x7e0ef48 <mysql_bin_log+1000>, abstime=0x7fffd815e280, 
        src_file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", src_line=7557) at /newdata/mysql-8.0.23/include/mysql/psi/mysql_cond.h:241#5  0x00000000043cb6a6 in MYSQL_BIN_LOG::wait_for_update (this=0x7e0eb60 <mysql_bin_log>, timeout=0x7fffd815e280) at /newdata/mysql-8.0.23/sql/binlog.cc:7557#6  0x000000000448acaf in Binlog_sender::wait_with_heartbeat (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:796#7  0x0000000004489023 in Binlog_sender::wait_new_events (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:778#8  0x0000000004488502 in Binlog_sender::get_binlog_end_pos (this=0x7fffd815eb60, reader=0x7fffd815e970, end_pos=0x7fffd815e440) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:557#9  0x0000000004488201 in Binlog_sender::send_binlog (this=0x7fffd815eb60, reader=0x7fffd815e970, start_pos=125) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:511#10 0x0000000004487b95 in Binlog_sender::run (this=0x7fffd815eb60) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:412#11 0x0000000004483604 in mysql_binlog_send (thd=0x7fff20015540, log_ident=0x7fffd815f700 "", pos=4, slave_gtid_executed=0x7fffd815f540, flags=2) at /newdata/mysql-8.0.23/sql/rpl_master.cc:1017#12 0x00000000044834c9 in com_binlog_dump_gtid (thd=0x7fff20015540, packet=0x7fff20006f71 "", packet_length=70) at /newdata/mysql-8.0.23/sql/rpl_master.cc:1002#13 0x000000000323dcc9 in dispatch_command (thd=0x7fff20015540, com_data=0x7fffd8160b40, command=COM_BINLOG_DUMP_GTID) at /newdata/mysql-8.0.23/sql/sql_parse.cc:2048#14 0x000000000323b513 in do_command (thd=0x7fff20015540) at /newdata/mysql-8.0.23/sql/sql_parse.cc:1320#15 0x000000000340dfb1 in handle_connection (arg=0xa2d4880) at /newdata/mysql-8.0.23/sql/conn_handler/connection_handler_per_thread.cc:301#16 0x000000000500dc16 in pfs_spawn_thread (arg=0xa524270) at /newdata/mysql-8.0.23/storage/perfschema/pfs.cc:2900#17 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0#18 0x00007ffff61418dd in clone () from /lib64/libc.so.6(gdb) f 5#5  0x00000000043cb6a6 in MYSQL_BIN_LOG::wait_for_update (this=0x7e0eb60 <mysql_bin_log>, timeout=0x7fffd815e280) at /newdata/mysql-8.0.23/sql/binlog.cc:75577557        ret = mysql_cond_timedwait(&update_cond, &LOCK_binlog_end_pos,(gdb) p timeout
    $3 = (const timespec *) 0x7fffd815e280(gdb) p *timeout
    $4 = {tv_sec = 1729517575, tv_nsec = 413257724}(gdb) f 6#6  0x000000000448acaf in Binlog_sender::wait_with_heartbeat (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:796796         ret = mysql_bin_log.wait_for_update(&ts);(gdb)  p ts
    $5 = {tv_sec = 1729517575, tv_nsec = 413257724}(gdb) p m_heartbeat_period.count()$6 = 30000001024

  • 相关推荐