[ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds

Source: 这里教程网 · Date: 2026-03-01 16:09:16 · Author: (not given)

InnoDB: ###### Diagnostic info printed to the standard error stream
2021-11-19T14:52:21.671328+08:00 0 [ERROR] [MY-012872] [InnoDB] [FATAL] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2021-11-19T14:52:21.671407+08:00 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:532 thread 139928164968192
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
06:52:21 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Thread 139932014196480 was waiting at buf0flu.cc line 1357 for a semaphore: an SX-lock on an RW-latch created at buf0buf.cc line 781. At that point thread 139928049043200 already held that latch exclusively. Thread 139928049043200, in turn, was waiting at btr0cur.cc line 233 for an X-lock (wait_ex) on an RW-latch also created at buf0buf.cc line 781, and was itself stuck in "wait exclusive" — in other words, the RW-latch it needed had still not been released.

[Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 139932014196480 has waited at buf0flu.cc line 1357 for 784 seconds the semaphore:
SX-lock on RW-latch at 0x7f439b4df988 created in file buf0buf.cc line 781
a writer (thread id 139928049043200) has reserved it in mode exclusive

2021-11-19T14:51:51.381694+08:00 0 [Warning] [MY-012985] [InnoDB] A long semaphore wait:
--Thread 139928049043200 has waited at btr0cur.cc line 233 for 923 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x7f43a36c0108 created in file buf0buf.cc line 781
a writer (thread id 139928049043200) has reserved it in mode wait exclusive

Start with buf0buf.cc: it implements the database buffer pool. Line 781 is

rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);

which creates a read-write lock. It belongs to buf_block_init(), the function that initializes a buffer control block when the buffer pool is set up — that is, this latch controls access to a buffer-pool page.

buf0flu.cc implements buffer pool flushing. Line 1357 is rw_lock_sx_lock_gen(rw_lock, BUF_IO_WRITE), which appears to be the rw_lock that must be acquired when flushing a page from the buffer.

btr0cur.cc is the index tree cursor: every row operation that changes a B-tree or a record must go through this module, and every modification or insertion of a clustered index record writes its undo log entry here. Line 233 is part of fetching a B-tree block, and it requests an RW_X_LATCH:

232  get_block = btr_block_get(page_id_t(page_id.space(), left_page_no),
233                            page_size, RW_X_LATCH, cursor->index, mtr);

So this code path is only triggered by changes to a table. The error log also shows that TRANSACTION 787138017 on thread 139928049043200 had been rolling back the whole time, which further corroborates the picture above.

---TRANSACTION 787138017, ACTIVE 4387 sec rollback <<<<<<<
mysql tables in use 2, locked 1
ROLLING BACK 1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 7891101
MySQL thread id 11963922, OS thread handle 139928049043200, query id 1348226447 30.88.75.45 msvcmdata query end
INSERT INTO `xxl_job_log_copy1` SELECT * FROM `msvcm`.`xxl_job_log`
Trx read view will not see trx with id >= 787138017, sees < 787138017

At the same time another similar SQL statement was still inserting, so the buffer pool really was under pressure:

---TRANSACTION 787233964, ACTIVE 3040 sec inserting
mysql tables in use 2, locked 1
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 8544482
MySQL thread id 11964240, OS thread handle 139933323695872, query id 1348381276 30.88.75.45 msvcmdata executing
INSERT INTO `xxl_job_log_copy2` SELECT * FROM `msvcm`.`xxl_job_log`
Trx read view will not see trx with id >= 787233964, sees < 787138017

The InnoDB status output shows the number of pending buffer pool flushes climbing steadily:

Pending flushes (fsync) log: 1; buffer pool: 30
Pending flushes (fsync) log: 1; buffer pool: 33
Pending flushes (fsync) log: 1; buffer pool: 31
Pending flushes (fsync) log: 1; buffer pool: 31
Pending flushes (fsync) log: 1; buffer pool: 31
Pending flushes (fsync) log: 1; buffer pool: 35
Pending flushes (fsync) log: 0; buffer pool: 37
Pending flushes (fsync) log: 0; buffer pool: 48
Pending flushes (fsync) log: 1; buffer pool: 54
Pending flushes (fsync) log: 1; buffer pool: 54
Pending flushes (fsync) log: 1; buffer pool: 62
Pending flushes (fsync) log: 0; buffer pool: 68
Pending flushes (fsync) log: 0; buffer pool: 63
Pending flushes (fsync) log: 1; buffer pool: 63
Pending flushes (fsync) log: 0; buffer pool: 67
Pending flushes (fsync) log: 0; buffer pool: 71
Pending flushes (fsync) log: 1; buffer pool: 70
Pending flushes (fsync) log: 1; buffer pool: 71
Pending flushes (fsync) log: 1; buffer pool: 72
Pending flushes (fsync) log: 1; buffer pool: 73
Pending flushes (fsync) log: 0; buffer pool: 76
Pending flushes (fsync) log: 1; buffer pool: 75
Pending flushes (fsync) log: 1; buffer pool: 75
Pending flushes (fsync) log: 0; buffer pool: 75
Pending flushes (fsync) log: 1; buffer pool: 75
Pending flushes (fsync) log: 1; buffer pool: 76
Pending flushes (fsync) log: 0; buffer pool: 76
Pending flushes (fsync) log: 1; buffer pool: 78
Pending flushes (fsync) log: 0; buffer pool: 82
Pending flushes (fsync) log: 0; buffer pool: 85
Pending flushes (fsync) log: 1; buffer pool: 89
Pending flushes (fsync) log: 1; buffer pool: 95
Pending flushes (fsync) log: 1; buffer pool: 97
Pending flushes (fsync) log: 0; buffer pool: 102

The root cause: two large INSERT ... SELECT statements doing full table scans — one inserting, one rolling back — against a relatively small buffer pool. With buffer pool resources that tight, flushing and modifying buffer pages contended on the page latches and waited so long that the semaphore timed out.

I have not yet found a matching known bug report. The way to resolve this class of problem is to keep SQL from holding a latch for a long time:
1. Rewrite the SELECT to avoid a full table scan, so latches are not held for long stretches.
2. Add query conditions and appropriate indexes so the SELECT runs faster and each SELECT touches less data.
3. Increase the buffer pool size and the number of buffer pool instances; each instance has its own latch, which reduces latch contention.
4. Avoid long-running rollbacks.
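For point 3, the relevant settings live in my.cnf. The values below are placeholders to adapt to the host's memory, not figures from the original incident:

```ini
[mysqld]
# Each buffer pool instance has its own latch; more instances reduce contention.
innodb_buffer_pool_size      = 8G   # placeholder; commonly ~50-75% of RAM on a dedicated host
innodb_buffer_pool_instances = 8    # only takes effect when the pool is at least 1GB
```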
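The rising "Pending flushes" trend above comes from repeatedly sampling SHOW ENGINE INNODB STATUS. A small helper like the sketch below (my own illustration, not from the incident) can extract the buffer-pool pending-flush count from that output so the trend can be graphed or alerted on:

```python
import re

# Matches the "Pending flushes (fsync) log: N; buffer pool: M" line that
# appears in SHOW ENGINE INNODB STATUS output.
PENDING_RE = re.compile(r"Pending flushes \(fsync\) log: (\d+); buffer pool: (\d+)")

def pending_flushes(status_text):
    """Return (log_pending, buffer_pool_pending), or None if the line is absent."""
    m = PENDING_RE.search(status_text)
    return (int(m.group(1)), int(m.group(2))) if m else None

sample = "Pending flushes (fsync) log: 1; buffer pool: 30"
print(pending_flushes(sample))  # (1, 30)
```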
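Points 1, 2 and 4 of the remediation list boil down to one idea: never let a single INSERT ... SELECT accumulate millions of undo log entries, because both the insert and any eventual rollback then hold buffer-page latches for a very long time. Copying in bounded, keyset-paginated batches with a commit after each batch avoids this. A minimal sketch of the batching pattern, using Python's stdlib sqlite3 as a stand-in for MySQL (the schema and batch size are made up for illustration):

```python
import sqlite3

BATCH = 2  # tiny for the demo; in practice something like 10000 rows per batch

# In-memory stand-in for the source and target tables (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xxl_job_log (id INTEGER PRIMARY KEY, msg TEXT)")
conn.execute("CREATE TABLE xxl_job_log_copy1 (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO xxl_job_log VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 8)])
conn.commit()

# Copy in keyset-paginated batches, committing after each one, so no single
# transaction builds up a huge undo log or holds latches for long.
last_id = 0
while True:
    rows = conn.execute(
        "SELECT id, msg FROM xxl_job_log WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, BATCH)).fetchall()
    if not rows:
        break
    conn.executemany("INSERT INTO xxl_job_log_copy1 VALUES (?, ?)", rows)
    conn.commit()  # short transaction: if it fails, only a small batch rolls back
    last_id = rows[-1][0]

copied = conn.execute("SELECT COUNT(*) FROM xxl_job_log_copy1").fetchone()[0]
print(copied)  # 7
```

The WHERE id > ? ... ORDER BY id LIMIT ? keyset pattern also satisfies point 2: each batch is an indexed range read rather than a full table scan.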
