故障描述
8 月 4 号晚上 2 2 点 ,运维侧反馈有系统报错,提示无监听程序,需要我们帮忙查下原因,这是一套 RAC 数据库 。
登录到服务器上面 ,为了尽快恢复实例,手工启动实例发现节点 1 的 crs 都已经 hang 住了,手工强制重启 crs ,之后启动数据库实例,实例可以正常启动。
问题详细诊断过程
检查数据库alert 日志 , 发现 ORA-07445 和 ORA-00600 [ 17147 ] 报错 : Sun Aug 01 22:41:29 2021 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x250E876, qertbRestoreFetchState2()+60] [flags: 0x0, count: 1] Errors in file /u01/app/oracle/diag/rdbms/zlemr/zlemr1/trace/zlemr1_ora_20043.trc (incident=96939): ORA-07445: 鍑虹幇寮傚父閿欒 : 鏍稿績杞偍 [qertbRestoreFetchState2()+60] [SIGSEGV] [ADDR:0x0] [PC:0x250E876] [SI_KERNEL(general_protection)] [] Incident details in: /u01/app/oracle/diag/rdbms/zlemr/zlemr1/incident/incdir_96939/zlemr1_ora_20043_i96939.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sun Aug 01 22:41:29 2021 Dumping diagnostic data in directory=[cdmp_20210801224129], requested by (instance=1, osid=20043), summary=[incident=96939]. Sun Aug 01 22:41:30 2021 Sweep [inc][96939]: completed Sweep [inc2][96939]: completed Sun Aug 01 22:42:08 2021 Errors in file /u01/app/oracle/diag/rdbms/zlemr/zlemr1/trace/zlemr1_pmon_22489.trc (incident=96017): ORA-00600: internal error code, arguments: [17147], [0xB0BF733F8], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/zlemr/zlemr1/incident/incdir_96017/zlemr1_pmon_22489_i96017.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u01/app/oracle/diag/rdbms/zlemr/zlemr1/trace/zlemr1_pmon_22489.trc: ORA-00600: internal error code, arguments: [17147], [0xB0BF733F8], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 22489): terminating the instance due to error 472 Sun Aug 01 22:42:11 2021 System state dump requested by (instance=1, osid=22489 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /u01/app/oracle/diag/rdbms/zlemr/zlemr1/trace/zlemr1_diag_22544_20210801224211.trc Instance terminated by PMON, pid = 22489 Wed Aug 04 22:52:02 2021
从上面的日志可以看出 , 数据库从8 月 1 号实例 1 就已经宕掉了 ,检查操作系统日志, crs 日志都没有发现其他报错信息。
查看对应的 trace 文件,可以判断是由于 plsql 程序执行了一个查询操作, sql 如下: select * from (Select Extractvalue(a.content, 'zlxml//document/element[@title=" 住院号 "]/text()') as 住院号 , Extractvalue(a.content, 'zlxml//document/element[@title=" 姓名 "]/text()') as 姓名 , extractvalue(b.detail,'root/ic_list/icode[@title=" 病人 ID"]') AS 病人 id, extractvalue(b.detail,'root/ic_list/icode[@title=" 住院次数 "]') AS 主页 id, replace(b_Doc_Base.f_get_content(a.actlog_id, ' 主诉 ', a.title),' ','') 主诉 , ---- length(b_Doc_Base.f_get_content(a.actlog_id, ' 主诉 ', a.title)) as aaa, replace(b_Doc_Base.f_get_content(a.actlog_id, ' 现病史 ', a.title),' ','') 现病史 , a.title From bz_doc_log a, bz_masters b,bz_act_log c where b.id=c.master_id --- and a.edit_time = (select max(edit_time) from bz_doc_log x where x.actlog_id = a.actlog_id and x.antetype_id = a.antetype_id) and c.id=a.actlog_id and a.title in (' XXXX ',' XXXX ',' XXXX ',' XXXX ',' XXXX ',' XXXX ) and a.edit_time > sysdate -1 ) D
检查trace 文件 zlemr1_pmon_22489_i96017.trc 发现 ,存在大量的 报错信息 , 信息如下 : ………………………
Chunk b0bf6f778 sz= 3072 free " " Chunk b0bf70378 sz= 4224 freeable "SQLA^63c299a9 " ds=0x107343770 Chunk b0bf713f8 sz= 4096 NEXT NEXT CHUNK'S PREVIOUS POINTER NOT POINTING TO NEXT CHUNK 000000000 freeable "SQLA^27cec08c " ds=0x106780760 Two suspect chunks will be dumped with their memory contents The first chunk (above) may have corrupted the size of the second chunk More likely is that the second chunk has corrupted the following chunk ……………
通过这个 trace 文件内容可以判断, plsql 程序执行查询语句,触发了共享池内存冲突,
问题定位 :通过官网对报错进行匹配,确认 遇到了 Bug 22243719 。这个 bug 补丁已经包含在比较早的数据库 P SU 里面 , 所以 怀疑这套库没打任何补丁,通知检查数据库补丁的情况,确实数据库没有打任何补丁。
解决办法和建议
1 . 避免使用 plsql 执行上面的语句 。
2. 打补丁,至少要打 11.2.0.4.161018 (Oct 2016) ,综合考虑风险等因素,打补丁 Patch: 22243719 。
最后经过客户确定,还是采用打补丁方式解决这个问题,当天晚上停应用,停库,打补丁。
补丁验证 : [oracle@emrdb1 OPatch]$ ./opatch lsinv Oracle Interim Patch Installer version 11.2.0.3.29 Copyright (c) 2021, Oracle Corporation. All rights reserved. Oracle Home : /u01/app/oracle/product/11.2.0/db_1 Central Inventory : /u01/app/oraInventory from : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc OPatch version : 11.2.0.3.29 OUI version : 11.2.0.4.0 Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2021-08-05_22-29-58PM_1.log Lsinventory Output file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory2021-08-31_15-50-26PM.txt ------------------------------------------------------------------------- Local Machine Information:: Hostname: emrdb1 ARU platform id: 226 ARU platform description:: Linux x86-64 Installed Top-level Products (1): Oracle Database 11g 11.2.0.4.0 There are 1 products installed in this Oracle Home. Interim patches (1) : Patch 22243719 : applied on Thu Aug 05 22:27:56 CST 2021 Unique Patch ID: 19972949 Created on 12 Mar 2016, 04:02:16 hrs PST8PDT Bugs fixed: 22243719 ------------------------------------------------------------------------- OPatch succeeded.
