故障描述
管理数据库在上线过程中,由于大量的RESMGR:cpu quantum等待事件事件从而导致数据库性能问题,后续调整并关闭resource manager功能,后续数据库出现自动调度任务无法自动执行等后台alert日志告警,现将故障进行初步模拟,并对故障原因进行分析,提供相关解决方案。
前期参数设置并调整
由于数据库上线过程中出现大量的RESMGR:cpu quantum等待事件,出现性能问题,关闭了resource manager功能,关闭过程如下:
|
ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
#4 重启数据库使生效 |
重启后,resource manager 关闭,数据库不在出现RESMGR:cpu quantum等待事件。
但是数据库自动任务计划调度开始报错,后台报错如下:
|
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186" ORA-29373: resource manager is not on |
视图检查报错:
|
SQL> select client_name,window_name,job_name,job_status,job_start_time from dba_autotask_job_history ;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME -------------------- -------------------- ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_102 FAILED 09-APR-14 10.00.01.582006 PM PRC collection
auto optimizer stats FRIDAY_WINDOW ORA$AT_OS_OPT_SY_105 FAILED 11-APR-14 10.00.06.999466 PM PRC collection
auto optimizer stats TUESDAY_WINDOW ORA$AT_OS_OPT_SY_99 FAILED 08-APR-14 10.00.07.885416 PM PRC collection |
前期故障排查
为了确认故障原因,我们对该报错进行 打开29373 errorstack,收集了报错中的详细跟踪日志:
后台alert报错如下:
|
Sun May 04 06:00:04 2014 Dumping diagnostic data in directory=[cdmp_20140504060004], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination]. Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc: ORA-29373: resource manager is not on Errors in file /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_271" ORA-29373: resource manager is not on Dumping diagnostic data in directory=[cdmp_20140504060009], requested by (instance=1, osid=55167 (J001)), summary=[abnormal process termination]. |
详细的trace跟踪报告如下:
|
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0) ----- Error Stack Dump ----- ORA-29373: resource manager is not on ----- SQL Statement (None) ----- Current SQL information unavailable - no cursor. |
从trace文件中我们也只能确认是因为resource manager造成JOB无法调度
不过在检测中我们发现,管理resource manager的进程DBRM依旧在数据库中存在,
|
#ps –ef | grep dbrm oracle 34920 1 0 Apr18 ? 00:00:59 ora_dbrm_nfdb1
|
这从一定程度上给予我们一定的怀疑方向,可能存在resource并没有完全关闭,而且从详细的trace文件中的从call stack trace里来看, 当前进程也没有没有关于resource manager的函数调用,而是一直在向另外一个进程post message。
所以,我们怀疑错误很有可能是由于DBRM没有正常关闭造成。
现对以上分析结果进行判断,获取可能依据
故障模拟
故障模拟一共分为如下几个部分:
|
步骤 |
操作 |
结果 |
|
Step1 |
和故障操作一样,设置 “_resource_manager_always_on 隐含参数,关闭 resource manager windows 调用计划 |
后台报错 |
|
Step2 |
删除隐含参数,只设置 resource manager windows 调用计划关闭 |
后台不报错 |
|
Step3 |
添加2 个隐含参数, 关闭 resource manager windows 调用计划 |
后台不报错 |
详细测试过程如下:
测试我们采用将数据库时间设置到22 点,让其自动调度JOB 计划
模拟管理库故障环境
关闭数据库,调整时间,设置隐含参数,关闭windows 计划
|
关闭数据库: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
调整时间: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
添加隐含参数,启动至open: SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force (测试环境,直接force启动,生产环境勿如此操作)
设置resource manager plan: execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN',''); |
我们观察22 点的alert 信息,确实开始报错:
|
Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29" ORA-29373: resource manager is not on Wed May 14 22:00:04 2014 XDB installed. XDB initialized. |
检查 DBRM 进程已经存在:
|
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 4346 1 0 21:57 ? 00:00:00 ora_dbrm_ora11g
|
检查后台JOB 执行记录视图:
|
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_27 FAILED 14-MAY-14 10.00.03.233570 PM PRC |
模拟过程与生产管理库一致,结果也一致,DBRM 进程已经存在。
模拟去除隐含参数
关闭数据库,调整时间,去除隐含参数,关闭windows 计划
|
关闭数据库: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
调整时间: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
去除隐含参数,启动至open: 隐含参数去除采用create pfile from spfile; 删除spfile,编辑pfile文件,删除隐含参数,以pfile启动数据库
设置resource manager plan:(由于前面已经设置过,无需再设置) |
我们观察22 点的alert 信息,发现没有报错
|
Tue May 13 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Tue May 13 22:00:05 2014 XDB installed. XDB initialized.
|
检查 DBRM 进程:
|
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 3844 1 0 21:55 ? 00:00:00 ora_dbrm_ora11g |
说明:此时 resource manager 由于只是关闭了 resource manager plan 计划,没有真正关闭 resource manager 因此该进程依旧存在。
检查后台JOB 执行视图信息:
|
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ---------------------------------------- ------------------------------ ------------------------- -------------------- ---------------------------------------- auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_24 SUCCEEDED 13-MAY-14 10.00.02.102741 PM PRC |
说明在隐含参数除掉的情况下, JOB 可以正常执行,后台没有报错。
模拟隐含参数及resource manager plan 均存在的情况
关闭数据库,调整时间,添加隐含参数,关闭windows 计划
|
关闭数据库: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. 调整时间: [root@rhel6 ~]# date -s 2014/05/15 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Thu May 15 21:57:35 CST 2014 添加隐含参数,启动至open: SQL> alter system set "_resource_manager_always_off"=true scope=spfile; SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force (测试环境,直接force启动,生产环境勿如此操作) 设置resource manager plan:(由于前面已经设置过,无需再设置) |
我们观察22 点的alert 信息,发现没有报错
|
Thu May 15 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu May 15 22:00:05 2014 XDB installed. XDB initialized. End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" |
检查 DBRM 进程:
|
[root@rhel6 ~]# ps -ef | grep dbrm root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm |
说明:此时 DBRM 进程消失
检查后台JOB 执行视图信息:
|
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection THURSDAY_WINDOW ORA$AT_OS_OPT_SY_30 SUCCEEDED 15-MAY-14 10.00.02.115232 PM PRC |
说明在隐含参数两个都添加的情况下,完全屏蔽 resource manager 的的情况下, JOB 可以正常执行,后台没有报错。
总结说明
以上测试结果证明,后台报错JOB执行失败原因应该是DBRM进程依旧活动,而DBRM进程是管理RESOURCE manager 当去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
或者将两个参数全部添加,均可避免该错误,统计信息自动收集也可以自动执行
由于管理数据库相对重要,且上线时候出现过严重的RESMGR:cpu quantum等待事件,建议不要移除隐含参数,而是添加"_resource_manager_always_off"=true隐含参数,重启数据库
当然,可以先在另外两套库上进行确认,再在管理库上进
编辑推荐:
相关推荐
-
雷神推出 MIX PRO II 迷你主机:基于 Ultra 200H,玻璃上盖 + ARGB 灯效
2 月 9 日消息,雷神 (THUNDEROBOT) 现已宣布推出基于英
-
制造商 Musnap 推出彩色墨水屏电纸书 Ocean C:支持手写笔、第三方安卓应用
2 月 10 日消息,制造商 Musnap 现已在海外推出一款 Oce
