Oracle数据库 后台JOB报错排查

来源:这里教程网 时间:2026-03-03 18:12:13 作者:

故障描述

管理数据库在上线过程中,由于大量的RESMGR:cpu quantum等待事件事件从而导致数据库性能问题,后续调整并关闭resource manager功能,后续数据库出现自动调度任务无法自动执行等后台alert日志告警,现将故障进行初步模拟,并对故障原因进行分析,提供相关解决方案。

 

前期参数设置并调整

由于数据库上线过程中出现大量的RESMGR:cpu quantum等待事件,出现性能问题,关闭了resource manager功能,关闭过程如下:

ALTER SYSTEM SET “_resource_manager_always_on”=FALSE   SCOPE=SPFILE SID='*';

 

   execute   dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');

   execute   dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

 

#4  重启数据库使生效

 

重启后,resource manager 关闭,数据库不在出现RESMGR:cpu quantum等待事件。

但是数据库自动任务计划调度开始报错,后台报错如下:

ORA-12012: error on auto execute of job   "SYS"."ORA$AT_SQ_SQL_SW_186"

ORA-29373:   resource manager is not on

 

视图检查报错:

SQL> select client_name,window_name,job_name,job_status,job_start_time   from dba_autotask_job_history ;

 

CLIENT_NAME            WINDOW_NAME            JOB_NAME                         JOB_STATUS                       JOB_START_TIME

-------------------- --------------------   ------------------------------ ------------------------------   ----------------------------------------

auto optimizer stats WEDNESDAY_WINDOW     ORA$AT_OS_OPT_SY_102            FAILED                         09-APR-14   10.00.01.582006 PM PRC

 collection

 

auto optimizer stats FRIDAY_WINDOW        ORA$AT_OS_OPT_SY_105            FAILED                         11-APR-14   10.00.06.999466 PM PRC

 collection

 

auto optimizer stats TUESDAY_WINDOW       ORA$AT_OS_OPT_SY_99             FAILED                         08-APR-14   10.00.07.885416 PM PRC

 collection

 

 

前期故障排查

为了确认故障原因,我们对该报错进行  打开29373 errorstack,收集了报错中的详细跟踪日志:

后台alert报错如下:

Sun May 04   06:00:04 2014

Dumping diagnostic   data in directory=[cdmp_20140504060004], requested by (instance=1, osid=55167   (J001)), summary=[abnormal process termination].

Errors in file   /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:

ORA-29373:   resource manager is not on

Errors in file   /oracle/database/diag/rdbms/nfdb/nfdb1/trace/nfdb1_j001_55167.trc:

ORA-12012: error   on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_271"

ORA-29373:   resource manager is not on

Dumping diagnostic   data in directory=[cdmp_20140504060009], requested by (instance=1, osid=55167   (J001)), summary=[abnormal process termination].

 

 

详细的trace跟踪报告如下:

dbkedDefDump():   Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0)

----- Error Stack   Dump -----

ORA-29373:   resource manager is not on

----- SQL   Statement (None) -----

Current SQL   information unavailable - no cursor.

 

从trace文件中我们也只能确认是因为resource manager造成JOB无法调度

 

不过在检测中我们发现,管理resource manager的进程DBRM依旧在数据库中存在,

#ps –ef | grep   dbrm

oracle 34920 1 0 Apr18 ? 00:00:59    ora_dbrm_nfdb1

 

 

这从一定程度上给予我们一定的怀疑方向,可能存在resource并没有完全关闭,而且从详细的trace文件中的从call stack trace里来看, 当前进程也没有没有关于resource manager的函数调用,而是一直在向另外一个进程post message。

所以,我们怀疑错误很有可能是由于DBRM没有正常关闭造成。

 

现对以上分析结果进行判断,获取可能依据

 

故障模拟

 

故障模拟一共分为如下几个部分:

步骤

操作

结果

Step1

和故障操作一样,设置  “_resource_manager_always_on  隐含参数,关闭  resource manager windows  调用计划

后台报错

Step2

删除隐含参数,只设置  resource   manager windows  调用计划关闭

后台不报错

Step3

添加个隐含参数,  关闭  resource   manager windows  调用计划

后台不报错

 

详细测试过程如下:

测试我们采用将数据库时间设置到22  点,让其自动调度JOB  计划

模拟管理库故障环境

关闭数据库,调整时间,设置隐含参数,关闭windows  计划

关闭数据库:

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

 

调整时间:

[root@rhel6 ~]# date -s 2014/05/14

[root@rhel6 ~]# date -s 21:57:00

[root@rhel6 ~]# clock -w

[root@rhel6 ~]# date

Wed May 14 21:57:35 CST 2014

 

添加隐含参数,启动至open:

SQL> ALTER SYSTEM SET   "_resource_manager_always_on"=FALSE SCOPE=SPFILE;

SQL> startup force  (测试环境,直接force启动,生产环境勿如此操作)

 

设置resource manager plan:

execute   dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN','');

execute   dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');

 

我们观察22  点的alert  信息,确实开始报错:

Wed May 14 22:00:03 2014

Errors in file   /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc:

ORA-12012: error on auto execute   of job "SYS"."ORA$AT_OS_OPT_SY_27"

ORA-29373: resource manager is   not on

Wed May 14 22:00:03 2014

Errors in file   /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc:

ORA-12012: error on auto execute of job   "SYS"."ORA$AT_SA_SPC_SY_28"

ORA-29373: resource manager is not on

Wed May 14 22:00:03 2014

Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc:

ORA-12012: error on auto execute of job   "SYS"."ORA$AT_SQ_SQL_SW_29"

ORA-29373: resource manager is not on

Wed May 14 22:00:04 2014

XDB installed.

XDB initialized.

 

检查  DBRM  进程已经存在:

[root@rhel6 ~]# ps -ef | grep dbrm

ora11g      4346     1  0 21:57 ?        00:00:00 ora_dbrm_ora11g

 

 

检查后台JOB  执行记录视图:

SQL> select   CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from   DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection'   order by JOB_START_TIME desc;

 

CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME

-----------------------------------   ------------------------------ ------------------------------   ------------------------------ ----------------------------------------

auto optimizer stats collection     WEDNESDAY_WINDOW               ORA$AT_OS_OPT_SY_27             FAILED                         14-MAY-14 10.00.03.233570   PM PRC

 

模拟过程与生产管理库一致,结果也一致,DBRM  进程已经存在。

 

 

模拟去除隐含参数

关闭数据库,调整时间,去除隐含参数,关闭windows  计划

关闭数据库:

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

 

调整时间:

[root@rhel6 ~]# date -s 2014/05/14

[root@rhel6 ~]# date -s 21:57:00

[root@rhel6 ~]# clock -w

[root@rhel6 ~]# date

Wed May 14 21:57:35 CST 2014

 

去除隐含参数,启动至open:

隐含参数去除采用create pfile from spfile;

删除spfile,编辑pfile文件,删除隐含参数,以pfile启动数据库

 

设置resource manager plan:(由于前面已经设置过,无需再设置)

 

我们观察22  点的alert  信息,发现没有报错

Tue   May 13 22:00:03 2014

Begin   automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

End   automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

Tue   May 13 22:00:05 2014

XDB   installed.

XDB   initialized.

 

 

检查 DBRM  进程:

[root@rhel6 ~]# ps -ef | grep dbrm

ora11g      3844     1  0 21:55 ?        00:00:00 ora_dbrm_ora11g

 

说明:此时  resource manager  由于只是关闭了  resource manager plan  计划,没有真正关闭  resource manager  因此该进程依旧存在。

 

检查后台JOB  执行视图信息:

SQL> select   CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from   DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection'   order by JOB_START_TIME desc;

 

CLIENT_NAME                                WINDOW_NAME                      JOB_NAME                    JOB_STATUS           JOB_START_TIME

----------------------------------------   ------------------------------ ------------------------- --------------------   ----------------------------------------

auto optimizer stats collection          TUESDAY_WINDOW                 ORA$AT_OS_OPT_SY_24        SUCCEEDED              13-MAY-14   10.00.02.102741 PM PRC

 

说明在隐含参数除掉的情况下,  JOB  可以正常执行,后台没有报错。

 

模拟隐含参数及resource manager plan  均存在的情况

关闭数据库,调整时间,添加隐含参数,关闭windows  计划

关闭数据库:

SQL> shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

调整时间:

[root@rhel6 ~]# date -s 2014/05/15

[root@rhel6 ~]# date -s 21:57:00

[root@rhel6 ~]# clock -w

[root@rhel6 ~]# date

Thu May 15 21:57:35 CST 2014

添加隐含参数,启动至open:

SQL> alter system set   "_resource_manager_always_off"=true scope=spfile;

SQL> ALTER SYSTEM SET   "_resource_manager_always_on"=FALSE SCOPE=SPFILE;

SQL> startup   force  (测试环境,直接force启动,生产环境勿如此操作)

设置resource manager plan:(由于前面已经设置过,无需再设置)

 

我们观察22  点的alert  信息,发现没有报错

Thu   May 15 22:00:03 2014

Begin   automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

Thu   May 15 22:00:05 2014

XDB   installed.

XDB   initialized.

End   automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"

 

检查 DBRM  进程:

[root@rhel6 ~]# ps -ef | grep dbrm

root        4650  3647  0 21:58 pts/2    00:00:00 grep dbrm

 

说明:此时  DBRM  进程消失

 

检查后台JOB  执行视图信息:

SQL>   select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from   DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection'   order by JOB_START_TIME desc;

 

CLIENT_NAME                         WINDOW_NAME                    JOB_NAME                       JOB_STATUS                     JOB_START_TIME

-----------------------------------   ------------------------------ ------------------------------ ------------------------------   ----------------------------------------

auto   optimizer stats collection       THURSDAY_WINDOW                  ORA$AT_OS_OPT_SY_30             SUCCEEDED                      15-MAY-14 10.00.02.115232   PM PRC

 

说明在隐含参数两个都添加的情况下,完全屏蔽  resource manager  的的情况下,  JOB  可以正常执行,后台没有报错。

 

 

总结说明

以上测试结果证明,后台报错JOB执行失败原因应该是DBRM进程依旧活动,而DBRM进程是管理RESOURCE manager 当去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE

或者将两个参数全部添加,均可避免该错误,统计信息自动收集也可以自动执行

由于管理数据库相对重要,且上线时候出现过严重的RESMGR:cpu quantum等待事件,建议不要移除隐含参数,而是添加"_resource_manager_always_off"=true隐含参数,重启数据库

 

当然,可以先在另外两套库上进行确认,再在管理库上进

相关推荐