一 问题现象及日志 DBA接到客户请求系统已经不能使用了,这是一个超市的一套系统,oracle 版本使用的是11.2.0.3.0
alert日志出现ora-04031报错,先使用alter system flush shared_pool 后,应用正常使用了20分钟后就宕机了
宕机前alert日志(
2024.10.15)
alert日志在10.15日系统重启前报错
ORA-04031: ???? 2928 ??????? ("shared pool","unknown object","sga heap(1,1)","KGLHD")
ERROR at line 1:
ORA-04031: ???? 32 ??????? ("shared pool","unknown
object","KGLH0^556e728f","kglHeapInitialize:temp")
ERROR:
ORA-00604: ?? SQL ?? 1 ????
ORA-04031: ???? 32 ??????? ("shared pool","BEGIN
DBMS_OUTPUT.ENABLE(NUL...","KGLH0^e3a2d601","kglHeapInitialize:temp")
其中还报有ora-600错误
以上关键信息KGLHD,是什么有点纳闷,后面详细分析
重启数据库后,修改内存手工管理模式,过滤几天再次宕机,2024.10.20再次宕机前日志
Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_smon_3007.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","select f.file#, f.block#, f....","SQLA","tmp")
Mon Oct 20 08:23:26 2014
Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_smon_3007.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","unknown object","KGLH0^f185eace","kglHeapInitialize:temp")
Mon Oct 20 08:23:38 2014
Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_j001_7116.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","update sys.job$ set this_dat...","KGLH0^7a42409d","kglHeapInitialize:temp")
Mon Oct 20 08:23:38 2014
Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_j000_7114.trc:
ORA-12012: 自动执行作业 14 出错
ORA-04031: 无法分配 3480 字节的共享内存 ("shared pool","DXS","PLMCD^97454214","BAMIMA: Bam Buffer")
ORA-06508: PL/SQL: 无法找到正在调用 : "DBUSRBAS.DXS" 的程序单元
ORA-06512: 在 "DBUSRBAS.TASK", line 685
二 诊断思路 2.1 查看系统使用的是内存自动管理,在oracle 10g,11g 内存自动管理方式可能会导致ora-04031错误可以使用内存手工管理模式规避该问题, 2.2 ora-04031可能与shared pool中的绑定变量使用有关 2.3 oracle bug 问题也会导致ora-04031三 解决过程 3.1 由于已经宕机,先重启数据库,设置内存手工管理机器内存是32G,还部署有应用,给数据库分配内存是10G如下,修改内存手工管理规划sga总大小8G,sga各个组件固定大小(其中db_chache_size 4G,shared_pool_size 3G);pga 2G
(1) 取消memory_target和memory_max_target参数设置
create pfile='/tmp/puoppfile.ora' from spfile;
alter system set memory_max_target=0 scope=spfile;
alter system set memory_target=0 scope=spfile;
(2)SGA各个组件大小设置(总大小8G)
set line 120
select COMPONENT,current_size/1024/1024 as MB from v$sga_dynamic_components;
alter system set sga_target=0 scope=spfile;
alter system set sga_max_size=8G scope=spfile;
alter system set JAVA_POOL_SIZE=256M scope=spfile;
alter system set large_POOL_SIZE=256M scope=spfile;
alter system set db_cache_size=4G scope=spfile;
alter system set shared_pool_size=3G scope=spfile;
(3)pga设置2G
alter system set pga_aggregate_target=2G scope=spfile;
设置完成,重启数据库
shutdown immediate;
startup;
(4) 检查结果
show parameter sga
show parameter pga
show parameter shared
show parameter memory
set line 120
select COMPONENT,current_size/1024/1024 as MB from v$sga_dynamic_components;
select ROUND(value/1024/1024,2) from v$pgastat where name='total PGA allocated';
附:
修改内存组件大小后,startup 启动报错
提示memory_max_target小于sga_max_size;
解决:在数据库没有启动情况下,create pfile='/tmp/upopfile.ora' from spfile
vi /tmp/upopfile.ora
将memory_max_target修改为10737418240=10*1024*1024*1024
startup pfile='/tmp/upopfile.ora'
create spfile from pfile='/tmp/upopfile.ora';
操作结果
QL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 8G
sga_target big integer 0
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 2G
SQL> show parameter shared
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
hi_shared_memory_address integer 0
max_shared_servers integer
shared_memory_address integer 0
shared_pool_reserved_size big integer 161061273
shared_pool_size big integer 3G
shared_server_sessions integer
shared_servers integer 0
SQL> show parameter memory
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
hi_shared_memory_address integer 0
memory_max_target big integer 10G
memory_target big integer 0
shared_memory_address integer 0
SQL> set line 120
SQL> select COMPONENT,current_size/1024/1024 as MB from v$sga_dynamic_components;
COMPONENT MB
---------------------------------------------------------------- ----------
shared pool 3072
large pool 256
java pool 256
streams pool 0
DEFAULT buffer cache 4096
KEEP buffer cache 0
RECYCLE buffer cache 0
DEFAULT 2K buffer cache 0
DEFAULT 4K buffer cache 0
DEFAULT 8K buffer cache 0
DEFAULT 16K buffer cache 0
COMPONENT MB
---------------------------------------------------------------- ----------
DEFAULT 32K buffer cache 0
Shared IO Pool 0
ASM Buffer Cache 0
SQL> select ROUND(value/1024/1024,2) from v$pgastat where name='total PGA allocated';
ROUND(VALUE/1024/1024,2)
------------------------
69.35
3.2 检查是否与shared pool中绑定变量或执行计划有关,或者sql有关
检查执行计划发现问题不大
select sql_id,child_number from v$sql where sql_id ='906p86k62kzkh';
select * from table(dbms_xplan.display_cursor('906p86k62kzkh',&child_number));
select HASH_VALUE,sql_text,executions from v$sqlarea where HASH_VALUE=':&hash';
1773737552
select * from table(dbms_xplan.display_awr('1773737552'));
以上检查排除是应用问题导致。
3.3 部署脚本监控shared pool各个内存组件使用情况
检查Shared Pool Usage使用情况,根据部署脚本,发现以下规律
SELECT SUM(bytes)/1024/1024 as "Total Shared Pool Usage" FROM v$sgastat WHERE pool = 'shared pool' AND name != 'free memory';
Total Shared Pool Usage
逐渐增长:手工管理设置为3G,war报告建议7G(SGA总大小才8G)。
10.15重启之后
"Total Shared Pool Usage"大小
4138.7879 (10.18)
5081.09141 (10.19)
5876.86457 (10.20)
导出宕机前的AWR报告,查看报告最后面内容
2014.10.15 13:00---15:00,可以发现shared_pool的KGLHD达到5.2G,buffer_cache 仅有352MB
查看系统重启后SGA各个组件内存使用情况(即手工管理固定内存大小后,系统重启时间2014.10.15 16:52),2014.10.15 16:52---18:00,KGLHD两小时内逐渐增大,变化率165%,364MB
2014.10.17 13:00--14:00 相相对于上一个awr报告(两天前即15日的),shared pool在增长,buffer cache 在减少
其中KGLHD达到2847MB,已经增长了2847-364=2483MB
2014.10.20 05:00--06:00,可以看到shred pool已经
使用了7G(7040MB),而buffer cache缩小到128MB
由以上3个awr报告中,SGA内存组件大小变化情况,可以判断即使在对数据库内存使用手工管理固定大小,但shared pool依然逐渐
增大至7G,之前分配的是3G ,从15号到20号增长了4G,具体是其中的KGLHD不停在增长(达到5.4G),和系统宕机前类似。
至此,内存手工管理实效,问题再现,alert日志继续报错如下
Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_smon_3007.trc:ORA-00604: 递归 SQL 级别 1 出现错误ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","select f.file#, f.block#, f....","SQLA","tmp")Mon Oct 20 08:23:26 2014Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_smon_3007.trc:ORA-00604: 递归 SQL 级别 1 出现错误ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","unknown object","KGLH0^f185eace","kglHeapInitialize:temp")Mon Oct 20 08:23:38 2014Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_j001_7116.trc:ORA-00604: 递归 SQL 级别 1 出现错误ORA-04031: 无法分配 32 字节的共享内存 ("shared pool","update sys.job$ set this_dat...","KGLH0^7a42409d","kglHeapInitialize:temp")Mon Oct 20 08:23:38 2014Errors in file /u01/app/oracle/diag/rdbms/upop/upop/trace/upop_j000_7114.trc:ORA-12012: 自动执行作业 14 出错ORA-04031: 无法分配 3480 字节的共享内存 ("shared pool","DXS","PLMCD^97454214","BAMIMA: Bam Buffer")ORA-06508: PL/SQL: 无法找到正在调用 : "DBUSRBAS.DXS" 的程序单元ORA-06512: 在 "DBUSRBAS.TASK", line 685
至此陷入山穷水尽
..........
在MOS上搜索ora-04031,文章有好几百篇,一筹莫展
偶然灵机一动,在MOS上使用关键字
KGLHD+ORA04031
搜索,
找到比较相近的bug(1534706.1)
Doc ID 4031.1
ORA-04031: ???? 2928 ??????? ("shared pool","unknown object","sga heap(1,1)","KGLHD")
Shared pool leak of "KGLHD" memory when using multiple subpools
bug
1534706.1 (High allocation under KGLHD in shared pool.)
This problem may be seen if both of the following are true:
(a) High allocation of KGLHD in shared pool which may lead to ORA-4031
(b) Multiple shared pool subpools must be in use.
(eg _kghdsidx_count > 1)
解决;打上相应PSU,解决问题
四 后续观察
使用任务计划观察shared pool内存组件变化情况(尤其KGLHD),发现KGLHD大小稳定,至此解决问题
oracle@yingx ~]$ crontab -l40 2 * * * "/home/oracle/oracleback.sh"#0 */4 * * * sh /home/oracle/share_pool_check.sh >> share_pool_check.log#20 3 * * 3,0 sh /home/oracle/share_pool_clean.sh >> share_pool_clean.log#36 14 * * 3,5 sh /home/oracle/share_pool_clean.sh >> share_pool_clean.log[oracle@yingx ~]$ cat /home/oracle/share_pool_check.sh
[oracle@yingx ~]$ cat /home/oracle/share_pool_check.sh#!/bin/bash#Author:lslcd --#. ./.profile. ./.bash_profile#export ORACLE_SID=orclcheck_os(){(uname -a|grep Linux)&&return 1;(uname -a|grep AIX)&&return 2;(uname -a|grep HP)&&return 3;}check_osos=$?DATE=$(date '+%Y%m%d')DATE1=$(date '+%Y%m%d_%H%M%S')SID=`echo $ORACLE_SID`OUTPUT=sahred_pool_check#OUTPUT=${DATE}_$(hostname)if [ -d ${OUTPUT} ]; thenecho "dir is exties"else mkdir -p ${OUTPUT}ficd ${OUTPUT}touch ${DATE1}_${SID}_share_pool_check.logexport file=${DATE1}_${SID}_share_pool_check.logsqlplus / as sysdba <<EOFset feedback off;set linesize 100;set wrap off;set pagesize 5000;set newpage 0;set echo off;set serveroutput on;set long 40000;set timing on;alter session set nls_date_format='yyyy-mm-dd hh24:mi:ss';spool \$file set line 150COL COMPONENT FORMAT A25COL INITIAL_SIZE FORMAT A10COL FINAL_SIZE FORMAT A10select START_TIME, component, oper_type, oper_mode, initial_size/1024/1024 "INITIAL", FINAL_SIZE/1024/1024 "FINAL", END_TIMEfrom v\$sga_resize_opswhere component = 'shared pool' and status = 'COMPLETE'order by start_time, component;select * from (select POOL, name, BYTES/1024/1024 as MB from v\$sgastat where pool like 'shared%' order by bytes desc) where rownum<=5;select inst_id, bytes/1024/1024 KGLHD from gv\$sgastat where name='KGLHD' and pool='shared pool';select inst_id, bytes/1024/1024 KGLH0 from gv\$sgastat where name='KGLH0' and pool='shared pool';SELECT name, bytes/1024/1024 as MB FROM v\$sgastat WHERE pool = 'shared pool' AND name = 'free memory' ORDER BY bytes DESC;SELECT SUM(bytes)/1024/1024 as "Total Shared Pool Usage" FROM v\$sgastat WHERE pool = 'shared pool' AND name != 'free memory';spool offEOFfree -m >> $filels -l /u01/app/oracle/diag/rdbms/upop/upop/trace |grep trc |wc -l >> $file
10.17 shared_pool 使用情况
SQL> SELECT SUM(bytes)/1024/1024 as "Total Shared Pool Usage" FROM v$sgastat WHERE pool = 'shared pool' AND name != 'free memory';
Total Shared Pool Usage ----------------------- 3265.23795 Elapsed: 00:00:00.01
SQL> select * from (select POOL, name, BYTES/1024/1024 as MB from v$sgastat where pool like 'shared%' order by bytes desc) where rownum<=5;POOL NAME MB ------------ -------------------------- ---------- shared pool KGLH0 1196.00986 shared pool SQLA 732.630402 shared pool KGLHD 305.8918 shared pool free memory 247.295372 shared pool SQLP 182.102776 Elapsed: 00:00:00.01SQL> select inst_id, bytes/1024/1024 KGLHD from gv$sgastat where name='KGLHD' and pool='shared pool'; INST_ID KGLHD ---------- ---------- 1 305.889511 Elapsed: 00:00:00.00SQL> select inst_id, bytes/1024/1024 KGLH0 from gv$sgastat where name='KGLH0' and pool='shared pool'; INST_ID KGLH0 ---------- ---------- 1 1196.00218 Elapsed: 00:00:00.01
SQL> SELECT SUM(bytes)/1024/1024 as "Total Shared Pool Usage" FROM v$sgastat WHERE pool = 'shared pool' AND name != 'free memory';Total Shared Pool Usage ----------------------- 2824.73301 Elapsed: 00:00:00.00
