1 故障现象 2020新冠突发的一年,使用Oracle Exadata 19C的一体机,服务器是市面上性能最好的,数据库的版本是Oracle最新的版本,但服务器故障是每月触发一次ORA-04031错误,且导致服务器重启,连续发生几次。有幸有时间,且在故障现场,查看数据库最新版本的坑是什么,能不能将DBA给埋了!以下为报错日志:2020-05-27T03:23:18.470014+08:00Errors in file /xxx/+ASM1/trace/+ASM1_ora_31574.trc (incident=113648):
ORA-04031: unable to allocate 56 bytes of shared memory ("shared pool","unknown object","KKSSP^22","kglseshtSegs")Incident details in: /xxx/incdir_113648/+ASM1_ora_31574_i113648.trcUse ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.2 查看数据库生成的trace文件查看TRACE,发现init_heap_kfsg占用内存达90%,故认为有可能是init_heap_kfsg引起的内存不足。==============================================TOP 10 MEMORY USES FOR SGA HEAP----------------------------------------------
"init_heap_kfsg " 9135 MB 90%"free memory " 593 MB 6%"ksunfy_meta 1 " 40 MB 0%"SO private sga " 33 MB 0%"ges big msg buffers " 30 MB 0%"ges resource permanent " 19 MB 0%"ges enqueues " 18 MB 0%"session " 15 MB 0%"write state object " 14 MB 0%"kfclBxIni: kfclbx array " 12 MB 0%TOTALS ---------------------------------------Total free memory 593 MBTotal memory alloc. 9551 MBGrand total 10 GB 查看trace文件中各组件内存使用情况,发现剩余内存还有622M,init_heap_kfsg组建使用的内存达到9551M,但报内存不足 3 查看数据库内存不足发生在哪个范围,发现是在4K以下。如下查询可以说明报ORA-04031错误申请的内存大小在4K以内 SQL> select count(*) from v$shared_pool_reserved where request_failures > 0 and last_failure_size < 4400; COUNT(*)---------- 1 4 服务器重启后,观察init_heap_kfsg组件使用内存的情况,经过比对,发现每天增长300M左右,这就能说明为什么每过30天数据库就会报ORA-04031错误,且引起服务器重启。猜测估计找到ORA-04031的元凶了。

5 查看oracle的官方文档: Bug 30173113 - Multiple ORA-4031 Errors Reported Due To High Memory Allocation In init_heap_kfsg (Doc ID 30173113.8) 和 Bug 31341859 - ORA-4031 ("SHARED POOL","BUSE_PK","PRTMV^B8064B6","ALLOCATE KCTPH[]/CKYPH[]/CKYP (Doc ID 31341859.8) 相关的 此文档说明了这个为Oracle的BUG,且有相关SQL查询语句,内容如下:
Affects:
|
Product ( Component) |
Oracle Server (Rdbms) |
|
Range of versions believed to be affected |
( Not specified) |
|
Versions confirmed as being affected |
· 19.5.0 · 19.4.0 · 19.3.0 · 19.2.0 · 19.1.0 · 18.6.0 · 18.5.0 · 18.4.0 · 18.3.0 · 18.2.0 · 18.1.0 · 12.2.0.1 (Base Release) |
|
Platforms affected |
Generic (all / most platforms affected) |
Fixed:
|
The fix for 30173113 is first included in |
· 20.1.0 · 19.7.0.0.200414 (Apr 2020) Database Release Update (DB RU) · 18.10.0.0.200414 (Apr 2020) Database Release Update (DB RU) · 12.2.0.1.200414 (Apr 2020) Database Release Update (DB RU) · 12.2.0.1.200414 (Apr 2020) Bundle Patch for Windows Platforms |
Interim patches may be available for earlier versions - click here to check.
|
Symptoms: |
Related To: |
|
· Leak (Memory Leak / Growth) · ORA-4031 · Memory type "init_heap_kfsg" |
· RMAN (Recovery Manager) |
Description
ORA-04031 error might be seen after execution of RMAN backups when
init_heap_kfsg structure occupies a lot of memory:
col Tot.Bytes format 9999999999999
select 'sga heap('||KSMCHIDX||',0)' "SubPool",ksmchcom contents, count(*) chunks,sum(ksmchsiz)/(1024*1024*1024) "Total GB",avg(ksmchsiz) "Avg.Bytes"
from sys.x$ksmsp where inst_id = userenv('Instance')
and ksmchcom like 'init_heap_kfsg'
group by 'sga heap('||KSMCHIDX||',0)' ,ksmchcom;
SubPool CONTENTS CHUNKS Total GB Avg.Bytes
--------------- ---------------- ---------- ---------- ----------
sga heap(4,0) init_heap_kfsg 3860098 15.0380789 4183.05811
Rediscovery Information:
After RMAN backup, init_heap_kfsg is having huge amount of memory allocated;
mostly free.
Workaround
NONE
|
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support. |
6 为此系统打最新的 PSU ,经过半年的观察,此系统再无报 ORA-04031 错误,且 init_heap_kfsg组件使用内存几乎没有变化或增加, 故认为此问题根本解决。
总结: 搞 DBA ,是需要高度责任心的,且良好的心理素质,真是干着卖xx的活,拿着卖白菜的工资,处理任何问题,都需要找到根本原因,不能糊弄客户,要不然很快会打脸的。处理任何问题,都需要持续进行跟踪,直到认为此问题根本解决,有时一个问题可能持续跟踪半年。
