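Environment:
Oracle version: 11.2.0.4.161018; operating system: AIX 7100-04-02-1614; architecture: single instance.
Symptom
The database could not fork new processes, and the background process trace reported errors.
Problem analysis
1. The alert log shows process startup failing at fork. The error ORA-27300: OS system dependent operation:fork failed with status: 12 matches a known bug: the system ran short of memory (status 12 is ENOMEM, which AIX reports as "Not enough space"), so processes could not be forked.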
*** 2021-01-01 01:56:33.588
*** SESSION ID:(28.1) 2021-01-01 01:56:33.588
*** CLIENT ID:() 2021-01-01 01:56:33.588
*** SERVICE NAME:(SYS$BACKGROUND) 2021-01-01 01:56:33.588
*** MODULE NAME:() 2021-01-01 01:56:33.588
*** ACTION NAME:() 2021-01-01 01:56:33.588
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2021-01-01 01:56:41.760
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2021-01-01 02:00:31.593
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2021-01-01 02:00:37.594
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
2. The OS error log also reports paging space exhaustion, as follows:
SYSTEM RUNNING OUT OF PAGING SPACE
Detail Data
PROGRAM
oracle
USER'S PROCESS ID:
54593176
PROGRAM'S PAGING SPACE USE IN 1KB BLOCKS
196
---------------------------------------------------------------------------
LABEL: PGSP_KILL
IDENTIFIER: C5C09FFA
Date/Time: Fri Jan 1 02:02:50 CST 2021
Sequence Number: 5100
Machine Id: 00CE70D74C00
Node Id: GXDB01
Class: S
Type: PERM
WPAR: Global
Resource Name: SYSVMM
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SYSTEM RUNNING OUT OF PAGING SPACE
Failure Causes
INSUFFICIENT PAGING SPACE DEFINED FOR THE SYSTEM
PROGRAM USING EXCESSIVE AMOUNT OF PAGING SPACE
Recommended Actions
DEFINE ADDITIONAL PAGING SPACE
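REDUCE PAGING SPACE REQUIREMENTS OF PROGRAM(S)
3. Checking free memory on the OS shows that from 01:35 onward free memory stayed at a low level and available virtual memory dropped markedly, to the point where it was no longer sufficient.
4. Checking the workload shows that disk read/write pressure, network I/O, and file system paging all rose markedly starting at 00:50, but significant memory paging did not appear until 01:25.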
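To line the fork failures up against the OS timeline in items 3 and 4, it helps to pull the timestamps out of the trace. The following is a minimal Python sketch, not part of the original analysis: the function name fork_failures and the embedded sample are illustrative, and the sample is copied from the first two failures in the trace excerpt in item 1.

```python
import re
from datetime import datetime

# Sample copied from the trace excerpt in item 1 (first two failures only).
TRACE = """\
*** 2021-01-01 01:56:33.588
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2021-01-01 01:56:41.760
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
"""

# A bare timestamp marker such as "*** 2021-01-01 01:56:33.588".
# Markers like "*** SESSION ID:..." are deliberately not matched.
TS_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*$")
# The ORA-27300 line carries the failed OS call and its errno.
FORK_RE = re.compile(r"^ORA-27300: .*operation:(\w+) failed with status: (\d+)")


def fork_failures(text):
    """Return a list of (timestamp, operation, status) per ORA-27300 line."""
    current_ts = None
    events = []
    for line in text.splitlines():
        m = TS_RE.match(line)
        if m:
            current_ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        m = FORK_RE.match(line)
        if m:
            events.append((current_ts, m.group(1), int(m.group(2))))
    return events


if __name__ == "__main__":
    for ts, op, status in fork_failures(TRACE):
        print(ts.isoformat(sep=" "), op, status)
```

Run against the full trace file, the resulting timestamps can be compared with the 01:25 onset of memory paging and the 02:02:50 PGSP_KILL entry in the OS error log.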
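Summary
The run queue grew markedly after 00:00, which shows that system load was increasing. The higher load drove up memory usage, and the resulting memory shortage triggered the Oracle bug, producing the ORA-27300 error and preventing processes from being forked.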
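Follow-up actions
1. Review that day's business workload to see whether there was a significant increase.
2. Add memory and reserve enough for the OS; allocate roughly 60% to 70% of physical memory to Oracle (PGA + SGA).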
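The 60% to 70% guideline in action 2 is simple arithmetic, sketched below in Python. The function names and the 64 GB host are hypothetical, since the original does not state the server's memory size; treat this as a sanity check rather than a sizing tool.

```python
def oracle_memory_budget(physical_gb, low=0.60, high=0.70):
    """Return the (min, max) combined SGA + PGA size in GB.

    The 60%-70% range comes from the follow-up actions above;
    physical_gb is whatever the host actually has.
    """
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return (physical_gb * low, physical_gb * high)


def fits_budget(sga_gb, pga_gb, physical_gb, high=0.70):
    """True if SGA + PGA stays within the upper bound of the guideline."""
    return sga_gb + pga_gb <= physical_gb * high


if __name__ == "__main__":
    # Hypothetical 64 GB host, used for illustration only.
    lo_gb, hi_gb = oracle_memory_budget(64)
    print(f"SGA + PGA target: {lo_gb:.1f} to {hi_gb:.1f} GB")
    print("40 GB SGA + 10 GB PGA fits:", fits_budget(40, 10, 64))
```

Whatever is left outside this budget has to cover the OS, file system cache, and per-process overhead for every dedicated server process, so a host with many connections may need to sit nearer the lower bound.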
