Database version: Oracle 12.1.2. This is a same-version migration from an Exadata X4 to an Exadata X7.
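The X4 runs RHEL 5.10 and the X7 runs RHEL 7.4. We copied the X4 database's parameter file to the X7 and edited it there. The database started to the NOMOUNT stage without problems, but when we created the SPFILE in an ASM disk group, the instance terminated abnormally. The alert log showed: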
ORACLE_BASE from environment = /u01/app/oracle
Wed Jul 10 15:13:08 2019
WARNING: unknown state for DB spfile location resource, Return Value: 3
The spfile name is ?/dbs/spfile@.ora
Wed Jul 10 15:13:15 2019
DSKM process appears to be hung. Initiating system state dump.
Wed Jul 10 15:13:15 2019
System state dump requested by (instance=1, osid=305061 (GEN0)), summary=[system state dump request (ksz_check_ds)].
System State dumped to trace file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_diag_305067_20190710151315.trc
Wed Jul 10 15:13:17 2019
Decreasing number of real time LMS from 3 to 0
Wed Jul 10 15:13:45 2019
Errors in file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_dskm_305077.trc:
ORA-56867: Cannot connect to Master Diskmon on pipe "default pipe"
ORA-27300: OS system dependent operation:connect failed with status: 2
ORA-27301: OS failure message: No such file or directory
ORA-27302: failure occurred at: skgznpcon6
Wed Jul 10 15:13:45 2019
USER (ospid: 305077): terminating the instance due to error 56867
Wed Jul 10 15:13:45 2019
System state dump requested by (instance=1, osid=305077 (DSKM)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_diag_305067_20190710151345.trc
Wed Jul 10 15:13:45 2019
Dumping diagnostic data in directory=[cdmp_20190710151345], requested by (instance=1, osid=305077 (DSKM)), summary=[abnormal instance termination].
Wed Jul 10 15:13:46 2019
Instance terminated by USER, pid = 305077
Wed Jul 10 15:15:43 2019
WARNING: unknown state for DB spfile location resource, Return Value: 3
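The alert log shows that the DSKM process appeared to hang, which triggered a system state dump. (DSKM is active only if Exadata Storage is used; it performs operations related to Exadata I/O fencing and Exadata cell failure handling.) The DSKM trace file then shows DSKM (ospid 305077) failing to connect to the master diskmon with ORA-56867 and terminating the instance due to that error. The DIAG process (ospid 305067) only wrote the system state dumps.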
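When searching MOS it helps to pull the whole error chain out of the alert log. Below is a minimal Python sketch that collects the ORA- codes in first-seen order, decodes the OS status printed by ORA-27300, and picks up the error that terminated the instance. The function name is hypothetical. The errno decoding is an assumption: it treats the status as a Linux errno value, which is what the ORA-27301 text here suggests.

```python
import errno
import re

# An Oracle error code such as "ORA-56867".
ORA_CODE = re.compile(r"\bORA-(\d{5})\b")
# The OS status printed by ORA-27300, e.g. "connect failed with status: 2".
OS_STATUS = re.compile(r"failed with status: (\d+)")
# The line written when a process terminates the instance.
TERMINATION = re.compile(r"terminating the instance due to error (\d+)")


def summarize_alert_log(text: str) -> dict:
    """Collect ORA- codes (first-seen order), OS errno names, and the terminating error."""
    codes = []
    os_errors = []
    terminating_error = None
    for line in text.splitlines():
        for digits in ORA_CODE.findall(line):
            code = f"ORA-{digits}"
            if code not in codes:
                codes.append(code)
        status = OS_STATUS.search(line)
        if status:
            # Assumption: the status is a Linux errno value, e.g. 2 -> "ENOENT".
            num = int(status.group(1))
            os_errors.append(errno.errorcode.get(num, str(num)))
        term = TERMINATION.search(line)
        if term:
            terminating_error = int(term.group(1))
    return {"codes": codes, "os_errors": os_errors, "terminating_error": terminating_error}


excerpt = """\
ORA-56867: Cannot connect to Master Diskmon on pipe "default pipe"
ORA-27300: OS system dependent operation:connect failed with status: 2
ORA-27301: OS failure message: No such file or directory
ORA-27302: failure occurred at: skgznpcon6
USER (ospid: 305077): terminating the instance due to error 56867
"""
print(summarize_alert_log(excerpt))
# {'codes': ['ORA-56867', 'ORA-27300', 'ORA-27301', 'ORA-27302'], 'os_errors': ['ENOENT'], 'terminating_error': 56867}
```

The sketch only does plain text matching. It does not open trace files, so the DSKM trace would have to be fed in separately.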
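ORA-27300 errors are most often caused by OS resource limits. However, neither the system resources nor the system messages log (/var/log/messages) showed anything relevant. Because another database was already running on the same servers, we ruled out resource limits. The ASM disk group itself was usable, since directories could be created in it. Even so, copying the parameter file into the disk group and starting from it still terminated the instance with the same errors as before. The error itself is more specific: ORA-27301 reports "No such file or directory" (status 2, i.e. ENOENT), which means the endpoint DSKM tried to connect to did not exist. Searching My Oracle Support (MOS) for these errors turned up bug 29164963, which matches this case. The bug affects Exadata Storage Server Software 19. The collected logs showed that this Exadata runs version 19, so it falls within the affected range. We applied the workaround from the MOS note: On database servers perform the following steps: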
1. Add the following lines to the tmpfiles.d(5) configuration file /usr/lib/tmpfiles.d/tmp.conf:
x /tmp/.oracle*
x /var/tmp/.oracle*
x /usr/tmp/.oracle*
2. Restart systemd-tmpfiles-clean.timer service by running the following command as the root user:
# systemctl restart systemd-tmpfiles-clean.timer
3. If the system has already been affected by one of the errors described in the Symptoms section above, then restart clusterware.
4. Review open Advanced Intrusion Detection Environment (AIDE) alerts.
The change to /usr/lib/tmpfiles.d/tmp.conf must be registered in the AIDE database so critical software alerts are not generated as a result of the change. Before updating the AIDE database, review and resolve open AIDE alerts by running the following DBMCLI command:
DBMCLI> list alerthistory where alertDescription like '.*AIDE.*' and endTime = null;
For details about AIDE see Security Guide for Exadata Database Machine.
5. Update the AIDE database by running the following command as the root user:
# /opt/oracle.SupportTools/exadataAIDE -update
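Step 1 is easy to apply twice by accident when it is repeated on every database server. Below is a minimal Python sketch of an idempotent version. It assumes it runs as root on a server where /usr/lib/tmpfiles.d/tmp.conf is the file to edit. The function names are hypothetical, and steps 2 to 5 (the timer restart and the AIDE review and update) still have to be done as listed.

```python
from pathlib import Path

# The three lines from step 1. In tmpfiles.d(5), type "x" excludes matching
# paths (including the contents of matching directories) from age-based cleanup.
ORACLE_EXCLUDES = ["x /tmp/.oracle*", "x /var/tmp/.oracle*", "x /usr/tmp/.oracle*"]


def with_oracle_excludes(conf_text: str) -> str:
    """Return conf_text with any missing exclusion line appended (idempotent)."""
    present = {line.strip() for line in conf_text.splitlines()}
    missing = [line for line in ORACLE_EXCLUDES if line not in present]
    if not missing:
        return conf_text
    # Start the appended block on a fresh line if the file lacks a trailing newline.
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join(missing) + "\n"


def update_tmpfiles_conf(path: str = "/usr/lib/tmpfiles.d/tmp.conf") -> bool:
    """Rewrite the file only if a line was missing; return True if it changed."""
    conf = Path(path)
    old = conf.read_text() if conf.exists() else ""
    new = with_oracle_excludes(old)
    if new != old:
        conf.write_text(new)
    return new != old
```

Keeping the text transformation separate from the file write means the logic can be checked without touching /usr/lib. Running update_tmpfiles_conf() a second time returns False and leaves the file alone.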
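After making these changes we restarted CRS. This time, the database that was already running on these servers failed to start, with the following errors:
ORA-00210: cannot open control file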
ORA-00202: error in writing '+RECODG/utsdb/controlfile/current.256.732754521'
ORA-17503: ksfdopn: 2 Failed to open file +RECODG/utsdb/controlfile/current.256.732754521
ORA-15001: diskgroup "RECODG" does not exist or is not mounted
ORA-15055: unable to connect to ASM instance
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_euid failed with status: 1
ORA-27301: OS failure message: Not owner
ORA-27302: failure occurred at: skgpwinit5
ORA-27303: additional information: startup euid = 100 (grid), current euid = 101 (oracle)
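These errors usually point to wrong permissions on the oracle binary in the Grid Infrastructure home. Its permissions were -rwxrwxr-x, so the setuid and setgid bits were missing. Without setuid, the process that the database spawns from that binary to reach ASM runs with the caller's euid (101, oracle) instead of the binary owner's (100, grid). This is exactly the mismatch that ORA-27303 reports. After we restored the correct permissions and restarted CRS, everything returned to normal.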
The fix, run in the bin directory of the Grid Infrastructure home, was: chmod 6751 oracle (which gives -rwsr-s--x).
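To confirm the fix on each node, here is a minimal Python sketch that reads a binary's permission bits and renders them the way ls -l does. The function names are hypothetical. The path to pass in is the oracle binary under the Grid Infrastructure home (for example $GRID_HOME/bin/oracle). Only the permission bits are compared, not ownership.

```python
import os
import stat

# "chmod 6751": setuid (4000) + setgid (2000) + rwx owner, r-x group, --x others.
EXPECTED_MODE = 0o6751


def describe_mode(mode: int) -> str:
    """Render permission bits as ls -l shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def check_oracle_binary(path: str) -> tuple:
    """Return (ok, ls-style mode) for path; ownership is not checked."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == EXPECTED_MODE, describe_mode(mode)


print(describe_mode(0o775))          # -rwxrwxr-x  (the state found on this system)
print(describe_mode(EXPECTED_MODE))  # -rwsr-s--x  (after chmod 6751)
```

On the affected system, check_oracle_binary would have returned (False, '-rwxrwxr-x').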
With the existing database back up, we resumed the original task. Creating the SPFILE in the ASM disk group from the PFILE then succeeded.
References
(EX50) Exadata 19.1 / Oracle Linux 7 systemd-tmpfiles cleanup can cause database startup/connection failure, or clusterware connection failure (Doc ID 2498572.1)
Startup Instance Failed with ORA-27140 ORA-27300 ORA-27301 ORA-27302 and ORA-27303 on skgpwinit6 (Doc ID 1274030.1)
