Oracle X7一体机存储节点重启导致u01使用率不断增大

来源:这里教程网 时间:2026-03-03 18:28:39 作者:

背景:存储节点发生存储,数据文件状态不支持,产生trace过多导致u01使用率不断增高 基础环境:Oracle X7一体机 数据库版本:12.2.0.1.180717 以下是分析和处理过程: ## 1、数据库日志 ... ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833384+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl18_107783.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833398+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl11_267126.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833460+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl07_6783.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833478+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl17_163546.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833498+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl32_372326.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833511+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl00_372255.trc: ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' 2023-02-15T16:00:58.833515+08:00 Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl24_54355.trc: ORA-00376: file 7 cannot be read at this time ... 数据库日志忠报错为ORA-00376,ORA-0110,从日志看读取undo表空间的'+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'数据文件没有成功,产生多个trace。 ##  2、trace日志 ```skedcdb2_cl00_372255.trc *** 2023-02-15T15:25:46.303098+08:00 (CDB$ROOT(1)) kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0   ----------------------------------------   SO: 0xbcd85410, type: 78, owner: 0x79849860, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3    proc=0xe81373f0, name=transaction, file=ktccts.h LINE:451, pg=0, conuid=1    SGA version=(1,0) ORA-00376: file 7 cannot be read at this time ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405' kssxdl success: so 0x3fb9b5a00 = session PMON waiting for 1000 csecs trace日志的内容和alert一致,读书数据文件失败,PMON进程有等待。 ## 3、ASM日志 2023-02-06T01:05:00.756765+08:00 Exadata cell: o/..... is no longer accessible. I/O errors to disks on this might get suppressed 2023-02-06T01:05:00.868575+08:00 Some DDE async actions failed or were cancelled NOTE: failed resync of disk group 1 disks DATAC1_CD_02_CRMPCELADM03 (4) DATAC1_CD_00_CRMPCELADM03 (8) DATAC1_CD_03_CRMPCELADM03 (10) DATAC1_CD_04_CRMPCELADM03 (16) DATAC1_CD_09_CRMPCELADM03 (22) DATAC1_CD_10_CRMPCELADM03 (33) DATAC1_CD_05_CRMPCELADM03 (39) DATAC1_CD_11_CRMPCELADM03 (40) ORA-15032: not all alterations performed ORA-15080: synchronous I/O operation failed to write block 0 of disk 22 in disk group DATAC1 ORA-15080: synchronous I/O operation failed to write block 0 of disk 4 in disk group DATAC1 ORA-15080: synchronous I/O operation failed to write block 0 of disk 33 in disk group DATAC1 ORA-15080: synchronous I/O operation failed to write block 0 of disk 39 in disk group DATAC1 2023-02-06T01:05:01.106577+08:00 ERROR: /* Exadata Auto Mgmt: ONLINE ASM Disk */ alter diskgroup DATAC1 online disk DATAC1_CD_04_CRMPCELADM03   , DATAC1_CD_02_CRMPCELADM03   , DATAC1_CD_05_CRMPCELADM03   , DATAC1_CD_03_CRMPCELADM03   , DATAC1_CD_00_CRMPCELADM03   , DATAC1_CD_10_CRMPCELADM03   , DATAC1_CD_11_CRMPCELADM03   , DATAC1_CD_09_CRMPCELADM03 nowait WARNING: Exadata Auto Management: OS PID: 331356 Operation ID: 911602: ONLINE disk RECOC1_CD_04_CRMPCELADM03 in diskgroup RECOC1 Failed ASM日志中有I/O错误,自动添加磁盘没有成功。 ORA-15080 Synchronous I/O Operation Failed With Exadata error:'HARD check failed' (Doc ID 2412871.1) 对于该错误按官方建议可以打补丁,也可以通过以下方式 Workaround:- 关闭ASM实例的磁盘检测 ```sql SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'FALSE'; ALTER DISKGROUP RECOC1 REBALANCE POWER 32; 完成rebalance后再开启磁盘检测 SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'TRUE'; ``` ## 4、处理过程 (1)对数据库进行rman全备。 (2)检测数据文件状态 ```sql SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';                                                       FILE#                                  NAME               STATUS     ENABLED ------             --------------------------------     ---------   --------- 3  +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393       RECOVER   READ WRITE 7  +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405      RECOVER   READ WRITE 10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165    RECOVER   READ WRITE ``` 有3个数据文件状态为RECOVER状态,其中CDB中2个sysaux和udnotbs2表空间各1个,pdb中sysaux表空间1个数据文件。 无法查询表空间使用率 ```sql SQL> select a.tablespace_name,         round(a.s,2) "CURRENT_TOTAL(MB)",         round(a.s - f.s,2) "USED(MB)",         f.s "FREE(MB)",         round((a.s-f.s)/ a.s * 100, 2) "USED%",         g.autoextensible,         round(a.ms,2) "MAX_TOTAL(MB)",         round((a.s-f.s)/ a.ms * 100, 2) "MAX USED%"    from (select d.tablespace_name,                 sum(bytes / 1024 / 1024) s,                 sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms            from dba_data_files d           group by d.tablespace_name) a,         (select f.tablespace_name, sum(f.bytes / 1024 / 1024) s            from dba_free_space f           group by f.tablespace_name) f,         (select distinct tablespace_name, autoextensible            from DBA_DATA_FILES           where autoextensible = 'YES'          union          select distinct tablespace_name, autoextensible            from DBA_DATA_FILES           where autoextensible = 'NO'             and tablespace_name not in                 (select distinct tablespace_name                    from DBA_DATA_FILES                   where autoextensible = 'YES')) g     where a.tablespace_name = f.tablespace_name     and g.tablespace_name = f.tablespace_name order by "MAX USED%";          from dba_data_files d               * ``` ERROR at line 12: ORA-00376: file 3 cannot be read at this time ORA-01110: data file 3: '+DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393' (3)重建undo表空间 ```sql  SQL> create undo tablespace UNDOTBS3 datafile '+DATAC1/SKEDCDB/DATAFILE/undotbs3_01.dbf' size 5G autoextend on; ``` (4)修复数据文件 ```sql rman target / Recovery Manager: Release 12.2.0.1.0 - Production on Mon Feb 20 22:22:50 2023 Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved. connected to target database: PLANNING (DBID=2741129465) RMAN>  RMAN> recover datafile 3; Starting recover at 20-FEB-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=2185 instance=SKEDCDB1 device type=DISK starting media recovery archived log for thread 1 with sequence 14340 is already on disk as file +DATAC1/SKEDCDB/ARCHIVELOG/2023_02_20/thread_1_seq_14340.794.1129321103 ... ``` 通过rman修复3个数据文件,然后将数据文件onlin 在CDB中进行3号数据文件online ```sql SQL> alter database datafile 3 online; SQL>  alter database datafile 10 online;  alter database datafile 10 online * ERROR at line 1: ORA-01516: nonexistent log file, data file, or temporary file "10" in the current container PDB的数据文件需要在pdb中进行online ``` (5)检查数据文件状态 ```sql SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';                                                       FILE#                                  NAME               STATUS     ENABLED ------             --------------------------------     ---------   --------- 3  +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393       ONLINE   READ WRITE 7  +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405      ONLINE   READ WRITE 10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165    ONLINE   READ WRITE ``` 数据文件状态正常 (6)删除undotbs2表空间 ```sql SQL> drop tablespace undotbs2 including contents and datafiles; drop tablespace undotbs2 including contents and datafiles * ERROR at line 1: ORA-01548: active rollback segment '_SYSSMU11_759994052$' found, terminate dropping tablespace SQL> Select segment_name, status, tablespace_name   2    from dba_rollback_segs   3   where status not in ('ONLINE', 'OFFLINE'); SEGMENT_NAME                   STATUS           TABLESPACE_NAME ------------------------------ ---------------- ------------------------------ _SYSSMU11_759994052$           NEEDS RECOVERY   UNDOTBS2 _SYSSMU12_3581412579$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU13_398964664$           NEEDS RECOVERY   UNDOTBS2 _SYSSMU14_2753826883$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU15_1616211766$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU16_2927909258$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU17_4161369910$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU18_2116035987$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU19_2637726138$          NEEDS RECOVERY   UNDOTBS2 _SYSSMU20_3386602257$          NEEDS RECOVERY   UNDOTBS2 10 rows selected. ``` 参数文件加入以下内容 _CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_759994052$,_SYSSMU12_3581412579$,_SYSSMU13_398964664$,_SYSSMU14_2753826883$,_SYSSMU15_1616211766$,_SYSSMU16_2927909258$,_SYSSMU17_4161369910$,_SYSSMU18_2116035987$,_SYSSMU19_2637726138$,_SYSSMU20_3386602257$) 使用pfile重新启动数据库 ```sql SQL> startup pfile='/home/oracle/pfile_skedcdb_0220.ora'; ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated ORACLE instance started. Total System Global Area 2.6844E+10 bytes Fixed Size                 26502656 bytes Variable Size            1.2176E+10 bytes Database Buffers         1.4563E+10 bytes Redo Buffers               77959168 bytes Database mounted. Database opened. SQL>  SQL>  SQL> show pdbs         CON_ID CON_NAME                       OPEN MODE  RESTRICTED ---------- ------------------------------ ---------- ----------          2 PDB$SEED                       READ ONLY  NO          3 SKEDCDB                       MOUNTED 删除undotbs2表空间 SQL> drop tablespace undotbs2 including contents and datafiles; Tablespace dropped. SQL>  SQL>  重启数据库 SQL>  shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SQL>  SQL>  SQL>  SQL>  SQL>  SQL>  SQL> startup  ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance ORA-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances undo参数2个节点不一致导致该报错 SQL>  SQL>  SQL> create pfile='/tmp/pfile.ora' from spfile; create pfile='/tmp/pfile.ora' from spfile * ERROR at line 1: ORA-01565: error in identifying file '?=/dbs/spfile@.ora' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 7 SQL> startup pfile='/home/oracle/pfileskedcdb_0220.ora'; ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated ORACLE instance started. Total System Global Area 2.6844E+10 bytes Fixed Size                 26502656 bytes Variable Size            1.2176E+10 bytes Database Buffers         1.4563E+10 bytes Redo Buffers               77959168 bytes Database mounted. Database opened. SQL>  【小结】数据文件状态修复后,数据库alert日志不在产生读取数据文件失败相关的trace,u01目录使用率正常;建议生产环境数据库日志和数据文件状态等关键指标应该加强监控和巡检,发现问题及时处理。 -the end-

相关推荐