ora-01173错误的分享

来源：这里教程网时间：2026-03-03 20:56:48 作者：

ora-01173错误的分享

背景

近期，做了一个数据恢复，由于磁盘阵列做的raid5，3块硬盘亮灯损坏，导致rac宕机，硬件商将存储拿去重新将raid5上线拉回来，起库做恢复遇到一系列的错误。

1、asm报错：ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [2], [1843], [3748], [1854], [3748], [], [], [], [], []

2、抽取数据文件报错：amdu和其它工具将文件弄出来的时候。

3、数据库层面恢复报错：部分文件是几个月之前。

**当我解决了前面2层问题后，rac能起来，但查看数据库的文件头，有5个不对，是几个月前的。结合asm的报错，怀疑是最后一块盘的离线。

恢复数据库

1、获取回滚段，起库

ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00604: error occurred at recursive SQL level 1 ORA-01173: data dictionary indicates missing data file from system tablespace Process ID: 149332 Session ID: 2623 Serial number: 1

2、分析

-- oradebug setmypid oradebug event 10046 trace name context forever,level 12; oradebug tracefile_name alter database open resetlogs; #######/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_14768.trc########## WAIT #140543587171488: nam='db file sequential read' ela= 2 file#=1 block#=128 blocks=1 obj#=0 tim=1728478848746060 --读取了file 1 block 128 ===================== PARSING IN CURSOR #140543583416216 len=142 dep=1 uid=0 oct=3 lid=0 tim=1728478848746193 hv=361892850 ad='15e8f9338' sqlid='7bd391hat42zk' select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1 END OF STMT PARSE #140543583416216:c=107,e=108,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=0,tim=1728478848746193 BINDS #140543583416216: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=7fd2da5d6b50 bln=22 avl=02 flg=05 value=1 --获取的us#=1, (0为 SYSTEM，1是第一个回滚段，在3号文件的128开始的位置，ts#=2 指向的是undo表空间的，通过其它库查询,) EXEC #140543583416216:c=187,e=187,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=3,plh=906473769,tim=1728478848746399 WAIT #140543583416216: nam='db file sequential read' ela= 5 file#=1 block#=321 blocks=1 obj#=34 tim=1728478848746414 WAIT #140543583416216: nam='db file sequential read' ela= 2 file#=1 block#=225 blocks=1 obj#=15 tim=1728478848746423 FETCH #140543583416216:c=22,e=23,p=2,cr=2,cu=0,mis=0,r=1,dep=1,og=3,plh=906473769,tim=1728478848746428 STAT #140543583416216 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=21 us)' STAT #140543583416216 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=14 us)' CLOSE #140543583416216:c=2,e=1,dep=1,type=0,tim=1728478848746441 ===================== PARSING IN CURSOR #140543583416216 len=288 dep=1 uid=0 oct=3 lid=0 tim=1728478848746595 hv=1456614789 ad='15e8f87a8' sqlid='57guu81bd4bc5' select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext, dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags, plugged, NVL(spare1,0), NVL(spare2,0), affstrength from ts$ where ts#=:1 --上面这个就是去获取undo表空间的信息（ts#=2的为 UNDOTBS1,但是创建了控制文件后，又没有该undo的文件信息，所以报错） --从上面可以得知，第一步是获取回滚段信息，然后得到使用的那些undo回滚段，再去TS#扫描undo回滚段的信息，进一步获取undo表空间的状态，如果获取不到，就出错 END OF STMT PARSE #140543583416216:c=140,e=139,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1728478848746594 BINDS #140543583416216: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=7fd2da5d6b50 bln=22 avl=02 flg=05 value=2 EXEC #140543583416216:c=272,e=272,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=2397009925,tim=1728478848746891 WAIT #140543583416216: nam='db file sequential read' ela= 3 file#=1 block#=185 blocks=1 obj#=7 tim=1728478848746917 WAIT #140543583416216: nam='db file sequential read' ela= 1 file#=1 block#=179 blocks=1 obj#=16 tim=1728478848746925 --file#=1 block#=179出现的问题 FETCH #140543583416216:c=32,e=33,p=2,cr=2,cu=0,mis=0,r=1,dep=1,og=4,plh=2397009925,tim=1728478848746931 STAT #140543583416216 id=1 cnt=1 pid=0 pos=1 obj=16 op='TABLE ACCESS CLUSTER TS$ (cr=2 pr=2 pw=0 time=31 us)' STAT #140543583416216 id=2 cnt=1 pid=1 pos=1 obj=7 op='INDEX UNIQUE SCAN I_TS# (cr=1 pr=1 pw=0 time=23 us)' CLOSE #140543583416216:c=10,e=10,dep=1,type=0,tim=1728478848746945 ORA-01173: data dictionary indicates missing data file from system tablespace ORA-01173: data dictionary indicates missing data file from system tablespace *** 2024-10-09 21:00:48.747 USER (ospid: 14768): terminating the instance due to error 1173

*._corrupted_rollback_segments=true *._offline_rollback_segments=true *._offline_rollback_segments=(_SYSSMU1_1880814008$,_SYSSMU2_237578013$,_SYSSMU3_3036648903$,_SYSSMU4_2554663710$,_SYSSMU5_2656872754$,_SYSSMU6_2560781624$,_SYSSMU7_362778077$,_SYSSMU8_3123962168$,_SYSSMU9_902607226$,_SYSSMU10_600653064$,_SYSSMU11_1908542605$,_SYSSMU12_681659328$,_SYSSMU13_2355401967$,_SYSSMU14_1898001506$,_SYSSMU15_2247781413$,_SYSSMU16_589156960$,_SYSSMU17_2366401286$,_SYSSMU18_1287317225$,_SYSSMU19_3322902554$,_SYSSMU20_2714089088$) *._corrupted_rollback_segments=(_SYSSMU1_1880814008$,_SYSSMU2_237578013$,_SYSSMU3_3036648903$,_SYSSMU4_2554663710$,_SYSSMU5_2656872754$,_SYSSMU6_2560781624$,_SYSSMU7_362778077$,_SYSSMU8_3123962168$,_SYSSMU9_902607226$,_SYSSMU10_600653064$,_SYSSMU11_1908542605$,_SYSSMU12_681659328$,_SYSSMU13_2355401967$,_SYSSMU14_1898001506$,_SYSSMU15_2247781413$,_SYSSMU16_589156960$,_SYSSMU17_2366401286$,_SYSSMU18_1287317225$,_SYSSMU19_3322902554$,_SYSSMU20_2714089088$)

再次执行恢复动作 oradebug setmypid oradebug event 10046 trace name context forever,level 12; oradebug tracefile_name alter database open resetlogs; /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_15564.trc SQL> alter database open resetlogs; Database altered. --数据库已经打开 --重要的记录下： 1）更新undo段status=5 select name,online$,contents$,undofile#,undoblock#,blocksize,dflmaxext,dflinit,dflincr,dflextpct,dflminext, dflminlen, owner#,scnwrp,scnbas, NVL(pitrscnwrp, 0), NVL(pitrscnbas, 0), dflogging, bitmapped, inc#, flags, plugged, NVL(spare1,0), NVL(spare2,0), affstrength from ts$ where ts#=:1 2）ts$读取了ts#=2 - 30 ,打开数据库看了，就是排除了SYSTEM和SYSAUX开始读取的 select blocks,NVL(ts#,-1),status$,NVL(relfile#,0),maxextend,inc, crscnwrp,crscnbas,NVL(spare1,0) from file$ where file#=:1 3）file$ 读取了file#=1 -28 ，读取了所有file$中的行

3、再次模拟

上面第一次open resetlogs看到了读取到file 1 block 179就报错了，事实上就是读取的聚簇表C_TS#中的TS$，该表涉及块为177-182 select owner,segment_name,segment_type,partition_name,tablespace_name from dba_extents where relative_fno= 1 and 179 between block_id and (block_id+blocks-1); --C_TS# select * from bootstrap$; select x.*, ltrim(dbms_utility.make_data_block_address(x.file#,x.block#))"10进制dba" ,'0x'||ltrim(to_char(dbms_utility.make_data_block_address(x.file#,x.block#),'xxxxxxxx'))"16进制dba"， 'alter system dump datafile '||x.file# ||' block '||x.block#||';' text from ( select rowid, dbms_rowid.rowid_object(rowid) obj#, dbms_rowid.rowid_relative_fno(rowid) file#, dbms_rowid.rowid_block_number(rowid) block#, dbms_rowid.rowid_row_number(rowid) row# from ts$) x; --模拟坏块看是否能够正常开库聚簇表C_TS#中的TS$，该表涉及块为177-182 BBED> corrupt Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y Block marked media corrupt. BBED> sum apply Check value for File 1, Block 179: current = 0x2bf7, required = 0x2bf7 BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/orcl/SYSTEM01.DBF BLOCK = 179 Block Checking: DBA = 4194483, Block Type = KTB-managed data block Found block already marked corrupted DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 0 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0 Message 531 not found; product=RDBMS; facility=BBED SQL> alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00604: error occurred at recursive SQL level 1 ORA-01578: ORACLE data block corrupted (file # 1, block # 179) --报错太明显了 ORA-01110: data file 1: '/u01/app/oracle/oradata/orcl/SYSTEM01.DBF' Process ID: 19780 Session ID: 125 Serial number: 3 --再次模拟坏块看是否能够正常开库聚簇表C_TS#中的TS$，该表涉及块为179 BBED> d /v offset 98 count 20 File: /u01/app/oracle/oradata/orcl/SYSTEM01.DBF (1) Block: 179 Offsets: 98 to 117 Dba:0x004000b3 ------------------------------------------------------- 1e00d11e 0f1f0f1f 00000100 01000000 l ................ 01000100 l .... <16 bytes per line> BBED> p kdbh struct kdbh, 14 bytes @92 ub1 kdbhflag @92 0x00 (NONE) sb1 kdbhntab @93 3 sb2 kdbhnrow @94 2 sb2 kdbhfrre @96 -1 sb2 kdbhfsbo @98 30 sb2 kdbhfseo @100 7889 sb2 kdbhavsp @102 7951 sb2 kdbhtosp @104 7951 BBED> m /x 1c offset 98 Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y File: /u01/app/oracle/oradata/orcl/SYSTEM01.DBF (1) Block: 179 Offsets: 98 to 117 Dba:0x004000b3 ------------------------------------------------------------------------ 1c00d11e 0f1f0f1f 00000100 01000000 01000100 <32 bytes per line> BBED> sum apply Check value for File 1, Block 179: current = 0x2bfb, required = 0x2bfb BBED> v DBVERIFY - Verification starting FILE = /u01/app/oracle/oradata/orcl/SYSTEM01.DBF BLOCK = 179 Block Checking: DBA = 4194483, Block Type = KTB-managed data block data header at 0x7f1e6eeaa25c kdbchk: fsbo(28) wrong, (hsz 30) Block 179 failed with check code 6129 DBVERIFY - Verification complete Total Blocks Examined : 1 Total Blocks Processed (Data) : 1 Total Blocks Failing (Data) : 1 Total Blocks Processed (Index): 0 Total Blocks Failing (Index): 0 Total Blocks Empty : 0 Total Blocks Marked Corrupt : 0 Total Blocks Influx : 0 Message 531 not found; product=RDBMS; facility=BBED BBED> p kdbh struct kdbh, 14 bytes @92 ub1 kdbhflag @92 0x00 (NONE) sb1 kdbhntab @93 3 sb2 kdbhnrow @94 2 sb2 kdbhfrre @96 -1 sb2 kdbhfsbo @98 28 --这里改为了28 oradebug setmypid Statement processed. oradebug event 10046 trace name context forever,level 12; Statement processed. oradebug tracefile_name /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_19860.trc SQL> alter database open resetlogs; Database altered. --再次修改了kdbh部分，还是能开库；