核心系统数据库坏块

来源：这里教程网时间：2026-03-03 18:50:27 作者：

环境介绍：

Oracle 版本：11.2.0.4.191015 ，操作系统：Redhat linux server 7.5 ，系统架构：两节点RAC 。

故障现象

业务层反馈数据库无法正常写入，无法切换归档日志。

问题分析

1 、Mon Oct 12 12:26:02 开始多次报错，内存坏块，持续到13 号

Mon Oct 12 12:26:02 2020

Errors in file /opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/trace/cwgkuap2_ora_183325.trc (incident=314289):

ORA-00600: 内部错误代码, 参数: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

Incident details in: /opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/incident/incdir_314289/cwgkuap2_ora_183325_i314289.trc

Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/incident/incdir_314289/cwgkuap2_ora_183325_i314289.trc"

Error message: ORA-48913: 写入跟踪文件失败, 达到了文件大小限制 [5000000]

Writing to the above trace file is disabled for now on...

Mon Oct 12 12:26:08 2020

Tue Oct 13 08:53:38 2020

Errors in file /opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/trace/cwgkuap2_ora_162939.trc (incident=314046):

ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details. 2 、Thu Oct 29 10:11:06 报错， (34, 632163) 块corruption ，触发SQL 是一个更新操作，后续一直报错，并伴随blockrecover 。

Thu Oct 29 10:11:06 2020

Errors in file /opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/trace/cwgkuap2_ora_368867.trc (incident=313977):

ORA-00600: 内部错误代码, 参数: [17182], [0x7F780EE5EF08], [], [], [], [], [], [], [], [], [], []

Incident details in: /opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/incident/incdir_313977/cwgkuap2_ora_368867_i313977.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Non critical error ORA-48913 caught while writing to trace file "/opt/oracle/diag/rdbms/cwgkuap/cwgkuap2/incident/incdir_313977/cwgkuap2_ora_368867_i313977.trc"

Error message: ORA-48913: 写入跟踪文件失败, 达到了文件大小限制 [5000000]

Writing to the above trace file is disabled for now on...

Block after image is corrupt:

buffer tsn: 6 rdba: 0x002cffff (0/2949119)

scn: 0x340d.8f0e2302 seq: 0xc3 flg: 0x26 tail: 0xfc760601

frmt: 0x07 chkval: 0x4243 type: 0xff=unknown

Hex dump of corrupt header 4 = CORRUPT

Opcodes *.*

DBAs (file#, block#):

(34, 632163) .

SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff

**NOTE: Only Dumping Redo less then 12 hours**

Times: 10/28/2020 22:11:12 thru eternity

Initial buffer sizes: read 1024K, overflow 832K, change 805K

UPDATE TRDATADETAIL4304 SET FN_95 = null , FN_96 = null , FN_97 = null , FN_98 = null , FN_99 = null , FN_100 = null , FN_101 = null , FN_102 = null , FN_103 = null , FN_104 = null , FN_105 = null , FN_106 = null , FN_107 = null , FN_108 = null , FN_109 = null , FN_110 = null , FN_111 = null , FN_112 = null , FN_113 = null , FN_114 = null , FN_115 = null , FN_116 = null , FN_117 = null , FN_118 = null , FN_119 = null , FN_120 = null , FN_121 = null , FN_122 = null , FN_123 = null , FN_124 = null , FN_125 =

2 、 Thu Oct 29 11:41:20 无法进行检查点，所有的redo 都为active ，redo 无法切换, 直到Thu Oct 29 15:54:14 重启之后才能正常归档，业务正常进行。

Thu Oct 29 11:41:20 2020

Thread 2 cannot allocate new log, sequence 9655

Checkpoint not complete 3 、运行一段时间后，业务第二次反馈问题，检查发现无法进行检查点，所有的redo 都为active ，redo 无法切换。

Thu Oct 29 17:13:58 2020

Thread 2 cannot allocate new log, sequence 9678

Checkpoint not complete 4 、Thu Oct 29 19:14:21 节点2 shutdown abort ，开始人工处理，检查出坏块，并修复。

Some indexes or index [sub]partitions of table FMIS0400.TRDATADETAIL5733_1 have been marked unusable

Some indexes or index [sub]partitions of table FMIS0400.TRDATADETAIL4304 have been marked unusable

Some indexes or index [sub]partitions of table FMIS0400.TRDATADETAIL3815_2 have been marked unusable

5 、Thu Oct 29 20:22:06 处理完成，启动二节点，后续无报错。

问题综述

10-12 号有多次内存坏块报错，并没有影响到业务。29 号内存坏块，导致无法正常写入数据文件，无法进行检查点，redo 无法循环复写，数据库最终夯死, 重启后恢复，由于逻辑坏块没有修复，当天第二次触发，再次导致数据库hang ，修复完成后，后台无报错，恢复正常。

后续处理方案

1、使用dbv工具检查全库，看是否还有坏块，排除风险。

2、继续观察几天，如果有报错及时处理，收集诊断信息（见参考）。

3、建议应用最新的补丁，规避潜在风险的BUG

4、定期备份，并验证备份的有效性

5 、定期巡检，及时发现问题。

参考

1 、校验数据库坏块

--rman 校验坏块

# 数据库层面

VALIDATE DATABASE;

backup validate check logical database ;

# 数据文件层面

VALIDATE DATAFILE #file_id;

# 数据块层面

VALIDATE DATAFILE 1 BLOCK 10;

--dbv 校验坏块

# 校验数据文件

dbv userid={system/<password>} file={full path filename} logfile={output filename}

# 校验具体的表段和分区段等

dbv userid=system/manager SEGMENT_ID=2.5.37767

select t.ts#, s.header_file, s.header_block

from v$tablespace t, dba_segments s

where s.segment_name='TAB1'

and t.name = s.tablespace_name;

2 、查询存在的坏块

-- 抽取损坏的数据文件段区块信息（快速查询）

create table corrupt_extents as

select *

from dba_extents

where file_id in (select file# from v$database_block_corruption);

create table corrupt_free_space

select *

from dba_free_space

where file_id in (select file# from v$database_block_corruption);

-- 查找出存在坏块的对象信息和块信息

set lines 200 pages 10000

col segment_name format a30

SELECT e.owner, e.segment_type, e.segment_name, e.partition_name, c.file#

, greatest(e.block_id, c.block#) corr_start_block#

, least(e.block_id+e.blocks-1, c.block#+c.blocks-1) corr_end_block#

, least(e.block_id+e.blocks-1, c.block#+c.blocks-1)

- greatest(e.block_id, c.block#) + 1 blocks_corrupted

, null description

FROM corrupt_extents e, v$database_block_corruption c

WHERE e.file_id = c.file#

AND e.block_id <= c.block# + c.blocks - 1

AND e.block_id + e.blocks - 1 >= c.block#

UNION

SELECT s.owner, s.segment_type, s.segment_name, s.partition_name, c.file#

, header_block corr_start_block#

, header_block corr_end_block#

, 1 blocks_corrupted

, 'Segment Header' description

FROM dba_segments s, v$database_block_corruption c

WHERE s.header_file = c.file#

AND s.header_block between c.block# and c.block# + c.blocks - 1

UNION

SELECT null owner, null segment_type, null segment_name, null partition_name, c.file#

, greatest(f.block_id, c.block#) corr_start_block#

, least(f.block_id+f.blocks-1, c.block#+c.blocks-1) corr_end_block#

, least(f.block_id+f.blocks-1, c.block#+c.blocks-1)

- greatest(f.block_id, c.block#) + 1 blocks_corrupted

, 'Free Block' description

FROM corrupt_free_space f, v$database_block_corruption c

WHERE f.file_id = c.file#

AND f.block_id <= c.block# + c.blocks - 1

AND f.block_id + f.blocks - 1 >= c.block#

ORDER BY file#, corr_start_block#;

3 、校验相关的坏块和段信息

-- 临时段再次校验

-- 表段分区段校验

#dbv 工具校验具体

select t.ts#, s.header_file, s.header_block

from v$tablespace t, dba_segments s

where s.segment_name='TAB1'

and t.name = s.tablespace_name;

dbv userid=system/manager SEGMENT_ID=2.5.37767

# 针对表段或者分区段的单块进行检查

select segment_name, segment_type, owner

from corrupt_extents

where file_id = <Absolute file number>

and <corrupted block number> between block_id

and block_id + blocks -1;

# 通过语句查询统计具体的表段或者分区段

Select count(*) from biao;

10235 事件跟踪SGA 信息 *** WARNING *********************************************************** *** This event should only EVER be set at the request of Oracle Support. *** It can impact performance on most types of system. *** Level 2 and above can impact latch contention. *** Level 3 and above can have a *SEVERE* impact on performance. *** Level 65536 is special and has minimal performance impact *** but has a different purpose to other levels.

ALTER SYSTEM SET EVENTS '10235 trace name context forever, level 65536';

ALTER SYSTEM SET EVENTS '10235 trace name context off';

参考文档：

Causes and Solutions for ora-600 [kdsgrp1] (Doc ID 1332252.1)

Troubleshooting Assistant: Oracle Database Error ORA-600 [kdsgrp1] (Interactive Guide to Resolving Your Error) (Doc ID 1569722.2)