In earlier chapters we discussed splitting very large tables with a combination of partitioning and parallelism. That approach goes a long way toward spreading the data evenly, but it is not perfect. Suppose the data volume grows a few more levels: the table is now 1TB in size, with 10 partitions, the largest of which is 400GB. If we want to export the data as evenly as possible, parallelism alone may no longer do the job. Say we want each dump file kept to around 200MB; the 400GB partition would then need 800 parallel chunks. In practical database maintenance we know the default degree of parallelism is only 64, and even raised several times over it cannot reach 800. So when the data volume is extreme and resources are tight, the generated dumps can end up rather large.

We can instead use rowid ranges to meet the requirement, specifying exactly how many dump files to generate. For example, if the table subscriber is 600MB and we take 200MB as the unit, we need 3 dump files. To make the data sufficiently even, we have to do some work on the rowid side.

We first set up a parameter file in the following format. Note that the table MEMO is extremely large: at 200MB per unit, its biggest partition (P9_A3000_E5) needs 800 chunks. The table ICE_AGREEMENT is fairly small and not partitioned, so we use x as a temporary stand-in for the partition name, which makes the two cases easy to tell apart during processing.

MEMO P9_A3000_E0 156
MEMO P9_A3000_E1 170
MEMO P9_A3000_E2 190
MEMO P9_A3000_E3 200
MEMO P9_A3000_E4 180
MEMO P9_A3000_E5 800
MEMO PMAXVALUE_AMAXVALUE_EMAXVALUE 1
ICE_AGREEMENT x 36
CRIBER_HISTORY x 11
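As an aside, a parameter file in this format does not have to be written by hand. Below is a minimal sketch of how it could be generated from dba_segments, assuming 200MB per dump; the connection string n1/n1, the owner prdowner, and the spool file name tab_parall.lst are illustrative assumptions, not part of the original procedure.

sqlplus -s n1/n1 <<EOF
set pages 0 linesize 200 feedback off
spool tab_parall.lst
-- one line per segment: table, partition name (or x), number of 200MB chunks
select segment_name || ' ' || nvl(partition_name, 'x') || ' ' ||
       ceil(bytes/1024/1024/200) data
  from dba_segments
 where owner = upper('prdowner')
   and segment_type like 'TABLE%';
spool off
EOF

With the parameter file in place, the rowid splitting itself can be done with the following script.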
#### $1 dba conn details
#### $2 table owner
#### $3 table_name
#### $4 subobject_name
#### $5 parallel_no

function normal_split
{
sqlplus -s $1 <<EOF
set linesize 200
set pages 0
set feedback off
spool list/rowid_range_$3_x.lst
select rownum || ', ' || ' rowid between ' ||
       chr(39) || dbms_rowid.rowid_create(1, DOI, lo_fno, lo_block, 0) || chr(39) ||
       ' and ' ||
       chr(39) || dbms_rowid.rowid_create(1, DOI, hi_fno, hi_block, 1000000) || chr(39) data
from (
  SELECT DISTINCT DOI, grp,
         first_value(relative_fno) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) lo_fno,
         first_value(block_id) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) lo_block,
         last_value(relative_fno) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) hi_fno,
         last_value(block_id + blocks - 1) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) hi_block,
         SUM(blocks) over (partition BY DOI, grp) sum_blocks, SUBOBJECT_NAME
  FROM (
    SELECT obj.OBJECT_ID, obj.SUBOBJECT_NAME, obj.DATA_OBJECT_ID as DOI,
           ext.relative_fno, ext.block_id,
           (SUM(blocks) over ()) SUM,
           (SUM(blocks) over (ORDER BY DATA_OBJECT_ID, relative_fno, block_id) - 0.01) sum_fno,
           TRUNC((SUM(blocks) over (ORDER BY DATA_OBJECT_ID, relative_fno, block_id) - 0.01) /
                 (SUM(blocks) over () / $5)) grp,
           ext.blocks
    FROM dba_extents ext, dba_objects obj
    WHERE ext.segment_name = UPPER('$3')
      AND ext.owner = UPPER('$2')
      AND obj.owner = ext.owner
      AND obj.object_name = ext.segment_name
      AND obj.DATA_OBJECT_ID IS NOT NULL
    ORDER BY DATA_OBJECT_ID, relative_fno, block_id
  )
  order by DOI, grp
);
spool off;
EOF
}
function partition_split
{
sqlplus -s $1 <<EOF
set linesize 200
set pages 0
set feedback off
spool list/rowid_range_$3_$4.lst
select rownum || ', ' || ' rowid between ' ||
       chr(39) || dbms_rowid.rowid_create(1, DOI, lo_fno, lo_block, 0) || chr(39) ||
       ' and ' ||
       chr(39) || dbms_rowid.rowid_create(1, DOI, hi_fno, hi_block, 1000000) || chr(39) data
from (
  SELECT DISTINCT DOI, grp,
         first_value(relative_fno) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) lo_fno,
         first_value(block_id) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) lo_block,
         last_value(relative_fno) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) hi_fno,
         last_value(block_id + blocks - 1) over (partition BY DOI, grp
             order by relative_fno, block_id
             rows BETWEEN unbounded preceding AND unbounded following) hi_block,
         SUM(blocks) over (partition BY DOI, grp) sum_blocks, SUBOBJECT_NAME
  FROM (
    SELECT obj.OBJECT_ID, obj.SUBOBJECT_NAME, obj.DATA_OBJECT_ID as DOI,
           ext.relative_fno, ext.block_id,
           (SUM(blocks) over ()) SUM,
           (SUM(blocks) over (ORDER BY DATA_OBJECT_ID, relative_fno, block_id) - 0.01) sum_fno,
           TRUNC((SUM(blocks) over (ORDER BY DATA_OBJECT_ID, relative_fno, block_id) - 0.01) /
                 (SUM(blocks) over () / $5)) grp,
           ext.blocks
    FROM dba_extents ext, dba_objects obj
    WHERE ext.segment_name = UPPER('$3')
      AND ext.owner = UPPER('$2')
      AND obj.owner = ext.owner
      AND obj.object_name = ext.segment_name
      AND obj.DATA_OBJECT_ID IS NOT NULL
      AND obj.subobject_name = UPPER('$4')
    ORDER BY DATA_OBJECT_ID, relative_fno, block_id
  )
  order by DOI, grp
);
spool off
EOF
}
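# How the split works: the innermost query walks the extents of the segment in
# (data_object_id, file, block) order, keeps a running total of blocks, and
# assigns each extent to a bucket:
#     grp = TRUNC((running_total - 0.01) / (total_blocks / $5))
# For example, with 1000 total blocks and $5=10, the bucket size is 100 blocks,
# so an extent whose running total is 250 lands in grp TRUNC(249.99/100) = 2.
# The outer query then takes the first and last (file#, block#) of each bucket
# and turns them into a rowid range via dbms_rowid.rowid_create.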
sub_partition_name=$4
if [[ $sub_partition_name = 'x' ]]
then
  normal_split $1 $2 $3 x $5
else
  partition_split $1 $2 $3 $4 $5
fi

The script is fairly long and takes five parameters. Because it queries dba_extents and dba_objects, it needs elevated privileges, so an account with DBA privileges will do for the first parameter, the connection details. The second parameter is the table owner, the third is the table name, the fourth is the partition name (the partition name for a partitioned table, or x if the table is not partitioned), and the fifth is the desired degree of parallelism, which can speed things up to a certain extent.
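For the partitioned case, an invocation might look like the following, taking the largest MEMO partition from the parameter file above; the connection string n1/n1 and owner prdowner are carried over from the demonstration below and are assumptions here. This would spool the ranges to list/rowid_range_memo_P9_A3000_E5.lst.

ksh gen_rowid.sh n1/n1 prdowner memo P9_A3000_E5 800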
A quick demonstration: the script can be run as shown below. Here we ask for 10 dump files, and the table is not a partitioned table.

ksh gen_rowid.sh n1/n1 prdowner subscriber_history x 10
1, where rowid between 'AAB4VPAAJAAD7qAAAA' and 'AAB4VPAAJAAD/R/EJA'
2, where rowid between 'AAB4VPAAJAAD/SAAAA' and 'AAB4VPAAKAABV5/EJA'
3, where rowid between 'AAB4VPAAKAABV6AAAA' and 'AAB4VPAALAAE/p/EJA'
4, where rowid between 'AAB4VPAALAAE/qAAAA' and 'AAB4VPAAMAAFFh/EJA'
5, where rowid between 'AAB4VPAAMAAFFiAAAA' and 'AAB4VPAAyAACuh/EJA'
6, where rowid between 'AAB4VPAAyAACuiAAAA' and 'AAB4VPAAzAACe5/EJA'
7, where rowid between 'AAB4VPAAzAACe6AAAA' and 'AAB4VPAA1AACZR/EJA'
8, where rowid between 'AAB4VPAA1AACZSAAAA' and 'AAB4VPAA2AACWR/EJA'
9, where rowid between 'AAB4VPAA2AACWSAAAA' and 'AAB4VPAA4AACP5/EJA'
10, where rowid between 'AAB4VPAA4AACQCAAAA' and 'AAB4VPAA5AACHx/EJA'

Now let's check whether the data is distributed evenly enough. We can verify in a way similar to the following, sampling chunks 1, 2, and 10.
SQL> select count(*) from subscriber_history where rowid between 'AAB4VPAAJAAD7qAAAA' and 'AAB4VPAAJAAD/R/EJA'
  2  ;

  COUNT(*)
----------
    328759

SQL> select count(*) from subscriber_history where rowid between 'AAB4VPAAJAAD/SAAAA' and 'AAB4VPAAKAABV5/EJA'
  2  /

  COUNT(*)
----------
    318021

SQL> select count(*) from subscriber_history where rowid between 'AAB4VPAA4AACQCAAAA' and 'AAB4VPAA5AACHx/EJA';
  COUNT(*)
----------
    332638

As you can see, the data is distributed quite evenly, which is exactly what we wanted.
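With the ranges verified, each line of the spooled list can be fed to an export session through the QUERY clause. Below is a minimal sketch of that last step, assuming classic exp, the demo connection n1/n1, and the owner prdowner; the par file names are made up for illustration.

cd list
while read line
do
  seq=`echo "$line" | awk -F',' '{print $1}'`
  range=`echo "$line" | awk -F',' '{print $2}'`
  # one parfile per rowid range; the QUERY clause restricts the export to it
  echo "tables=prdowner.subscriber_history" >  exp_par_$seq.par
  echo "query=\"$range\""                   >> exp_par_$seq.par
  echo "file=subscriber_history_$seq.dmp"   >> exp_par_$seq.par
  echo "log=subscriber_history_$seq.log"    >> exp_par_$seq.par
  exp n1/n1 parfile=exp_par_$seq.par &
done < rowid_range_subscriber_history_x.lst
wait

Each chunk then lands in its own dump file of roughly the target size, and the sessions run side by side instead of relying on the database's degree of parallelism.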
