Oracle当number类型超过一定长度直方图限制导致SQL执行计划错误

来源:这里教程网 时间:2026-03-03 16:22:42 作者:

Oracle当number类型超过一定长度直方图限制

背景

生产系统,监控巡检发现某个SQL逻辑读非常高,通过查看执行计划,存在三个执行计划,最高cost 9w多,较低的两个位100左右。 经过分析,把具体分析以及发现虽然查询where条件列拥有直方图,且存在数据倾斜,但是CBO依然无法判断数据分布情况通过 模拟测试如下。

1. 11.2.0.4构造测试环境

1.1 创建测试表,包含两列,均为number

create table test_hist (id1 number(30) primary key,id2 number(20) not null);

1.2 插入10000条数据,均为15位加1~10000数字构成,也就是两列的distinct均与行数相同

begin
for i in 1..10000
loop
insert into test_hist values(111021111112345||i,111021111112345||i);
end loop;
commit;
end;
/

1.3 再次插入1w条数据

第一列在上面1w的基础上递增,第二列与上面中第一行数据相同,也就是第二列的分布已经倾斜,1110211111123451值在2w总数据中有10001条。

begin
for i in 1..10000
loop
insert into test_hist values(1110211111123451000||i,1110211111123451);
end loop;
commit;
end;
/
SCOTT@honor1 > select count(*) from test_hist;
                                COUNT(*)
----------------------------------------
                                   20000
SCOTT@honor1 > select count(*) from test_hist where id2=1110211111123451;
                                COUNT(*)
----------------------------------------
                                   10001

1.4 在第二列创建普通索引

create index idx_test_hist_id2 on test_hist(id2);

1.5 收集统计信息,不收集直方图

exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 1',estimate_percent=>100,cascade=>true,no_invalidate=>false);

1.6 查看数据分布

set lines 200 pages 200
col table_name for a15
col num_distinct for 9999999
col density for 999.9999999
col num_nulls for 9999
col num_buckets for 9999
col low_value for a25
col high_value for a25
select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM 
from user_tab_col_statistics where table_name='TEST_HIST';
TABLE_NAME COLUMN_NAME          NUM_DISTINCT LOW_VALUE                      HIGH_VALUE                     DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
---------- -------------------- ------------ ------------------------------ ------------------------- ------------ --------- ----------- ---------------
TEST_HIST  ID1                         20000 C80C0B160C0C0D2334             CC0C0B160C0C0D23340102        .0000500         0           1 NONE
TEST_HIST  ID2                         10000 C80C0B160C0C0D2334             CA0C0B160C0C0D2334            .0001000         0           1 NONE
col ENDPOINT_ACTUAL_VALUE for a25
col ENDPOINT_ACTUAL_VALUE_RAW for a25
select * from user_tab_histograms where table_name='TEST_HIST';
TABLE_NAME  COLUMN_NAME  ENDPOINT_NUMBER            ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
----------- ---------------------------- ------------------------- --------------------
TEST_HIST   ID1                        0          1110211111123450
TEST_HIST   ID2                        0          1110211111123450
TEST_HIST   ID1                        1  111021111112345000000000
TEST_HIST   ID2                        1      11102111111234500000

2. 测试SQL语

2.1 查看执行计划

SCOTT@honor1 > set autotrace traceonly
SCOTT@honor1 > select /*+ gather_plan_statistics */ id2 from test_hist where id2=1110211111123451;
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1411376830
--------------------------------------------------------------------------------------
| Id  | Operation        | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                   |     2 |    22 |     1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_TEST_HIST_ID2 |     2 |    22 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

2.2 查看真实执行计划     

SYS@honor1 > select sql_id,sql_text,child_number,plan_hash_value from v$sql where sql_text like 'select /*+ gather_plan_statistics */ id2 from test_hist%';
SYS@honor1 > select * from table(dbms_xplan.display_cursor('02v238rmgtfg8',0,'allstats last'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
SQL_ID  02v238rmgtfg8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ id2 from test_hist where
id2=1110211111123451
Plan hash value: 1411376830
---------------------------------------------------------------------------------------------------------
| Id  | Operation        | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                   |      1 |        |  10001 |00:00:00.01 |     696 |     31 |
|*  1 |  INDEX RANGE SCAN| IDX_TEST_HIST_ID2 |      1 |      2 |  10001 |00:00:00.01 |     696 |     31 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ID2"=1110211111123451)
19 rows selected.

# 可以看到由于CBO优化器不知道ID2存在倾斜,导致estimate-rows与actual-rows存在非常大偏差,导致选择了错误执行计划,走了索引。

2.3 收集直方图

# 11g最大直方图buckets为254,所以按照最大收集直方图。

exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254',
estimate_percent=>100,cascade=>true,no_invalidate=>false);
set lines 200 pages 200
col table_name for a15
col num_distinct for 9999999
col density for 999.9999999
col num_nulls for 9999
col num_buckets for 9999
col low_value for a25
col high_value for a25
select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='TEST_HIST'
TABLE_NAME      COLUMN_NAME          NUM_DISTINCT LOW_VALUE                 HIGH_VALUE                     DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
--------------- -------------------- ------------ ------------------------- ------------------------- ------------ --------- ----------- ---------------
TEST_HIST       ID1                         20000 C80C0B160C0C0D2334        CC0C0B160C0C0D23340102        .0000500         0         254 HEIGHT BALANCED
TEST_HIST       ID2                         10000 C80C0B160C0C0D2334        CA0C0B160C0C0D2334            .0000500         0         254 HEIGHT BALANCED
SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number;
TABLE_NAME      COLUMN_NAME          ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE)                  ENDPOINT_ACTUAL_VALU
--------------- -------------------- --------------- ---------------------------------------- --------------------
TEST_HIST       ID2                              126 1110211111123450
TEST_HIST       ID2                              127 11102111111234500
...

2.4 通过上述列直方图统计信息以及测试可以得出如下情况:

# 取15位后,第十五位四舍五入,其余位补0后,那么实际查询时,传入值如果为具体值,不会参考具体值,验证如下: # 验证之前ID2列直方图信息如下:

TABLE_NAME                     COLUMN_NAME  ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE) 
------------------------------ ---------------------------- ------------------------
TEST_HIST                      ID2                        0 1110211111123450
TEST_HIST                      ID2                      112 1110211111123450
TEST_HIST                      ID2                      124 1110211111123460
TEST_HIST                      ID2                      125 11102111111234500
TEST_HIST                      ID2                      126 11102111111234500
TEST_HIST                      ID2                      127 11102111111234500
TEST_HIST                      ID2                      128 11102111111234600
TEST_HIST                      ID2                      129 11102111111234600
TEST_HIST                      ID2                      130 11102111111234600
TEST_HIST                      ID2                      131 11102111111234600
TEST_HIST                      ID2                      132 111021111112345000
TEST_HIST                      ID2                      133 111021111112345000
TEST_HIST                      ID2                      134 111021111112345000
TEST_HIST                      ID2                      135 111021111112345000
TEST_HIST                      ID2                      136 111021111112345000
TEST_HIST                      ID2                      137 111021111112345000
TEST_HIST                      ID2                      138 111021111112345000
TEST_HIST                      ID2                      139 111021111112345000
TEST_HIST                      ID2                      140 111021111112345000
TEST_HIST                      ID2                      141 111021111112345000
TEST_HIST                      ID2                      142 111021111112345000
TEST_HIST                      ID2                      143 111021111112346000
TEST_HIST                      ID2                      144 111021111112346000
TEST_HIST                      ID2                      145 111021111112346000
TEST_HIST                      ID2                      146 111021111112346000
TEST_HIST                      ID2                      147 111021111112346000
TEST_HIST                      ID2                      148 111021111112346000
TEST_HIST                      ID2                      149 111021111112346000
TEST_HIST                      ID2                      150 111021111112346000
TEST_HIST                      ID2                      151 111021111112346000
TEST_HIST                      ID2                      152 1110211111123450000
TEST_HIST                      ID2                      153 1110211111123450000
TEST_HIST                      ID2                      154 1110211111123450000
TEST_HIST                      ID2                      155 1110211111123450000
TEST_HIST                      ID2                      156 1110211111123450000
TEST_HIST                      ID2                      157 1110211111123450000
TEST_HIST                      ID2                      158 1110211111123450000
TEST_HIST                      ID2                      159 1110211111123450000
TEST_HIST                      ID2                      160 1110211111123450000
TEST_HIST                      ID2                      161 1110211111123450000
TEST_HIST                      ID2                      162 1110211111123450000
TEST_HIST                      ID2                      163 1110211111123450000
TEST_HIST                      ID2                      164 1110211111123450000
TEST_HIST                      ID2                      165 1110211111123450000
TEST_HIST                      ID2                      166 1110211111123450000
TEST_HIST                      ID2                      167 1110211111123450000
TEST_HIST                      ID2                      168 1110211111123450000
TEST_HIST                      ID2                      169 1110211111123450000
TEST_HIST                      ID2                      170 1110211111123450000
TEST_HIST                      ID2                      171 1110211111123450000
TEST_HIST                      ID2                      172 1110211111123450000
TEST_HIST                      ID2                      173 1110211111123450000
TEST_HIST                      ID2                      174 1110211111123450000
TEST_HIST                      ID2                      175 1110211111123450000
TEST_HIST                      ID2                      176 1110211111123450000
TEST_HIST                      ID2                      177 1110211111123450000
TEST_HIST                      ID2                      178 1110211111123450000
TEST_HIST                      ID2                      179 1110211111123450000
TEST_HIST                      ID2                      180 1110211111123450000
TEST_HIST                      ID2                      181 1110211111123450000
TEST_HIST                      ID2                      182 1110211111123450000
TEST_HIST                      ID2                      183 1110211111123450000
TEST_HIST                      ID2                      184 1110211111123450000
TEST_HIST                      ID2                      185 1110211111123450000
TEST_HIST                      ID2                      186 1110211111123450000
TEST_HIST                      ID2                      187 1110211111123450000
TEST_HIST                      ID2                      188 1110211111123450000
TEST_HIST                      ID2                      189 1110211111123450000
TEST_HIST                      ID2                      190 1110211111123450000
TEST_HIST                      ID2                      191 1110211111123450000
TEST_HIST                      ID2                      192 1110211111123450000
TEST_HIST                      ID2                      193 1110211111123450000
TEST_HIST                      ID2                      194 1110211111123450000
TEST_HIST                      ID2                      195 1110211111123450000
TEST_HIST                      ID2                      196 1110211111123450000
TEST_HIST                      ID2                      197 1110211111123450000
TEST_HIST                      ID2                      198 1110211111123460000
TEST_HIST                      ID2                      199 1110211111123460000
TEST_HIST                      ID2                      200 1110211111123460000
TEST_HIST                      ID2                      201 1110211111123460000
TEST_HIST                      ID2                      202 1110211111123460000
TEST_HIST                      ID2                      203 1110211111123460000
TEST_HIST                      ID2                      204 1110211111123460000
TEST_HIST                      ID2                      205 1110211111123460000
TEST_HIST                      ID2                      206 1110211111123460000
TEST_HIST                      ID2                      207 1110211111123460000
TEST_HIST                      ID2                      208 1110211111123460000
TEST_HIST                      ID2                      209 1110211111123460000
TEST_HIST                      ID2                      210 1110211111123460000
TEST_HIST                      ID2                      211 1110211111123460000
TEST_HIST                      ID2                      212 1110211111123460000
TEST_HIST                      ID2                      213 1110211111123460000
TEST_HIST                      ID2                      214 1110211111123460000
TEST_HIST                      ID2                      215 1110211111123460000
TEST_HIST                      ID2                      216 1110211111123460000
TEST_HIST                      ID2                      217 1110211111123460000
TEST_HIST                      ID2                      218 1110211111123460000
TEST_HIST                      ID2                      219 1110211111123460000
TEST_HIST                      ID2                      220 1110211111123460000
TEST_HIST                      ID2                      221 1110211111123460000
TEST_HIST                      ID2                      222 1110211111123460000
TEST_HIST                      ID2                      223 1110211111123460000
TEST_HIST                      ID2                      224 1110211111123460000
TEST_HIST                      ID2                      225 1110211111123460000
TEST_HIST                      ID2                      226 1110211111123460000
TEST_HIST                      ID2                      227 1110211111123460000
TEST_HIST                      ID2                      228 1110211111123460000
TEST_HIST                      ID2                      229 1110211111123460000
TEST_HIST                      ID2                      230 1110211111123460000
TEST_HIST                      ID2                      231 1110211111123460000
TEST_HIST                      ID2                      232 1110211111123460000
TEST_HIST                      ID2                      233 1110211111123460000
TEST_HIST                      ID2                      234 1110211111123460000
TEST_HIST                      ID2                      235 1110211111123460000
TEST_HIST                      ID2                      236 1110211111123460000
TEST_HIST                      ID2                      237 1110211111123460000
TEST_HIST                      ID2                      238 1110211111123460000
TEST_HIST                      ID2                      239 1110211111123460000
TEST_HIST                      ID2                      240 1110211111123460000
TEST_HIST                      ID2                      241 1110211111123460000
TEST_HIST                      ID2                      242 1110211111123460000
TEST_HIST                      ID2                      243 1110211111123460000
TEST_HIST                      ID2                      244 1110211111123460000
TEST_HIST                      ID2                      245 1110211111123460000
TEST_HIST                      ID2                      246 1110211111123460000
TEST_HIST                      ID2                      247 1110211111123460000
TEST_HIST                      ID2                      248 1110211111123460000
TEST_HIST                      ID2                      249 1110211111123460000
TEST_HIST                      ID2                      250 1110211111123460000
TEST_HIST                      ID2                      251 1110211111123460000
TEST_HIST                      ID2                      252 1110211111123460000
TEST_HIST                      ID2                      253 1110211111123460000
TEST_HIST                      ID2                      254 11102111111234500000

# 使用直方图显示的为popular value实际只有6行的1110211111123450值验证: # 可以看到执行计划依然根据popular value公式,走了全表扫描。 # cardinality计算公式如下:

cardinality = NUM_ROWS * selectivity
selectivity = (Buckets_this_popular_value) / Buckets_total) * Null_Adjust
Null_Adjust = (NUM_ROWS - NUM_NULLS) / NUM_ROWS

注释:这里Buckets_this_popular_value为popular value所占buckets总数,Buckets_Total为buckets总数。 将上述统计信息代入上述公式 cardinality = 22449 * (112 / 254) * 1 ≈ 9921 符合下面计算公式:

SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123450;
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  9921 |   232K|    27   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST |  9921 |   232K|    27   (0)| 00:00:01 |
-------------------------------------------------------------------------------

# 使用真正的popular value但是在直方图中为non-popular value反而走了不应该走的索引扫描:

SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123451;
10007 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2004230999
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |     1 |    24 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_HIST         |     1 |    24 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_TEST_HIST_ID2 |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

2.5 根据上述直方图数据分布以及测试得出如下情况

(1)直方图收集number类型时,字符集为AL32UTF8时,取number前15位,其余位有多少位补多少0,第十五位是否出现四舍五入情况,取决于数据分布:

hight balanced直方图含义为每个bucket中平均分布所有not null值,当第十六位为5、6、7、8、9的总和无法放在取前十五位第十六位补0的值所在bucket中时,

那么会再出现下一个bucket,这个bucket的endpoint_value将会为第十五位加一,这个情况可以通过测试得到验证。

(2)ID2值1110211111123450从126 bucket开始表示该值为popular value,0~127个bucket的endpoint_value相同。

endpoint_number为0的user_tab_col_statistics行,表示该endpoint_value行数不足一个bucket,endpoint_number为累计值bucket number。

(3)多个endpoint_value相同的为popular value,可以看出当1110211111123451为最小值且行数大于1个bucket后,则bucket分布如下,不会包含endpoint_number为0的数据分布:

SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number;
TABLE_NAME      COLUMN_NAME          ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE)                  ENDPOINT_ACTUAL_VALU
--------------- -------------------- --------------- ---------------------------------------- --------------------
TEST_HIST       ID2                              126 1110211111123450
TEST_HIST       ID2                              127 11102111111234500

当最小值不满足一个bucket时,则会有直方图中endpoint_number为0的行数,但是并不是一个bucket,只是代表该endpoint_value并不满足一个bucket,如下:

# 下面数据为再次插入了1110211111123450值与1110211111123450~1110211111123459部分值重新收集的统计信息

TABLE_NAME                     COLUMN_NAME                                   ENDPOINT_NUMBER                           ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
------------------------------ -------------------- ---------------------------------------- ---------------------------------------- --------------------
TEST_HIST                      ID2                                                         0                         1110211111123450
TEST_HIST                      ID2                                                       112                         1110211111123450
...

(4)取15位后,第十五位四舍五入,其余位补0后,那么实际查询时,传入值如果为具体值,不会参考具体值,上面测试已经验证。

3.19c情况会有所改善吗?

3.1 构造如下数据分布的测试表

CZH@czhpdb > select count(*) from test_hist;
                                COUNT(*)
----------------------------------------
                                   27200
CZH@czhpdb > select id2,count(*) from test_hist group by id2 having count(*) > 2 order by 1;
                                     ID2                                 COUNT(*)
---------------------------------------- ----------------------------------------
                        1110211111123450                                      900
                        1110211111123451                                    10001
                        1110211111123452                                      901
                        1110211111123453                                      901
                        1110211111123454                                      901
                        1110211111123455                                      901
                        1110211111123456                                      901
                        1110211111123457                                      901
                        1110211111123458                                      901
其余值均为distinct值,均只有1个。

3.2 收集直方图

CZH@czhpdb > select * from user_tab_histograms where table_name='TEST_HIST' and column_name='ID2' order by endpoint_number;
TABLE_NAME      COLUM ENDPOINT_NUMBER                           ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE     ENDPOINT_ACTUAL_VALUE_RAW ENDPOINT_REPEAT_COUNT SCOPE
--------------- ----- --------------- ---------------------------------------- ------------------------- ------------------------- --------------------- -------
TEST_HIST       ID2                69                         1110211111123450 1110211111123450          C80C0B160C0C0D2333                            0 SHARED
TEST_HIST       ID2               838                         1110211111123450 1110211111123451          C80C0B160C0C0D2334                            0 SHARED
TEST_HIST       ID2               907                         1110211111123450 1110211111123452          C80C0B160C0C0D2335                            0 SHARED
TEST_HIST       ID2               974                         1110211111123450 1110211111123453          C80C0B160C0C0D2336                            0 SHARED
TEST_HIST       ID2              1040                         1110211111123450 1110211111123454          C80C0B160C0C0D2337                            0 SHARED
TEST_HIST       ID2              1107                         1110211111123450 1110211111123455          C80C0B160C0C0D2338                            0 SHARED
TEST_HIST       ID2              1174                         1110211111123460 1110211111123456          C80C0B160C0C0D2339                            0 SHARED
TEST_HIST       ID2              1241                         1110211111123460 1110211111123457          C80C0B160C0C0D233A                            0 SHARED
TEST_HIST       ID2              1307                         1110211111123460 1110211111123458          C80C0B160C0C0D233B                            0 SHARED
TEST_HIST       ID2              1308                        11102111111234500 11102111111234510         C9020C030C0C0C182E0B                          0 SHARED
TEST_HIST       ID2              1309                        11102111111234500 11102111111234524         C9020C030C0C0C182E19                          0 SHARED
TEST_HIST       ID2              1310                        11102111111234500 11102111111234537         C9020C030C0C0C182E26                          0 SHARED
TEST_HIST       ID2              1311                        11102111111234600 11102111111234551         C9020C030C0C0C182E34                          0 SHARED
TEST_HIST       ID2              1312                        11102111111234600 11102111111234564         C9020C030C0C0C182E41                          0 SHARED
TEST_HIST       ID2              1313                        11102111111234600 11102111111234578         C9020C030C0C0C182E4F                          0 SHARED
TEST_HIST       ID2              1314                        11102111111234600 11102111111234591         C9020C030C0C0C182E5C                          0 SHARED
TEST_HIST       ID2              1315                       111021111112345000 111021111112345105        C90C0B160C0C0D233406                          0 SHARED
TEST_HIST       ID2              1316                       111021111112345000 111021111112345118        C90C0B160C0C0D233413                          0 SHARED
...

# 省略其余bucket直到2048 可以看到Oracle 19c对直方图完善非常明显,可以执行与11.2.0.4中无法正确选择执行计划的SQL,测试如下: # 分别测试两个non-popular与 一个真正popular值

CZH@czhpdb > set autot traceonly
CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123451;
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           | 10213 |   229K|    33   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST | 10213 |   229K|    33   (4)| 00:00:01 |
-------------------------------------------------------------------------------

CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123450;
900 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2521703107
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   916 | 21068 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST         |   916 | 21068 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_HIST_ID2 |   916 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123453;
901 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2521703107
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |   890 | 20470 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST         |   890 | 20470 |     9   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_HIST_ID2 |   890 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

# 可以看到19c虽然endpoint_value在AL32UTF8字符集下也会将16位变成0,但是会正确显示ENDPOINT_ACTUAL_VALUE,这就会给CBO提供非常充足信息 # 可以看到CBO有了充足信息,可以正确选择执行计划,19c相比11.2.0.4进步非常明显。

4. 在11g中这种情况有办法改善吗?

有办法改善,但是不同情况 有不同办法,针对本文上面数据情况,可以采用函数索引办法,如下:

# 可通过substr函数索引改善上文中数据倾斜,直方图无法解决问题,测试如下:

# s ubstr函数会隐式使用to_char转换number创建索引,查询时,需要用单引号括起来,否则会发生隐式转换,将不走函数索引。

SCOTT@honor1 > create index idx_func_id2 on test_hist(substr(id2,16,5));
SCOTT@honor1 > exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254',estimate_percent=>100,cascade=>true,no_invalidate=>false);
SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='1';    # 需要单引号括起来,否则不走索引
10001 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3529380458
-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |  9921 |   261K|    23   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_HIST |  9921 |   261K|    23   (0)| 00:00:01 |
-------------------------------------------------------------------------------

SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='2';
Execution Plan
----------------------------------------------------------
Plan hash value: 3835702174
--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    27 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_HIST    |     1 |    27 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_FUNC_ID2 |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

# 可以看到已经可以正确选择执行计划。

# 这种方法仅适用于数据前多少位完全一致,如果数据所有位均有可能发生变化,这种情况比较难以稳定执行计划。

综上: (1)11.2.0.4 number取前15,其余位根据情况补0. (2)19c中bucket数量扩展至2048,已经可以正确处理上文中数据倾斜情况。 (3)所以直方图并不是因为倾斜才需要收集,而是直方图能真实反应列统计信息才可以收集,才真正有帮助,上面例子就是列存在倾斜,但是由于直方图11.2.0.4直方图限制 导致无法反映真实统计信息,导致倾斜值存在大量行数,但是Oracle会认为是non popular值,cardinality会计算很低,导致走索引扫描或者由于计算cardinality较低,会导致一旦跟其他表关联时 走错误nest loop,将造成SQL执行效率低下。 直方图其他情况(11.2.0.4为例): 详情参见文章:http://www.dbsnake.net/two_interesting_points_on_histogram.html 1、如果目标列的distinct值的数量和目标表的数据量相同,即使该目标列在SYS.COL_USAGE$中有使用记录, Oracle在自动收集直方图统计信息的时候也不会对该列收集直方图统计信息; 2、在手工收集直方图统计信息的时候,如果我手工指定的bucket的数量等于目标列的distinct值的数量, 且这个值是小于等于254的话,那么Oracle此时收集的直方图统计信息的类型应该是FREQUENCY——这个结论成立的前提条件是该列的数据分布是倾斜的。

相关推荐