Oracle当number类型超过一定长度直方图限制
背景
生产系统,监控巡检发现某个SQL逻辑读非常高,通过查看执行计划,存在三个执行计划,最高cost 9w多,较低的两个位100左右。 经过分析,把具体分析以及发现虽然查询where条件列拥有直方图,且存在数据倾斜,但是CBO依然无法判断数据分布情况通过 模拟测试如下。
1. 11.2.0.4构造测试环境
1.1 创建测试表,包含两列,均为number
create table test_hist (id1 number(30) primary key,id2 number(20) not null);
1.2 插入10000条数据,均为15位加1~10000数字构成,也就是两列的distinct均与行数相同
begin for i in 1..10000 loop insert into test_hist values(111021111112345||i,111021111112345||i); end loop; commit; end; /
1.3 再次插入1w条数据
第一列在上面1w的基础上递增,第二列与上面中第一行数据相同,也就是第二列的分布已经倾斜,1110211111123451值在2w总数据中有10001条。
begin for i in 1..10000 loop insert into test_hist values(1110211111123451000||i,1110211111123451); end loop; commit; end; / SCOTT@honor1 > select count(*) from test_hist; COUNT(*) ---------------------------------------- 20000 SCOTT@honor1 > select count(*) from test_hist where id2=1110211111123451; COUNT(*) ---------------------------------------- 10001
1.4 在第二列创建普通索引
create index idx_test_hist_id2 on test_hist(id2);
1.5 收集统计信息,不收集直方图
exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 1',estimate_percent=>100,cascade=>true,no_invalidate=>false);
1.6 查看数据分布
set lines 200 pages 200 col table_name for a15 col num_distinct for 9999999 col density for 999.9999999 col num_nulls for 9999 col num_buckets for 9999 col low_value for a25 col high_value for a25 select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='TEST_HIST'; TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM ---------- -------------------- ------------ ------------------------------ ------------------------- ------------ --------- ----------- --------------- TEST_HIST ID1 20000 C80C0B160C0C0D2334 CC0C0B160C0C0D23340102 .0000500 0 1 NONE TEST_HIST ID2 10000 C80C0B160C0C0D2334 CA0C0B160C0C0D2334 .0001000 0 1 NONE col ENDPOINT_ACTUAL_VALUE for a25 col ENDPOINT_ACTUAL_VALUE_RAW for a25 select * from user_tab_histograms where table_name='TEST_HIST'; TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU ----------- ---------------------------- ------------------------- -------------------- TEST_HIST ID1 0 1110211111123450 TEST_HIST ID2 0 1110211111123450 TEST_HIST ID1 1 111021111112345000000000 TEST_HIST ID2 1 11102111111234500000
2. 测试SQL语 句
2.1 查看执行计划
SCOTT@honor1 > set autotrace traceonly SCOTT@honor1 > select /*+ gather_plan_statistics */ id2 from test_hist where id2=1110211111123451; 10001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1411376830 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 22 | 1 (0)| 00:00:01 | |* 1 | INDEX RANGE SCAN| IDX_TEST_HIST_ID2 | 2 | 22 | 1 (0)| 00:00:01 | --------------------------------------------------------------------------------------
2.2 查看真实执行计划
SYS@honor1 > select sql_id,sql_text,child_number,plan_hash_value from v$sql where sql_text like 'select /*+ gather_plan_statistics */ id2 from test_hist%';
SYS@honor1 > select * from table(dbms_xplan.display_cursor('02v238rmgtfg8',0,'allstats last'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------
SQL_ID 02v238rmgtfg8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ id2 from test_hist where
id2=1110211111123451
Plan hash value: 1411376830
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10001 |00:00:00.01 | 696 | 31 |
|* 1 | INDEX RANGE SCAN| IDX_TEST_HIST_ID2 | 1 | 2 | 10001 |00:00:00.01 | 696 | 31 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID2"=1110211111123451)
19 rows selected.
# 可以看到由于CBO优化器不知道ID2存在倾斜,导致estimate-rows与actual-rows存在非常大偏差,导致选择了错误执行计划,走了索引。
2.3 收集直方图
# 11g最大直方图buckets为254,所以按照最大收集直方图。
exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254', estimate_percent=>100,cascade=>true,no_invalidate=>false); set lines 200 pages 200 col table_name for a15 col num_distinct for 9999999 col density for 999.9999999 col num_nulls for 9999 col num_buckets for 9999 col low_value for a25 col high_value for a25 select table_name,column_name,NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,DENSITY,NUM_NULLS,NUM_BUCKETS,HISTOGRAM from user_tab_col_statistics where table_name='TEST_HIST' TABLE_NAME COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM --------------- -------------------- ------------ ------------------------- ------------------------- ------------ --------- ----------- --------------- TEST_HIST ID1 20000 C80C0B160C0C0D2334 CC0C0B160C0C0D23340102 .0000500 0 254 HEIGHT BALANCED TEST_HIST ID2 10000 C80C0B160C0C0D2334 CA0C0B160C0C0D2334 .0000500 0 254 HEIGHT BALANCED SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number; TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE) ENDPOINT_ACTUAL_VALU --------------- -------------------- --------------- ---------------------------------------- -------------------- TEST_HIST ID2 126 1110211111123450 TEST_HIST ID2 127 11102111111234500 ...
2.4 通过上述列直方图统计信息以及测试可以得出如下情况:
# 取15位后,第十五位四舍五入,其余位补0后,那么实际查询时,传入值如果为具体值,不会参考具体值,验证如下: # 验证之前ID2列直方图信息如下:
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE) ------------------------------ ---------------------------- ------------------------ TEST_HIST ID2 0 1110211111123450 TEST_HIST ID2 112 1110211111123450 TEST_HIST ID2 124 1110211111123460 TEST_HIST ID2 125 11102111111234500 TEST_HIST ID2 126 11102111111234500 TEST_HIST ID2 127 11102111111234500 TEST_HIST ID2 128 11102111111234600 TEST_HIST ID2 129 11102111111234600 TEST_HIST ID2 130 11102111111234600 TEST_HIST ID2 131 11102111111234600 TEST_HIST ID2 132 111021111112345000 TEST_HIST ID2 133 111021111112345000 TEST_HIST ID2 134 111021111112345000 TEST_HIST ID2 135 111021111112345000 TEST_HIST ID2 136 111021111112345000 TEST_HIST ID2 137 111021111112345000 TEST_HIST ID2 138 111021111112345000 TEST_HIST ID2 139 111021111112345000 TEST_HIST ID2 140 111021111112345000 TEST_HIST ID2 141 111021111112345000 TEST_HIST ID2 142 111021111112345000 TEST_HIST ID2 143 111021111112346000 TEST_HIST ID2 144 111021111112346000 TEST_HIST ID2 145 111021111112346000 TEST_HIST ID2 146 111021111112346000 TEST_HIST ID2 147 111021111112346000 TEST_HIST ID2 148 111021111112346000 TEST_HIST ID2 149 111021111112346000 TEST_HIST ID2 150 111021111112346000 TEST_HIST ID2 151 111021111112346000 TEST_HIST ID2 152 1110211111123450000 TEST_HIST ID2 153 1110211111123450000 TEST_HIST ID2 154 1110211111123450000 TEST_HIST ID2 155 1110211111123450000 TEST_HIST ID2 156 1110211111123450000 TEST_HIST ID2 157 1110211111123450000 TEST_HIST ID2 158 1110211111123450000 TEST_HIST ID2 159 1110211111123450000 TEST_HIST ID2 160 1110211111123450000 TEST_HIST ID2 161 1110211111123450000 TEST_HIST ID2 162 1110211111123450000 TEST_HIST ID2 163 1110211111123450000 TEST_HIST ID2 164 1110211111123450000 TEST_HIST ID2 165 1110211111123450000 TEST_HIST ID2 166 1110211111123450000 TEST_HIST ID2 167 1110211111123450000 TEST_HIST ID2 168 1110211111123450000 TEST_HIST ID2 169 1110211111123450000 TEST_HIST ID2 170 1110211111123450000 TEST_HIST ID2 171 1110211111123450000 TEST_HIST ID2 172 1110211111123450000 TEST_HIST ID2 173 1110211111123450000 TEST_HIST ID2 174 1110211111123450000 TEST_HIST ID2 175 1110211111123450000 TEST_HIST ID2 176 1110211111123450000 TEST_HIST ID2 177 1110211111123450000 TEST_HIST ID2 178 1110211111123450000 TEST_HIST ID2 179 1110211111123450000 TEST_HIST ID2 180 1110211111123450000 TEST_HIST ID2 181 1110211111123450000 TEST_HIST ID2 182 1110211111123450000 TEST_HIST ID2 183 1110211111123450000 TEST_HIST ID2 184 1110211111123450000 TEST_HIST ID2 185 1110211111123450000 TEST_HIST ID2 186 1110211111123450000 TEST_HIST ID2 187 1110211111123450000 TEST_HIST ID2 188 1110211111123450000 TEST_HIST ID2 189 1110211111123450000 TEST_HIST ID2 190 1110211111123450000 TEST_HIST ID2 191 1110211111123450000 TEST_HIST ID2 192 1110211111123450000 TEST_HIST ID2 193 1110211111123450000 TEST_HIST ID2 194 1110211111123450000 TEST_HIST ID2 195 1110211111123450000 TEST_HIST ID2 196 1110211111123450000 TEST_HIST ID2 197 1110211111123450000 TEST_HIST ID2 198 1110211111123460000 TEST_HIST ID2 199 1110211111123460000 TEST_HIST ID2 200 1110211111123460000 TEST_HIST ID2 201 1110211111123460000 TEST_HIST ID2 202 1110211111123460000 TEST_HIST ID2 203 1110211111123460000 TEST_HIST ID2 204 1110211111123460000 TEST_HIST ID2 205 1110211111123460000 TEST_HIST ID2 206 1110211111123460000 TEST_HIST ID2 207 1110211111123460000 TEST_HIST ID2 208 1110211111123460000 TEST_HIST ID2 209 1110211111123460000 TEST_HIST ID2 210 1110211111123460000 TEST_HIST ID2 211 1110211111123460000 TEST_HIST ID2 212 1110211111123460000 TEST_HIST ID2 213 1110211111123460000 TEST_HIST ID2 214 1110211111123460000 TEST_HIST ID2 215 1110211111123460000 TEST_HIST ID2 216 1110211111123460000 TEST_HIST ID2 217 1110211111123460000 TEST_HIST ID2 218 1110211111123460000 TEST_HIST ID2 219 1110211111123460000 TEST_HIST ID2 220 1110211111123460000 TEST_HIST ID2 221 1110211111123460000 TEST_HIST ID2 222 1110211111123460000 TEST_HIST ID2 223 1110211111123460000 TEST_HIST ID2 224 1110211111123460000 TEST_HIST ID2 225 1110211111123460000 TEST_HIST ID2 226 1110211111123460000 TEST_HIST ID2 227 1110211111123460000 TEST_HIST ID2 228 1110211111123460000 TEST_HIST ID2 229 1110211111123460000 TEST_HIST ID2 230 1110211111123460000 TEST_HIST ID2 231 1110211111123460000 TEST_HIST ID2 232 1110211111123460000 TEST_HIST ID2 233 1110211111123460000 TEST_HIST ID2 234 1110211111123460000 TEST_HIST ID2 235 1110211111123460000 TEST_HIST ID2 236 1110211111123460000 TEST_HIST ID2 237 1110211111123460000 TEST_HIST ID2 238 1110211111123460000 TEST_HIST ID2 239 1110211111123460000 TEST_HIST ID2 240 1110211111123460000 TEST_HIST ID2 241 1110211111123460000 TEST_HIST ID2 242 1110211111123460000 TEST_HIST ID2 243 1110211111123460000 TEST_HIST ID2 244 1110211111123460000 TEST_HIST ID2 245 1110211111123460000 TEST_HIST ID2 246 1110211111123460000 TEST_HIST ID2 247 1110211111123460000 TEST_HIST ID2 248 1110211111123460000 TEST_HIST ID2 249 1110211111123460000 TEST_HIST ID2 250 1110211111123460000 TEST_HIST ID2 251 1110211111123460000 TEST_HIST ID2 252 1110211111123460000 TEST_HIST ID2 253 1110211111123460000 TEST_HIST ID2 254 11102111111234500000
# 使用直方图显示的为popular value实际只有6行的1110211111123450值验证: # 可以看到执行计划依然根据popular value公式,走了全表扫描。 # cardinality计算公式如下:
cardinality = NUM_ROWS * selectivity selectivity = (Buckets_this_popular_value) / Buckets_total) * Null_Adjust Null_Adjust = (NUM_ROWS - NUM_NULLS) / NUM_ROWS
注释:这里Buckets_this_popular_value为popular value所占buckets总数,Buckets_Total为buckets总数。 将上述统计信息代入上述公式 cardinality = 22449 * (112 / 254) * 1 ≈ 9921 符合下面计算公式:
SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123450; 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3529380458 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9921 | 232K| 27 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_HIST | 9921 | 232K| 27 (0)| 00:00:01 | -------------------------------------------------------------------------------
# 使用真正的popular value但是在直方图中为non-popular value反而走了不应该走的索引扫描:
SCOTT@honor1 > select id1,id2 from test_hist where id2=1110211111123451; 10007 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2004230999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 24 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TEST_HIST | 1 | 24 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_TEST_HIST_ID2 | 1 | | 1 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------
2.5 根据上述直方图数据分布以及测试得出如下情况
(1)直方图收集number类型时,字符集为AL32UTF8时,取number前15位,其余位有多少位补多少0,第十五位是否出现四舍五入情况,取决于数据分布:
hight balanced直方图含义为每个bucket中平均分布所有not null值,当第十六位为5、6、7、8、9的总和无法放在取前十五位第十六位补0的值所在bucket中时,
那么会再出现下一个bucket,这个bucket的endpoint_value将会为第十五位加一,这个情况可以通过测试得到验证。
(2)ID2值1110211111123450从126 bucket开始表示该值为popular value,0~127个bucket的endpoint_value相同。
endpoint_number为0的user_tab_col_statistics行,表示该endpoint_value行数不足一个bucket,endpoint_number为累计值bucket number。
(3)多个endpoint_value相同的为popular value,可以看出当1110211111123451为最小值且行数大于1个bucket后,则bucket分布如下,不会包含endpoint_number为0的数据分布:
SCOTT@honor1 > select table_name,column_name,endpoint_number,to_char(ENDPOINT_VALUE),ENDPOINT_ACTUAL_VALUE from user_tab_histograms where table_name='TEST_HIST' order by column_name,endpoint_number; TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER TO_CHAR(ENDPOINT_VALUE) ENDPOINT_ACTUAL_VALU --------------- -------------------- --------------- ---------------------------------------- -------------------- TEST_HIST ID2 126 1110211111123450 TEST_HIST ID2 127 11102111111234500
当最小值不满足一个bucket时,则会有直方图中endpoint_number为0的行数,但是并不是一个bucket,只是代表该endpoint_value并不满足一个bucket,如下:
# 下面数据为再次插入了1110211111123450值与1110211111123450~1110211111123459部分值重新收集的统计信息
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU ------------------------------ -------------------- ---------------------------------------- ---------------------------------------- -------------------- TEST_HIST ID2 0 1110211111123450 TEST_HIST ID2 112 1110211111123450 ...
(4)取15位后,第十五位四舍五入,其余位补0后,那么实际查询时,传入值如果为具体值,不会参考具体值,上面测试已经验证。
3.19c情况会有所改善吗?
3.1 构造如下数据分布的测试表
CZH@czhpdb > select count(*) from test_hist; COUNT(*) ---------------------------------------- 27200
CZH@czhpdb > select id2,count(*) from test_hist group by id2 having count(*) > 2 order by 1; ID2 COUNT(*) ---------------------------------------- ---------------------------------------- 1110211111123450 900 1110211111123451 10001 1110211111123452 901 1110211111123453 901 1110211111123454 901 1110211111123455 901 1110211111123456 901 1110211111123457 901 1110211111123458 901 其余值均为distinct值,均只有1个。
3.2 收集直方图
CZH@czhpdb > select * from user_tab_histograms where table_name='TEST_HIST' and column_name='ID2' order by endpoint_number; TABLE_NAME COLUM ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE ENDPOINT_ACTUAL_VALUE_RAW ENDPOINT_REPEAT_COUNT SCOPE --------------- ----- --------------- ---------------------------------------- ------------------------- ------------------------- --------------------- ------- TEST_HIST ID2 69 1110211111123450 1110211111123450 C80C0B160C0C0D2333 0 SHARED TEST_HIST ID2 838 1110211111123450 1110211111123451 C80C0B160C0C0D2334 0 SHARED TEST_HIST ID2 907 1110211111123450 1110211111123452 C80C0B160C0C0D2335 0 SHARED TEST_HIST ID2 974 1110211111123450 1110211111123453 C80C0B160C0C0D2336 0 SHARED TEST_HIST ID2 1040 1110211111123450 1110211111123454 C80C0B160C0C0D2337 0 SHARED TEST_HIST ID2 1107 1110211111123450 1110211111123455 C80C0B160C0C0D2338 0 SHARED TEST_HIST ID2 1174 1110211111123460 1110211111123456 C80C0B160C0C0D2339 0 SHARED TEST_HIST ID2 1241 1110211111123460 1110211111123457 C80C0B160C0C0D233A 0 SHARED TEST_HIST ID2 1307 1110211111123460 1110211111123458 C80C0B160C0C0D233B 0 SHARED TEST_HIST ID2 1308 11102111111234500 11102111111234510 C9020C030C0C0C182E0B 0 SHARED TEST_HIST ID2 1309 11102111111234500 11102111111234524 C9020C030C0C0C182E19 0 SHARED TEST_HIST ID2 1310 11102111111234500 11102111111234537 C9020C030C0C0C182E26 0 SHARED TEST_HIST ID2 1311 11102111111234600 11102111111234551 C9020C030C0C0C182E34 0 SHARED TEST_HIST ID2 1312 11102111111234600 11102111111234564 C9020C030C0C0C182E41 0 SHARED TEST_HIST ID2 1313 11102111111234600 11102111111234578 C9020C030C0C0C182E4F 0 SHARED TEST_HIST ID2 1314 11102111111234600 11102111111234591 C9020C030C0C0C182E5C 0 SHARED TEST_HIST ID2 1315 111021111112345000 111021111112345105 C90C0B160C0C0D233406 0 SHARED TEST_HIST ID2 1316 111021111112345000 111021111112345118 C90C0B160C0C0D233413 0 SHARED ...
# 省略其余bucket直到2048 可以看到Oracle 19c对直方图完善非常明显,可以执行与11.2.0.4中无法正确选择执行计划的SQL,测试如下: # 分别测试两个non-popular与 一个真正popular值
CZH@czhpdb > set autot traceonly CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123451; 10001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3529380458 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10213 | 229K| 33 (4)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_HIST | 10213 | 229K| 33 (4)| 00:00:01 | -------------------------------------------------------------------------------
CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123450; 900 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2521703107 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 916 | 21068 | 9 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST | 916 | 21068 | 9 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_TEST_HIST_ID2 | 916 | | 4 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------
CZH@czhpdb > select id1,id2 from test_hist where id2=1110211111123453; 901 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2521703107 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 890 | 20470 | 9 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TEST_HIST | 890 | 20470 | 9 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_TEST_HIST_ID2 | 890 | | 4 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------------
# 可以看到19c虽然endpoint_value在AL32UTF8字符集下也会将16位变成0,但是会正确显示ENDPOINT_ACTUAL_VALUE,这就会给CBO提供非常充足信息 # 可以看到CBO有了充足信息,可以正确选择执行计划,19c相比11.2.0.4进步非常明显。
4. 在11g中这种情况有办法改善吗?
有办法改善,但是不同情况 有不同办法,针对本文上面数据情况,可以采用函数索引办法,如下:
# 可通过substr函数索引改善上文中数据倾斜,直方图无法解决问题,测试如下:
# s ubstr函数会隐式使用to_char转换number创建索引,查询时,需要用单引号括起来,否则会发生隐式转换,将不走函数索引。
SCOTT@honor1 > create index idx_func_id2 on test_hist(substr(id2,16,5)); SCOTT@honor1 > exec dbms_stats.gather_table_stats(user,'TEST_HIST',method_opt=>'for all columns size 254',estimate_percent=>100,cascade=>true,no_invalidate=>false); SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='1'; # 需要单引号括起来,否则不走索引 10001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3529380458 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9921 | 261K| 23 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TEST_HIST | 9921 | 261K| 23 (0)| 00:00:01 | -------------------------------------------------------------------------------
SCOTT@honor1 > select id1,id2 from test_hist where substr(id2,16,5)='2'; Execution Plan ---------------------------------------------------------- Plan hash value: 3835702174 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TEST_HIST | 1 | 27 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_FUNC_ID2 | 1 | | 1 (0)| 00:00:01 | --------------------------------------------------------------------------------------------
# 可以看到已经可以正确选择执行计划。
# 这种方法仅适用于数据前多少位完全一致,如果数据所有位均有可能发生变化,这种情况比较难以稳定执行计划。
综上: (1)11.2.0.4 number取前15,其余位根据情况补0. (2)19c中bucket数量扩展至2048,已经可以正确处理上文中数据倾斜情况。 (3)所以直方图并不是因为倾斜才需要收集,而是直方图能真实反应列统计信息才可以收集,才真正有帮助,上面例子就是列存在倾斜,但是由于直方图11.2.0.4直方图限制 导致无法反映真实统计信息,导致倾斜值存在大量行数,但是Oracle会认为是non popular值,cardinality会计算很低,导致走索引扫描或者由于计算cardinality较低,会导致一旦跟其他表关联时 走错误nest loop,将造成SQL执行效率低下。 直方图其他情况(11.2.0.4为例): 详情参见文章:http://www.dbsnake.net/two_interesting_points_on_histogram.html 1、如果目标列的distinct值的数量和目标表的数据量相同,即使该目标列在SYS.COL_USAGE$中有使用记录, Oracle在自动收集直方图统计信息的时候也不会对该列收集直方图统计信息; 2、在手工收集直方图统计信息的时候,如果我手工指定的bucket的数量等于目标列的distinct值的数量, 且这个值是小于等于254的话,那么Oracle此时收集的直方图统计信息的类型应该是FREQUENCY——这个结论成立的前提条件是该列的数据分布是倾斜的。
