[20190810]如何索引一个超长字段.txt

来源:这里教程网 时间:2026-03-03 14:00:18 作者:

[20190810]如何索引一个超长字段.txt --//链接问的一个问题:http://www.itpub.net/thread-2119521-1-1.html 因业务需要,一个字段的字符要很长,最长有4000, 并且还要用字段来判断唯一性,如果直接建立索引他会报错的,而且怎么长查询起 来是否很慢?请问各位前辈,有什么好的方法来处理这个问题。 --//他还加了一个需求保证唯一性。不知道对方使用什么版本,自己测试看看: 1.环境: SCOTT@test01p> @ ver1 PORT_STRING                    VERSION        BANNER                                                                               CON_ID ------------------------------ -------------- -------------------------------------------------------------------------------- ---------- IBMPC/WIN_NT64-9.1.0           12.2.0.1.0     Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production              0 2.测试例子建立: SCOTT@test01p> create table t (id number ,v1 varchar2(4000) ); Table created. SCOTT@test01p> insert into t select level ,lpad('x',100,'x')||level v1 from dual connect by level < 2e5; 199999 rows created. SCOTT@test01p> create unique index i_t_v1 on t(v1); Index created. --//实际上这样的索引还是能建立的,估计重新rebuild online 就会报错。 SCOTT@test01p> alter index i_t_v1 rebuild online ; alter index i_t_v1 rebuild online * ERROR at line 1: ORA-00604: error occurred at recursive SQL level 1 ORA-01450: maximum key length (3215) exceeded --//因为rebuild online 要建立一张IOT表,导致maximum key length (3215) exceeded. SCOTT@test01p> alter index i_t_v1 rebuild ; Index altered. --//取消online 没有问题。 3.测试: --//分析略。 SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T'); SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS -------------------- ----------- ------------ ---------- ---------- T                             11          306   27262976       3328 I_T_V1                        11         1066   28311552       3456 --//你可以发现在我的例子中,索引占用的空间比表还大,因为我索引的字段长度占10X字节。如果想原链接的情况也许更大。 SCOTT@test01p> alter session set statistics_level = all; Session altered. SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42; SUBSTR ------ x42 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  9bky8g0hf1td9, child number 1 ------------------------------------- select substr( v1,100,3) from t where v1=lpad('x',100,'x')||42 Plan hash value: 4077016850 ---------------------------------------------------------------------------------------------------------------------- | Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | ---------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.01 |       3 | |*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   107 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 | ---------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx42') 4.继续测试: --//12c 提供standard_hash 函数。 SCOTT@test01p> create unique index if_t_v1_1 on t(standard_hash(v1)); Index created. SCOTT@test01p> create unique index if_t_v1_2 on t(ora_hash(v1)); create unique index if_t_v1_2 on t(ora_hash(v1))                                             * ERROR at line 1: ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found --//可以发现使用ora_hash出现重复值,无法建立唯一约束。而使用standard_hash函数没有问题。 SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1'); SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS -------------------- ----------- ------------ ---------- ---------- T                             11          306   27262976       3328 IF_T_V1_1                     11          434    7340032        896 I_T_V1                        11         1066   28311552       3456 --//可以发现建立standard_hash的函数索引相对小一些。 SCOTT@test01p> select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43; SUBSTR ------ x43 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  0jc1sx44ckdj9, child number 0 ------------------------------------- select substr( v1,100,3) from t where v1=lpad('x',100,'x')||43 Plan hash value: 4077016850 ------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation         | Name   | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  | ------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT  |        |      1 |        |       |     2 (100)|          |      1 |00:00:00.03 |       3 |      2 | |*  1 |  INDEX UNIQUE SCAN| I_T_V1 |      1 |      1 |   111 |     2   (0)| 00:00:01 |      1 |00:00:00.03 |       3 |      2 | ------------------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    1 - access("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx               x43')        filter(STANDARD_HASH("V1")=HEXTORAW('FBDADC99CB387566504D65A6003CFA57CA05F238')) --//产生的执行有点怪怪的,仔细看filter条件就明白了不知道是否存在standard_hash函数索引的原因,也许使用I_T_V1不用回表的原因, --//但是使用的索引是I_T_V1.修改执行语句如下: SCOTT@test01p> select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44;         ID SUBSTR ---------- ------         44 x44 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  dazgkpkj95mfs, child number 0 ------------------------------------- select id,substr( v1,100,3) from t where v1=lpad('x',100,'x')||44 Plan hash value: 1893196135 ----------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | ----------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 | |*  1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 | |*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 | ----------------------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1    2 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    1 - filter("V1"='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx44'    2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0')) --//注意我执行的查询条件是v1=lpad('x',100,'x')||44,并没有使用standard_hash函数查询,如果写成如下: SCOTT@test01p> select id,substr( v1,100,3) from t where standard_hash(v1)=standard_hash(lpad('x',100,'x')||44);         ID SUBSTR ---------- ------         44 x44 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  9z17dmg6hk84n, child number 0 ------------------------------------- select id,substr( v1,100,3) from t where standard_hash(v1)=standard_hash(lpad('x',100,'x')||44) Plan hash value: 1893196135 ----------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | ----------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 | |   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 | |*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_1 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 | ----------------------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1    2 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("T"."SYS_NC00003$"=HEXTORAW('CC14E546ACC59A5B03CD75C49F62248C12C6D8D0')) --//上下比较可以发现前者多了一个filter("V1"='xx......xx44'. --//也就是使用standard_hash建立的索引也可以使用像v1=lpad('x',100,'x')||44这样的谓词查询。 --//类似的情况到目前位置我仅仅知道trunc以及substr(v1,1,N)的情况。 5.继续测试..2: --//作者并没有提示使用的是12c,如果不是12c应该没有standard_hash函数。自己建立1个md5函数看看。 CREATE OR REPLACE FUNCTION FN_MD5(par1 IN VARCHAR) RETURN VARCHAR2   DETERMINISTIC IS   retval varchar2(32); BEGIN   retval := utl_raw.cast_to_raw(DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => par1));   RETURN retval; END FN_MD5; SCOTT@test01p> create unique index if_t_v1_2 on t(fn_md5(v1)); Index created. SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2'); SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS -------------------- ----------- ------------ ---------- ---------- T                             11          306   27262976       3328 IF_T_V1_1                     11          434    7340032        896 I_T_V1                        11         1066   28311552       3456 IF_T_V1_2                     11        14850   10485760       1280 --//使用fn_md5自定义函数的索引比standard_hash索引稍微大一些。 SCOTT@test01p> select id,substr( v1,100,3) from t where fn_md5(v1)=fn_md5(lpad('x',100,'x')||48);         ID SUBSTR ---------- ------         48 x48 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  ggv02trp66xas, child number 0 ------------------------------------- select id,substr( v1,100,3) from t where fn_md5(v1)=fn_md5(lpad('x',100,'x')||48) Plan hash value: 675596911 ----------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                   | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | ----------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT            |           |      1 |        |       |     3 (100)|          |      1 |00:00:00.01 |       4 | |   1 |  TABLE ACCESS BY INDEX ROWID| T         |      1 |      1 |   132 |     3   (0)| 00:00:01 |      1 |00:00:00.01 |       4 | |*  2 |   INDEX UNIQUE SCAN         | IF_T_V1_2 |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       3 | ----------------------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1    2 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("T"."SYS_NC00004$"="FN_MD5"('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx               xxxxxxxxxxxxxxxxxxxx48')) --//OK没有问题。 6.继续测试..3: --//如果不考虑唯一性,可以考虑ora_hash函数。 SCOTT@test01p> create  index if_t_v1_3 on t(ora_hash(v1)); Index created. SCOTT@test01p> select segment_name,header_file,header_block, BYTES,BLOCKS from dba_segments where owner=user and                segment_name in ('I_T_V1','T','IF_T_V1_1','IF_T_V1_2','IF_T_V1_3'); SEGMENT_NAME         HEADER_FILE HEADER_BLOCK      BYTES     BLOCKS -------------------- ----------- ------------ ---------- ---------- T                             11          306   27262976       3328 IF_T_V1_1                     11          434    7340032        896 I_T_V1                        11         1066   28311552       3456 IF_T_V1_2                     11        14850   10485760       1280 IF_T_V1_3                     11        16130    5242880        640 --//使用ora_hash函数索引更小。 SCOTT@test01p>  select count(*),ora_hash(v1) from t group by ora_hash(v1) order by 1 desc FETCH FIRST 2 ROWS ONLY;   COUNT(*) ORA_HASH(V1) ---------- ------------          2   3933908345          2   3975299677 SCOTT@test01p> select id,substr(v1,100,10) from t where ora_hash(v1)=3933908345;         ID SUBSTR(V1,100,10) ---------- --------------------      36510 x36510     144288 x144288         SCOTT@test01p> select id,substr( v1,100,3) from t where ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510);         ID SUBSTR ---------- ------      36510 x36     144288 x14 SCOTT@test01p> @ dpc '' '' PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID  4dvtgan3r97pp, child number 1 ------------------------------------- select id,substr( v1,100,3) from t where ora_hash(v1)=ora_hash(lpad('x',100,'x')||36510) Plan hash value: 1660760003 ------------------------------------------------------------------------------------------------------------------------------------------- | Id  | Operation                           | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | ------------------------------------------------------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT                    |           |      1 |        |       |   801 (100)|          |      2 |00:00:00.01 |       5 | |   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T         |      1 |   2000 |   244K|   801   (0)| 00:00:01 |      2 |00:00:00.01 |       5 | |*  2 |   INDEX RANGE SCAN                  | IF_T_V1_3 |      1 |    800 |       |     1   (0)| 00:00:01 |      2 |00:00:00.01 |       3 | ------------------------------------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): -------------------------------------------------------------    1 - SEL$1 / T@SEL$1    2 - SEL$1 / T@SEL$1 Predicate Information (identified by operation id): ---------------------------------------------------    2 - access("T"."SYS_NC00005$"=3933908345) --//也可以使用索引,缺点返回多余的行。 总结: --//使用standard_hash,自定义md5函数,或者ora_hash,但是ora_hash不保证唯一。我知道就这些,不知道还有什么其它方法。 --//另外关于ORA-01450: maximum key length (3215) exceeded错误,可以考虑定义更大的数据块db_16k_cache_size, --//db_32k_cache_size.

相关推荐