[20240512]没有建表引发的灾难.txt

来源:这里教程网 时间:2026-03-03 20:01:02 作者:

[20240512]没有建表引发的灾难.txt --//五一前遇到的问题,生产系统应用程序升级,但是3个表忘记建立,而编写的程序可能存在问题,导致频繁调用这些根本不可能执行的sql --//语句,正常情况下应该报ORA-00942: table or view does not exist,很奇怪的同事反映应用前台根本不报错,真不知道开发如何写代 --//码,难道没做例外处理以及记录吗?导致前台无法应用程序无法执行任何操作,业务停顿接近1个小时. --//在测试环境模拟看看出现这样的情况,看看如何诊断和分析问题. 1.环境: SCOTT@test01p> @ver1 PORT_STRING          VERSION    BANNER                                                                       CON_ID -------------------- ---------- ---------------------------------------------------------------------------- ------ IBMPC/WIN_NT64-9.1.0 12.2.0.1.0 Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production      0 $ ./sql_idz.sh  'select count(1) from deptxxx'  0 sql_text = select count(1) from deptxxx\0 full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F hash_value(10) = 2945576271 sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag 2.建立测试环境: $ cat m2.txt set verify off declare v_id number; begin     for i in 1 .. &&1 loop         select count(1) into v_id from deptxxx; end loop; end ; / quit SCOTT@test01p> @ m2.txt 20         select count(1) into v_id from deptxxx;                                        * ERROR at line 5: ORA-06550: line 5, column 40: PL/SQL: ORA-00942: table or view does not exist ORA-06550: line 5, column 9: PL/SQL: SQL Statement ignored --//不能通过PL/SQL测试,等于无法编译自动中断执行.也就是仅仅调用1次, 3.换一个测试方式: $ cat m2.sh #! /bin/bash sqlplus -s -l scott/btbtms@test01p <<EOF >/dev/null set feedback off select count(1) from deptxxx; / ... --//snap / quit EOF --//主:中间的/太长,我加入20000个/,我的测试环境cygwin有点慢.如果是linux环境,你可以写成$(seq 20000|xargs -IQ echo /). --//这样执行有一个缺点就是先要展开里面/才能执行,这样在测试环境时有点延迟执行,我的测试环境延迟实在是无法忍受. --//我估计在linux执行也存在类似问题,可以尝试改成$(seq 1000|xargs -IQ echo /). --//实在不行还是写成20000个/的形式. --//开始测试,每次并发10个执行. $ ~/bin/zzdate ; seq 100 | xargs -IQ -P 10 bash -c ./m2.sh trunc(sysdate)+19/24+56/1440+51/86400 == 2024/05/12 19:56:51 == "timestamp'2024-05-12 19:56:51'" ... SYS@test> @ ashtop event 1=1 trunc(sysdate)+19/24+56/1440+51/86400 sysdate   Total                                                                                         Distinct Distinct    Distinct Seconds     AAS %This   EVENT                         FIRST_SEEN          LAST_SEEN           Execs Seen  Tstamps Execs Seen1 ------- ------- ------- ----------------------------- ------------------- ------------------- ---------- -------- -----------     150     3.8   54% | kksfbc child completion       2024-05-12 19:56:56 2024-05-12 19:57:30          1       32          32      69     1.7   25% | library cache lock            2024-05-12 19:56:59 2024-05-12 19:57:30          1       17          17      19      .5    7% | cursor: pin S wait on X       2024-05-12 19:56:57 2024-05-12 19:57:28          1       11          11      19      .5    7% |                               2024-05-12 19:56:57 2024-05-12 19:57:30          1       18          18      11      .3    4% | SQL*Net break/reset to client 2024-05-12 19:56:54 2024-05-12 19:57:28          1       10          10       9      .2    3% | library cache: mutex X        2024-05-12 19:56:59 2024-05-12 19:57:29          1        8           8       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       1      .0    0% | control file sequential read  2024-05-12 19:57:26 2024-05-12 19:57:26          1        1           1 7 rows selected. --//2024/05/12 19:56:51过了8秒以后出现library cache lock,library cache: mutex X. SYS@test> @ ashtop sql_id,event 1=1 trunc(sysdate)+19/24+56/1440+51/86400 sysdate   Total                                                                                                       Distinct Distinct    Distinct Seconds     AAS %This   SQL_ID        EVENT                         FIRST_SEEN          LAST_SEEN           Execs Seen  Tstamps Execs Seen1 ------- ------- ------- ------------- ----------------------------- ------------------- ------------------- ---------- -------- -----------     633     4.1   49% | 2utgtqyrt3vag library cache lock            2024-05-12 19:56:59 2024-05-12 19:59:24          1      113         113     411     2.7   32% | 2utgtqyrt3vag kksfbc child completion       2024-05-12 19:56:56 2024-05-12 19:59:23          1      121         121      84      .5    7% | 2utgtqyrt3vag                               2024-05-12 19:56:57 2024-05-12 19:59:24          1       77          77      55      .4    4% | 2utgtqyrt3vag cursor: pin S wait on X       2024-05-12 19:56:57 2024-05-12 19:59:24          1       39          39      36      .2    3% | 2utgtqyrt3vag library cache: mutex X        2024-05-12 19:56:59 2024-05-12 19:59:13          1       33          33      35      .2    3% |               SQL*Net break/reset to client 2024-05-12 19:56:54 2024-05-12 19:59:24          1       32          32      14      .1    1% |               library cache: mutex X        2024-05-12 19:57:08 2024-05-12 19:59:14          1       12          12       7      .0    1% |                                             2024-05-12 19:57:24 2024-05-12 19:59:17          1        7           7       2      .0    0% | 2utgtqyrt3vag cursor: mutex S               2024-05-12 19:58:23 2024-05-12 19:58:23          1        1           1       2      .0    0% | cbs6jb3jmg9nm                               2024-05-12 19:57:31 2024-05-12 19:59:20          2        2           2       2      .0    0% |               control file sequential read  2024-05-12 19:57:26 2024-05-12 19:57:32          1        2           2       1      .0    0% | 2utgtqyrt3vag library cache: bucket mutex X 2024-05-12 19:59:24 2024-05-12 19:59:24          1        1           1 12 rows selected. --//sql_id=2utgtqyrt3vag,我前面有计算.执行脚本源代码在最后. SYS@test> @ sql_id 2utgtqyrt3vag --SQL_ID = 2utgtqyrt3vag --//查询不到sql语句,我估计我同事在这里给搞昏了. --//如果当时查询到sql语句,我估计定位很快,不会走许多弯路.... SYS@test> @ sharepool/shp4x 2utgtqyrt3vag 0 TEXT                  KGLHDADR         KGLHDPAR         C40                          KGLHDLMD KGLHDPMD KGLHDIVC KGLOBHD0         KGLOBHD6 KGLOBHS0 KGLOBHS6 KGLOBT16   N0_6_16     N20   KGLNAHSH KGLOBT03      KGLOBT09 --------------------- ---------------- ---------------- ---------------------------- -------- -------- -------- ---------------- -------- -------- -------- -------- --------- ------- ---------- ------------- -------- .. child handle address  000007FF11A10938 000007FF000E6070 select count(1) from deptxxx        0        0       57 00               00              0        0  3561717   3561717 3561717 2945576271 2utgtqyrt3vag     4374 child handle address  000007FF11F45AD8 000007FF000E6070 select count(1) from deptxxx        0        0        3 00               00              0        0  3561717   3561717 3561717 2945576271 2utgtqyrt3vag     4375 parent handle address 000007FF000E6070 000007FF000E6070 select count(1) from deptxxx        1        0   275975 000007FF11F91680 00        3563000        0        0   3563000 3563000 2945576271 2utgtqyrt3vag    65535 4377 rows selected. --//我的测试产生大量子光标.估计 kksfbc child completion,cursor: pin S wait on X与这个因素有关 SYS@test> @ wcx sysdate-3/86400 sysdate -- Display ASH Wait Chain Signatures script v0.7 by Tanel Poder ( http://blog.tanelpoder.com ) %This  SECONDS AAS WAIT_CHAIN                                                                                                                                 FIRST_SEEN          LAST_SEEN ------ ------- --- ------------------------------------------------------------------------------------------------------------------------------------------ ------------------- -------------------    7%        2  .7 -> ,,@=>17,23902,@1=>library cache lock                                                                                                    2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>181,3323,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock  2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>258,12330,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock                                                   2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>175,11537,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>97,50538,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock  2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>18,48560,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock  2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>262,10796,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>98,33150,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock  2024-05-12 20:04:57 2024-05-12 20:04:58    7%        2  .7 -> 179,16416,@1=>176,47654,@1=>library cache lock -> 17,23902,@1=>179,16416,@1=>library cache lock -> ,,@=>17,23902,@1=>library cache lock 2024-05-12 20:04:57 2024-05-12 20:04:58    3%        1  .3 -> ,,@=>175,11537,@1=>kksfbc child completion                                                                                              2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> ,,@=>18,48560,@1=>                                                                                                                      2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>176,47654,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                     2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>98,33150,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                      2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>17,23902,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                      2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>258,12330,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                     2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>97,50538,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                      2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> ,,@=>181,3323,@1=>kksfbc child completion                                                                                               2024-05-12 20:04:59 2024-05-12 20:04:59    3%        1  .3 -> 18,48560,@1=>179,16416,@1=>library cache lock -> ,,@=>18,48560,@1=>                                                                     2024-05-12 20:04:59 2024-05-12 20:04:59 19 rows selected. --//出现大量library cache lock阻塞. SYS@test> @ ashtop event 1=1 sysdate-5/86400 sysdate   Total                                                                                                      Distinct Distinct    Distinct Seconds     AAS %This   EVENT                   FIRST_SEEN          LAST_SEEN           Execs Seen  Tstamps Execs Seen1 ------- ------- ------- ----------------------- ------------------- ------------------- ---------- -------- -----------      38     7.6   79% | library cache lock      2024-05-12 20:05:52 2024-05-12 20:05:56          1        5           5       4      .8    8% | library cache: mutex X  2024-05-12 20:05:54 2024-05-12 20:05:55          1        2           2       3      .6    6% | kksfbc child completion 2024-05-12 20:05:52 2024-05-12 20:05:56          1        3           3       3      .6    6% |                         2024-05-12 20:05:52 2024-05-12 20:05:56          1        3           3 --// kksfbc child completion 应该与产生大量子光标有关,为什么不是很清楚,生产系统我没有看到,也许不在前30条. SYS@test> @ ashtop sql_id,event,p1,p2,p3 "event like 'library cache%'" sysdate-5/86400 sysdate   Total                                                                                                                                  Distinct Distinct    Distinct Seconds AAS %This   SQL_ID        EVENT                              P1             P2      P3 FIRST_SEEN          LAST_SEEN           Execs Seen  Tstamps Execs Seen1 ------- --- ------- ------------- ---------------------- -------------- -------------- ------- ------------------- ------------------- ---------- -------- -----------       3  .6    6% | 2utgtqyrt3vag library cache lock      8792101725488  8792090672984 5373954 2024-05-12 20:07:02 2024-05-12 20:07:06          1        3           3       2  .4    4% | 2utgtqyrt3vag library cache lock      8792101725488  8791800107672 5373954 2024-05-12 20:07:04 2024-05-12 20:07:06          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache lock      8792101725488  8791808899128 5373954 2024-05-12 20:07:03 2024-05-12 20:07:05          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache lock      8792101725488  8792090306696 5373954 2024-05-12 20:07:03 2024-05-12 20:07:04          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache lock      8792101725488  8792156248984 5373954 2024-05-12 20:07:04 2024-05-12 20:07:05          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache lock      8792101725488  8792156687152 5373954 2024-05-12 20:07:03 2024-05-12 20:07:05          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache: mutex X     3468241561              0 5374033 2024-05-12 20:07:04 2024-05-12 20:07:05          1        2           2       2  .4    4% | 2utgtqyrt3vag library cache: mutex X     3468241561   768799145984 5374033 2024-05-12 20:07:03 2024-05-12 20:07:03          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791798282960 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791798283256 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791798284016 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791800107376 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791800107376 5373955 2024-05-12 20:07:03 2024-05-12 20:07:03          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791800107968 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791808899720 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8791808900480 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792090307920 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792090308384 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792090671928 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792100505400 5373954 2024-05-12 20:07:05 2024-05-12 20:07:05          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792100505992 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792100506752 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792102509568 5373954 2024-05-12 20:07:03 2024-05-12 20:07:03          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792102509864 5373954 2024-05-12 20:07:04 2024-05-12 20:07:04          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792102510920 5373955 2024-05-12 20:07:05 2024-05-12 20:07:05          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792156249912 5373954 2024-05-12 20:07:03 2024-05-12 20:07:03          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792156250208 5373955 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792156250672 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792156687784 5373954 2024-05-12 20:07:02 2024-05-12 20:07:02          1        1           1       1  .2    2% | 2utgtqyrt3vag library cache lock      8792101725488  8792156688376 5373954 2024-05-12 20:07:06 2024-05-12 20:07:06          1        1           1 30 rows selected. --//记住library cache lock的P1=8792101725488,library cache: mutex X P1=3468241561 --//8792101725488 = 0x7ff1219a530 --//5373954 = /2^16  %2^16 (Type | Mode) = 82,2 = 0x520002 ,82 表示SQL AREA BUILD. --//前面计算的sql语句hash_value(10) = 2945576271,而等待事件的P1,P2,P3并没有与之相关. SYS@test> select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob  where KGLHDNSP=82 order by 1;       KGLHDNSP KGLHDNSD       KGLOBTYD -------------- -------------- -------------             82 SQL AREA BUILD CURSOR SYS@test> @ ev_name "library cache%" EVENT#   EVENT_ID NAME                              PARAMETER1     PARAMETER2   PARAMETER3         WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS  DISPLAY_NAME ------ ---------- --------------------------------- -------------- ------------ ------------------ ------------- ----------- ----------- ----------------------------------    342 2802704141 library cache pin                 handle address pin address  100*mode+namespace    3875070507           4 Concurrency library cache pin    343  916468430 library cache lock                handle address lock address 100*mode+namespace    3875070507           4 Concurrency library cache lock    344 2952162927 library cache load lock           object address lock address 100*mask+namespace    3875070507           4 Concurrency library cache load lock    345 1646780882 library cache: mutex X            idn            value        where                 3875070507           4 Concurrency library cache: mutex X    346 2601513493 library cache: bucket mutex X     idn            value        where                 3875070507           4 Concurrency library cache: bucket mutex X    347 2382371857 library cache: dependency mutex X idn            value        where                 3875070507           4 Concurrency library cache: dependency mutex X    348 2446268751 library cache: mutex S            idn            value        where                 3875070507           4 Concurrency library cache: mutex S   1361  800384622 library cache revalidation                                                          1893977003           0 Other       library cache revalidation   1362 1578286316 library cache shutdown                                                              1893977003           0 Other       library cache shutdown 9 rows selected. --//library cache lock的P1表示handle address,看看是怎么对象. --//library cache: mutex X的P1表示idn,我以前理解对应hash_vale.可以是sql语句或者某个对象的hash_value. --//Tanel Poder有1篇文档,Oracle_Latch_And_Mutex_Contention_Troubleshooting.pdf library cache: mutex* wait events 1) library cache hash bucket number (if idn <= 131072) 2) idn = hash value of the library cache object under protection (if idn >  131072) --//一点疑问,如果hash value<=131072(2^17)呢? --//library cache lock的P1表示handle address,看看是怎么对象. SYS@test> @ fchaz 0x7ff1219a530 LOC KSMCHPTR               KSMCHIDX       KSMCHDUR KSMCHCOM               KSMCHSIZ KSMCHCLS       KSMCHTYP KSMCHPAR         KSMCHPTR_END --- ---------------- -------------- -------------- ---------------- -------------- -------- -------------- ---------------- ----------------- SGA 000007FF1219A500              1              1 KGLHD                       856 recr                 80 00               000007FF1219A857 --//000007FF1219A500+0x30 = 8792101725488 = 0x7ff1219a530,实际上正好等于@ fchaz 0x7ff1219a530的查询地址. --//如果熟悉以前学习可以知道开始的偏移0x30(48字节)就是handle address,顺便说一下我这里是测试环境, --//执行tpt fcha脚本会报错或者要么很慢,生产系统共享池很大,扫描会很慢.我使用我改写fchaz,取消执行前的警告提示. SYS@test> @ sharepool/shp4 7ff1219a530 0 SYS@test> @ prxx ============================== TEXT                          : parent handle address KGLHDADR                      : 000007FF1219A530 KGLHDPAR                      : 000007FF1219A530 C40                           : 2d65f9b7af91ed4f KGLHDLMD                      : 2 KGLHDPMD                      : 0 KGLHDIVC                      : 0 KGLOBHD0                      : 00 KGLOBHD6                      : 00 KGLOBHS0                      : 0 KGLOBHS6                      : 0 KGLOBT16                      : 0 N0_6_16                       : 0 N20                           : 0 KGLNAHSH                      : 3468241561 KGLOBT03                      : KGLOBT09                      : 0 PL/SQL procedure successfully completed. --//看到的对象是2d65f9b7af91ed4f,而不是DEPTXXX. --//注:如果生产系统执行以上语句可能报错或者很慢,实际上选择hash_value=3468241561,使用shp4x.sql脚本查询,就没有这个问题. --//2d65f9b7af91ed4f ,我检索发现正好对应sql语句的 full_hash_value(16)的后16位. --//取后8位计算 af91ed4f = 2945576271,正好对应sql语句的hash_value之. --//而且KGLNAHSH正好等于library cache: mutex X 的P1=3468241561 --//说明是library cache: mutex X导致大量library cache lock. $ ./sql_idz.sh  'select count(1) from deptxxx'  0 sql_text = select count(1) from deptxxx\0 full_hash_value(16) = AA0B8A5E997323CE2D65F9B7AF91ED4F hash_value(10) = 2945576271 sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag sql_id(32) = 2utgtqyrt3vag SYS@test> @ lcl display  library cache lock problem INST_ID SADDR            HANDLE            MOD  REQ OBJECT               SQL_ID            HASH_VALUE KGLNAOBJ         USER_NAME C50 ------- ---------------- ---------------- ---- ---- -------------------- ------------- -------------- ---------------- --------- --------------------------------------------------       1 000007FF09EF26B8 000007FF1219A530    2    0 2d65f9b7af91ed4f     1sym80r7bkant     3468241561 2d65f9b7af91ed4f SCOTT     alter system kill session '98,36220' immediate; display wait library cache lock SID        SERIAL# SPID     PID      P_SERIAL# EVENT              C50 --- -------------- ---- ------- -------------- ------------------ -------------------------------------------------- 264          53720 7040      27             34 library cache lock alter system kill session '264,53720' immediate;  18          58858 2812      64             14 library cache lock alter system kill session '18,58858' immediate;  97          45103 6488      65              9 library cache lock alter system kill session '97,45103' immediate; 176          47333 4044      66             12 library cache lock alter system kill session '176,47333' immediate; 262          33120 6588      67             10 library cache lock alter system kill session '262,33120' immediate;  17          50789 6304      68              9 library cache lock alter system kill session '17,50789' immediate;  98          36220 4768      69              8 library cache lock alter system kill session '98,36220' immediate; 258          12758 2828      71             23 library cache lock alter system kill session '258,12758' immediate; 8 rows selected. SYS@test> @ sql_id 1sym80r7bkant --SQL_ID = 1sym80r7bkant --//还是看不到sql语句.也许这是问题的难点,实际上对象是2d65f9b7af91ed4f. --//kill这些library cache lock进程没有任何用处,因为程序还在跑. --//实际上如果当时查询x$kglob视图,KGLOBT03=2utgtqyrt3vag,就可以获得sql语句,一看就明白问题在那里,重写建立表就ok了. SYS@test> select * from v$db_object_cache where name = '2d65f9b7af91ed4f'   2  @ prxx ============================== OWNER                         : $BUILD$ NAME                          : 2d65f9b7af91ed4f DB_LINK                       : NAMESPACE                     : SQL AREA BUILD TYPE                          : CURSOR SHARABLE_MEM                  : 0 LOADS                         : 0 EXECUTIONS                    : 0 LOCKS                         : 7 PINS                          : 0 KEPT                          : NO CHILD_LATCH                   : 76441 INVALIDATIONS                 : 0 HASH_VALUE                    : 3468241561 LOCK_MODE                     : SHARED PIN_MODE                      : NONE STATUS                        : UNKOWN TIMESTAMP                     : PREVIOUS_TIMESTAMP            : LOCKED_TOTAL                  : 850974 PINNED_TOTAL                  : 0 PROPERTY                      : FULL_HASH_VALUE               : 53270f034126313b1c7a6805ceb92a99 CON_ID                        : 3 CON_NAME                      : TEST01P ADDR                          : 000007FF1219A530 EDITION                       : PL/SQL procedure successfully completed. --//NAMESPACE=SQL AREA BUILD,select distinct kglhdnsp,kglhdnsd,kglobtyd from x$kglob  where KGLHDNSP=82 order by 1; --//kglhdnsp=82 , 对应8进制是\01222 --//3468241561 % 2^17 = 76441,对应CHILD_LATCH =76441. --//注意OWNER=$BUILD$,我使用SCOTT计算sql_id不对.使用我以前写sql_idx.sh脚本. $ echo -n -e '2d65f9b7af91ed4f.$BUILD$.TEST01P' > cc.txt $ ./sql_idz.sh  cc.txt  1 '\0122\0\0\0' sql_text = 2d65f9b7af91ed4f.$BUILD$.TEST01P\0122\0\0\0 full_hash_value(16) = 53270F034126313B1C7A6805CEB92A99 hash_value(10) = 3468241561 sql_id(32) = 1sym80r7bkant sql_id(32) = 1sym80r7bkant sql_id(32) = 1sym80r7bkant --//基本都能对上前面select * from v$db_object_cache where name = '2d65f9b7af91ed4f'的的输出. --//注:我改写我原来写的脚本,因为cat 文件里面包含chr(0),cat的输出就忽略掉. --//82 ascii码 R,注意使用dbms_sql_translator包计算不用加chr(0),这样对于表少1个chr(0). SELECT dbms_sql_translator.sql_id ('2d65f9b7af91ed4f.$BUILD$.TEST01PR'||chr(0)||chr(0)) c20, dbms_sql_translator.sql_hash ('2d65f9b7af91ed4f.$BUILD$.TEST01PR'||chr(0)||chr(0)) N20 FROM dual; C20                         N20 -------------------- ---------- 1sym80r7bkant        3468241561 --//也能对上. 4.总结: --//我已经基本展示生产系统出现的问题,还是不明白为什么这样会产生大量的子光标. --//在分析sql语句时或许与产生大量子光标有关,在对象2d65f9b7af91ed4f上存在争用,NAMESPACE = SQL AREA BUILD,导致 --//出现library cache: mutex X,library cache lock. --//另外我做了测试,如果打开几个回话,手工执行select count(1) from deptxxx;当然不是并发执行,并没有产生多个子光标的情况. --//也许问题就出在密集的执行产生大量子光标上面. --//在测试前我以为阻塞的位置要么与sql语句的hash_value有关,要么与deptxxx对象有关,甚至在shared pool latch上存在争用, --//因为错误的sql语句每次存在硬解析,实际的情况都不是,而在NAMESPACE = SQL AREA BUILD,建立的对象上存在争用. --//还有一些细节我需要验证,好好消化,写的有点乱... --//我估计同事看到sql_id后查询不到对应sql语句,而不断的kill进程并不会有助于问题解决,在这上面浪费大量时间,而且实际上 --//阻塞的时间都很短. --//实际上只要查询底层的x$kglob的,KGLOBT03=2utgtqyrt3vag,就可以获得sql语句,一看就明白问题在那里,至少能发现表没有建立. --//实际上还可以查询v$db_object_cache的HASH_VALUE(它可以通过sql_id获得),获得对应sql语句,该视图的底层就是x$kglob. --//贴一段生产系统看到的情况. > @ dashtop event "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 10:30:00'"     Total   Seconds     AAS %This   EVENT                                      FIRST_SEEN          LAST_SEEN --------- ------- ------- ------------------------------------------ ------------------- -------------------   8164610   913.0   76%   library cache lock                         2024-04-26 08:23:22 2024-04-26 09:17:44   2587860   289.4   24%   library cache: mutex X                     2024-04-26 08:23:22 2024-04-26 09:17:44        10      .0    0%   library cache pin                          2024-04-26 10:00:40 2024-04-26 10:00:40 --//出现问题时间点2024-04-26 08:23:22,到2024-04-26 09:17:44问题解决.后面library cache pin仅仅出现1次,可以不用考虑. > @ dashtop event,sql_id "event='library cache lock'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 08:30:00'"     Total   Seconds     AAS %This   EVENT              SQL_ID        FIRST_SEEN          LAST_SEEN --------- ------- ------- ------------------ ------------- ------------------- -------------------    554750   318.3   96%   library cache lock arw6k2rf5s29b 2024-04-26 08:23:22 2024-04-26 08:29:55     13880     8.0    2%   library cache lock bakhcg5b1749p 2024-04-26 08:23:22 2024-04-26 08:29:55      2580     1.5    0%   library cache lock ddd4xgabw2tct 2024-04-26 08:23:33 2024-04-26 08:29:55      2440     1.4    0%   library cache lock 306sjy7n4m9bk 2024-04-26 08:23:22 2024-04-26 08:29:55      2300     1.3    0%   library cache lock av7t0b1v7jn1m 2024-04-26 08:23:22 2024-04-26 08:29:55       440      .3    0%   library cache lock 69zcqdq405z1h 2024-04-26 08:25:31 2024-04-26 08:29:55       420      .2    0%   library cache lock dgvzbsr848vam 2024-04-26 08:25:09 2024-04-26 08:29:55       410      .2    0%   library cache lock bg1t2pc5ym090 2024-04-26 08:24:15 2024-04-26 08:29:55       350      .2    0%   library cache lock 0j7bs1g3fg66x 2024-04-26 08:23:33 2024-04-26 08:29:55       310      .2    0%   library cache lock 7yhjdqcg8wzny 2024-04-26 08:23:22 2024-04-26 08:29:55       300      .2    0%   library cache lock 4f03m8k9vgb6r 2024-04-26 08:23:43 2024-04-26 08:29:55       250      .1    0%   library cache lock 9bsd8hfq3cf81 2024-04-26 08:23:54 2024-04-26 08:29:55       200      .1    0%   library cache lock 5xw2vsz1868zh 2024-04-26 08:25:31 2024-04-26 08:29:55        20      .0    0%   library cache lock 899c5c7ug4dp4 2024-04-26 08:23:22 2024-04-26 08:23:22 14 rows selected. > @ sqlhh arw6k2rf5s29b 100 time unit : millisecond BEGIN_INTERVAL_TIME    INST_ID SQL_ID        PLAN_HASH_VALUE EXECUTIONS ELA_MS_PER_EXEC CPU_MS_PER_EXEC ROWS_PER_EXEC LIOS_PER_EXEC BLKRD_PER_EXEC IOW_MS_PER_EXEC  AVG_IOW_MS CLW_MS_PER_EXEC APW_MS_PER_EXEC CCW_MS_PER_EXEC ------------------- ---------- ------------- --------------- ---------- --------------- --------------- ------------- ------------- -------------- --------------- ----------- --------------- --------------- --------------- 2024-04-26 11:00:15          1 arw6k2rf5s29b      3673589541     135628               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-26 12:00:17          1 arw6k2rf5s29b      3673589541     125952               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-26 13:00:36          1 arw6k2rf5s29b      3673589541     117259               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-26 14:00:13          1 arw6k2rf5s29b      3673589541     121757               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-26 15:00:15          1 arw6k2rf5s29b      3673589541     131134               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-26 16:00:35          1 arw6k2rf5s29b      3673589541     135313               0               0           0.0             0              0               0         0.0               0               0               0 ... 2024-04-29 02:00:20          1 arw6k2rf5s29b      2199259672      54386               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 03:00:39          1 arw6k2rf5s29b      2199259672      52685               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 04:00:08          1 arw6k2rf5s29b      2199259672      52485               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 05:00:17          1 arw6k2rf5s29b      2199259672      52623               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 06:00:37          1 arw6k2rf5s29b      2199259672      51764               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 07:00:04          1 arw6k2rf5s29b      2199259672      56257               0               0           0.0             0              0               0         0.0               0               0               0 2024-04-29 09:00:35          1 arw6k2rf5s29b      2199259672     127381               0               0           0.0             0              0               0         0.0               0               0               0 47 rows selected. --//可以发现1个细节,我无法解析,9:17恢复系统,很奇怪的是9:17-11:00并没有大量执行.而之后有记录. --//要么就是9,10点执行次数很少,没有写入awr相关历史表中,要么就是这个过程中程序升级回退了. --//最奇怪的涉及查询返回0,我查看相关表现在的情况表中没有任何记录,似乎再2024-04-29 09:00:35之后某个时刻发现问题,不再执行. --//我估计取消了升级与改动,我们团队缺乏必要的交流与沟通,根本不知道开发做了什么. --//另外有1个小技巧可能便于辅助判断,就是执行计划的SQL_PLAN_HASH_VALUE是0. > @ dashtop sql_id,SQL_PLAN_HASH_VALUE "event like 'library cache%'" "timestamp'2024-04-26 08:00:57'" "timestamp'2024-04-26 09:30:00'"     Total   Seconds     AAS %This   SQL_ID        SQL_PLAN_HASH_VALUE FIRST_SEEN          LAST_SEEN --------- ------- ------- ------------- ------------------- ------------------- -------------------   9898240  1852.6   92%   arw6k2rf5s29b                   0 2024-04-26 08:23:22 2024-04-26 09:17:32    409370    76.6    4%   bakhcg5b1749p                   0 2024-04-26 08:23:22 2024-04-26 09:17:44     81520    15.3    1%   ddd4xgabw2tct                   0 2024-04-26 08:23:33 2024-04-26 09:17:44     59940    11.2    1%   0j7bs1g3fg66x                   0 2024-04-26 08:23:33 2024-04-26 09:17:44     53760    10.1    0%   306sjy7n4m9bk                   0 2024-04-26 08:23:22 2024-04-26 09:17:32     40870     7.6    0%   dgvzbsr848vam           823207121 2024-04-26 08:24:58 2024-04-26 09:17:44     37100     6.9    0%   69zcqdq405z1h                   0 2024-04-26 08:25:20 2024-04-26 09:17:44     36280     6.8    0%   7yhjdqcg8wzny                   0 2024-04-26 08:23:22 2024-04-26 09:17:44     32860     6.2    0%   bg1t2pc5ym090          3916753348 2024-04-26 08:24:15 2024-04-26 09:17:44     31400     5.9    0%   av7t0b1v7jn1m                   0 2024-04-26 08:23:22 2024-04-26 09:17:32     17990     3.4    0%   fuspz01qnu8hd                   0 2024-04-26 08:52:28 2024-04-26 09:17:32      8800     1.6    0%   9bsd8hfq3cf81                   0 2024-04-26 08:23:54 2024-04-26 09:17:44      8300     1.6    0%   gc3s9xsr2v4td                   0 2024-04-26 08:44:24 2024-04-26 09:17:44      7060     1.3    0%   c4jmjmy1hsp7t                   0 2024-04-26 08:56:21 2024-04-26 09:17:44      4850      .9    0%   cyy3tk3dqxpf7                   0 2024-04-26 08:33:49 2024-04-26 09:17:32 ... 35 rows selected. --//实际上是SQL_PLAN_HASH_VALUE=0,基本都是表不存在的情况. --//当然我不知道存在一种特殊情况,sql_id记录的是匿名PL/SQL块语句.这样SQL_PLAN_HASH_VALUE=0.我没有测试. 5.相关脚本: --//lcl.sql PROMPT PROMPT display  library cache lock problem PROMPT column object format a20 column user_name format a20 SELECT inst_id       ,kgllkses saddr       ,kgllkhdl handle       ,kgllkmod MOD       ,kgllkreq REQ       ,kglnaobj object       ,KGLLKSQLID sql_id       ,kglnahsh hash_value       ,KGLNAOBJ       ,user_name       ,   'alter system kill session '''        || s.sid        || ','        || s.serial#        || ''''        || ' immediate;'           c50   --,lock_a.*   FROM x$kgllk lock_a, v$session s  WHERE    s.saddr = lock_a.kgllkses and  kgllkmod > 0        AND EXISTS               (SELECT lock_b.kgllkhdl                  FROM x$kgllk lock_b                 WHERE     kgllkses IN (SELECT saddr                                          FROM v$session                                         WHERE event like 'library cache lock') /* blocked session */                                         --WHERE event like 'library cache pin') /* blocked session */                       AND lock_a.kgllkhdl = lock_b.kgllkhdl                       AND kgllkreq > 0); PROMPT PROMPT display wait library cache lock PROMPT SELECT s.sid       ,s.serial#       ,p.spid       ,p.pid       ,p.serial# p_serial#       ,s.event       ,   'alter system kill session '''        || s.sid        || ','        || s.serial#        || ''''        || ' immediate;'           c50   FROM v$session s, v$process p  WHERE s.paddr = p.addr AND s.event= 'library cache lock'; --AND s.event= 'library cache pin'; --//shp4.sql column N0_6_16 format 99999999 column fcura_addrlen new_value _fcura_addrlen set termout off select vsize(addr)*2 fcura_addrlen from x$dual; set termout on SELECT DECODE (kglhdadr,                kglhdpar, 'parent handle address',                'child handle address')           text,        kglhdadr,        kglhdpar,        substr(kglnaobj,1,40) c40,        KGLHDLMD,        KGLHDPMD,        kglhdivc,        kglobhd0,        kglobhd6,        kglobhs0,kglobhs6,kglobt16,        kglobhs0+kglobhs6+kglobt16 N0_6_16,        kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,        kglnahsh,        kglobt03 ,        kglobt09     FROM x$kglob  WHERE     KGLHDPAR = lpad(upper('&1'), &_fcura_addrlen, '0') or  KGLHDADR = lpad(upper('&1'), &_fcura_addrlen, '0') or  KGLOBHD0 = lpad(upper('&1'), &_fcura_addrlen, '0') --or  KGLOBHD1 = lpad(upper('&1'), &_fcura_addrlen, '0') --or  KGLOBHD2 = lpad(upper('&1'), &_fcura_addrlen, '0') --or  KGLOBHD3 = lpad(upper('&1'), &_fcura_addrlen, '0') --or  KGLOBHD4 = lpad(upper('&1'), &_fcura_addrlen, '0') --or  KGLOBHD5 = lpad(upper('&1'), &_fcura_addrlen, '0') or  KGLOBHD6 = lpad(upper('&1'), &_fcura_addrlen, '0') or  KGLOBT03 = lower('&1') or  KGLNAHSH= &2; --//shp4x.sql column N0_6_16 format 99999999 SELECT /*+ USE_CONCAT(@"SEL$1" 8 OR_PREDICATES(1)) */ DECODE (kglhdadr,                kglhdpar, 'parent handle address',                'child handle address')     text,        kglhdadr,        kglhdpar,        substr(kglnaobj,1,40) c40,        KGLHDLMD,        KGLHDPMD,        kglhdivc,        kglobhd0,        kglobhd6,        kglobhs0,kglobhs6,kglobt16,        kglobhs0+kglobhs6+kglobt16 N0_6_16,        kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,        kglnahsh,        kglobt03,        kglobt09     FROM x$kglob  WHERE kglobt03 = lower('&1') or KGLNAHSH= &2; --//sql_idz.sh #! /bin/bash # calcucate sql_text of full_hash_value(16),hash_value(10),sql_id(32). # argv1 sql statement or sql of text file # argv2 flag: 0= sql statement 1=sql of text file for sqlplus  2=sql of text file for other 3=original # argv3 default = '\0' add tailstr odebug=${ODEBUG:-0} oflag=${2:-0} tailstr=${3:-'\0'} if [ $oflag -eq 0 ] then     sql_text=${1}${tailstr} fi # sqlplus format sql_text if [ $oflag -eq 1 ] then sql_text="$( cat $1 | sed -e "s/ $//" -e "s/\s*$//" -e '$s/;$//')""${tailstr}" # sql_text="$( cat $1 | unix2dos | sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed -e "s/ $//" -e sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed '$s/;\s*$//')"'\0' fi # other format sql_text if [ $oflag -eq 2 ] then sql_text="$( cat $1 | sed '$s/;\s*$//')""${tailstr}" # sql_text="$( cat $1 | unix2dos | sed '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed -e "s/ $//" -e '$s/;\s*$//')"'\0' # sql_text="$( cat $1 | sed '$s/;\s*//')"'\0' fi # exact_matching_signature, force_matching_signature if [ $oflag -eq 3 ] then     sql_text=${1} fi v1=$(echo -e -n "$sql_text" | md5sum | sed 's/  -//' | xxd -r -p | od -t x4 |  sed   -n  -e 's/^0\+ //' -e 's/ //gp' | tr 'a-z' 'A-Z') v2=${v1:(-16):16} v3=${v2:(-8):8} # v2=$(echo "obase=16;ibase=16; $v1 % 10000000000000000" | bc| tr -d '\\\r\n') # v3=$(echo "obase=10;ibase=16; $v1 % 100000000" | bc| tr -d '\\\r\n') if [ $odebug -eq 1 ] ; then         echo v1=$v1 v2=$v2 v3=$v3 fi echo "sql_text = $sql_text" echo "full_hash_value(16) = $v1 or ${v1,,}" if [ $oflag -eq 3 ] ; then     echo "xxxxx_matching_signature(10) = $(( 16#$v2 )) or " $(echo $(( 16#$v2 )) + 2^64|bc ) fi echo "hash_value(10) = $(( 16#$v3 )) " BASE32=($(echo {0..9} {a..z} | tr -d 'eilo')) res='' for i in $(echo "obase=32;ibase=16; $v2" | bc| tr -d '\\\r\n') do         res=${res}${BASE32[$(( 10#$i ))]} done echo "sql_id(32) = $(printf "%13s" $res | tr ' ' '0')" echo "sql_id(32) = $(printf "%013s" $res)" res1=$(eval $(echo "obase=32;ibase=16; $v2" | bc| tr -d '\\\r\n' | awk 'BEGIN{RS=" +"; printf "echo " }/./{printf "${BASE32[$(( 10#%02d))]}", $1}' )) echo "sql_id(32) = $(printf "%013s" $res1)"

相关推荐